Skip to main content

Showing 1–50 of 772 results for author: Li, H

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.19043  [pdf

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Ya**g Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, **g Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  2. arXiv:2406.18935  [pdf

    eess.SY

    Generalized Averaging Method for Power Electronics Modeling from DC to above Half the Switching Frequency

    Authors: Hongchang Li, Kang** Wang, **gyang Fang, Wenjie Chen, Xu Yang

    Abstract: Modeling power electronic converters at frequencies close to or above half the switching frequency has been difficult due to the time-variant and discontinuous switching actions. This paper uses the properties of moving Fourier coefficients to develop the generalized averaging method, breaking though the limit of half the switching frequency. The paper also proposes the generalized average model f… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.16326  [pdf, other

    eess.AS

    RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging

    Authors: Mingyang Zhang, Yi Zhou, Yi Ren, Chen Zhang, Xiang Yin, Haizhou Li

    Abstract: This paper proposes RefXVC, a method for cross-lingual voice conversion (XVC) that leverages reference information to improve conversion performance. Previous XVC works generally take an average speaker embedding to condition the speaker identity, which does not account for the changing timbre of speech that occurs with different pronunciations. To address this, our method uses both global and loc… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Manuscript under review by TASLP

  4. arXiv:2406.16150  [pdf, other

    eess.IV cs.CV

    Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

    Authors: Haifan Gong, Wenhao Huang, Huan Zhang, Yu Wang, Xiang Wan, Hong Shen, Guanbin Li, Haofeng Li

    Abstract: Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paied less attention to the issue we term as \textit{Intensity Confusion}, wherein the intensity values of certain background voxels approach those of the foreground voxels within br… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: IEEE International Conference on Multimedia & Expo (ICME) 2024

  5. arXiv:2406.15945  [pdf, other

    eess.SP cs.IT

    Full-Space Wireless Sensing Enabled by Multi-Sector Intelligent Surfaces

    Authors: Yumeng Zhang, Xiaodan Shao, Hongyu Li, Bruno Clerckx, Rui Zhang

    Abstract: The multi-sector intelligent surface (IS), benefiting from a smarter wave manipulation capability, has been shown to enhance channel gain and offer full-space coverage in communications. However, the benefits of multi-sector IS in wireless sensing remain unexplored. This paper introduces the application of multi-sector IS for wireless sensing/localization. Specifically, we propose a new self-sensi… ▽ More

    Submitted 25 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 13 pages, 9 figures

  6. arXiv:2406.15716  [pdf, other

    eess.IV cs.CV

    Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge

    Authors: Han Liu, Hao Li, Jiacheng Wang, Yubo Fan, Zhoubing Xu, Ipek Oguz

    Abstract: Fluorescence labeling is the standard approach to reveal cellular structures and other subcellular constituents for microscopy images. However, this invasive procedure may perturb or even kill the cells and the procedure itself is highly time-consuming and complex. Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the flu… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.14118  [pdf, other

    eess.IV cs.CV

    Prediction and Reference Quality Adaptation for Learned Video Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li

    Abstract: Temporal prediction is one of the most important technologies for video compression. Various prediction coding modes are designed in traditional video codecs. Traditional video codecs will adaptively to decide the optimal coding mode according to the prediction quality and reference quality. Recently, learned video codecs have made great progress. However, they ignore the prediction and reference… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  8. arXiv:2406.14064  [pdf, other

    cs.IT eess.SP

    PAPR Reduction with Pre-chirp Selection for Affine Frequency Division Multiple

    Authors: Haozhi Yuan, Yin Xu, Xinghao Guo, Tianyao Ma, Haoyang Li, Dazhi He, Wenjun Zhang

    Abstract: Affine frequency division multiplexing (AFDM) is a promising new multicarrier technique based on discrete affine Fourier transform (DAFT). By properly tuning pre-chirp parameter and post-chirp parameter in the DAFT, the effective channel in the DAFT domain can completely avoid overlap of different paths, thus constitutes a full representation of delay-Doppler profile, which significantly improves… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.13340  [pdf, other

    cs.CL cs.SD eess.AS

    SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

    Authors: Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu

    Abstract: Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, includin… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  10. arXiv:2406.12726  [pdf, other

    cs.SD cs.AI eess.AS

    ED-sKWS: Early-Decision Spiking Neural Networks for Rapid,and Energy-Efficient Keyword Spotting

    Authors: Zeyang Song, Qianhui Liu, Qu Yang, Yizhou Peng, Haizhou Li

    Abstract: Keyword Spotting (KWS) is essential in edge computing requiring rapid and energy-efficient responses. Spiking Neural Networks (SNNs) are well-suited for KWS for their efficiency and temporal capacity for speech. To further reduce the latency and energy consumption, this study introduces ED-sKWS, an SNN-based KWS model with an early-decision mechanism that can stop speech processing and output the… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  11. arXiv:2406.12447  [pdf, other

    eess.AS

    Text-aware Speech Separation for Multi-talker Keyword Spotting

    Authors: Haoyu Li, Baochen Yang, Yu Xi, Linfeng Yu, Tian Tan, Hao Li, Kai Yu

    Abstract: For noisy environments, ensuring the robustness of keyword spotting (KWS) systems is essential. While much research has focused on noisy KWS, less attention has been paid to multi-talker mixed speech scenarios. Unlike the usual cocktail party problem where multi-talker speech is separated using speaker clues, the key challenge here is to extract the target speech for KWS based on text clues. To ad… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  12. arXiv:2406.11401  [pdf, other

    eess.AS

    An Exploration of Length Generalization in Transformer-Based Speech Enhancement

    Authors: Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li

    Abstract: The use of Transformer architectures has facilitated remarkable progress in speech enhancement. Training Transformers using substantially long speech utterances is often infeasible as self-attention suffers from quadratic complexity. It is a critical and unexplored challenge for a Transformer-based speech enhancement model to learn from short speech utterances and generalize to longer ones. In thi… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  13. arXiv:2406.10844  [pdf, other

    eess.AS cs.SD

    Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis

    Authors: Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li

    Abstract: Synthesizing speech across different accents while preserving the speaker identity is essential for various real-world customer applications. However, the individual and accurate modeling of accents and speakers in a text-to-speech (TTS) system is challenging due to the complexity of accent variations and the intrinsic entanglement between the accent and speaker identity. In this paper, we present… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  14. arXiv:2406.08266  [pdf, other

    eess.AS cs.SD

    Refining Self-Supervised Learnt Speech Representation using Brain Activations

    Authors: Hengyu Li, Kangdi Mei, Zhaoci Liu, Yang Ai, Li** Chen, Jie Zhang, Zhenhua Ling

    Abstract: It was shown in literature that speech representations extracted by self-supervised pre-trained models exhibit similarities with brain activations of human for speech perception and fine-tuning speech representation models on downstream tasks can further improve the similarity. However, it still remains unclear if this similarity can be used to optimize the pre-trained speech models. In this work,… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: accpeted by Interspeech2024

  15. arXiv:2406.08196  [pdf, other

    cs.SD eess.AS

    FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter

    Authors: Yuanjun Lv, Hai Li, Ying Yan, Junhui Liu, Danming Xie, Lei Xie

    Abstract: Vocoders reconstruct speech waveforms from acoustic features and play a pivotal role in modern TTS systems. Frequent-domain GAN vocoders like Vocos and APNet2 have recently seen rapid advancements, outperforming time-domain models in inference speed while achieving comparable audio quality. However, these frequency-domain vocoders suffer from large parameter sizes, thus introducing extra memory bu… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024; 5 pages, 5 figures

  16. arXiv:2406.07422  [pdf, other

    eess.AS

    Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

    Authors: Hanzhao Li, Liumeng Xue, Haohan Guo, Xinfa Zhu, Yuanjun Lv, Lei Xie, Yunlin Chen, Hao Yin, Zhifei Li

    Abstract: The multi-codebook speech codec enables the application of large language models (LLM) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook single-sequence codec, which employs a disentangled VQ-VAE to decouple speech into a time-invariant embedding and a phonetically-rich discrete sequence. Furthermor… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  17. arXiv:2406.07198  [pdf, other

    eess.AS cs.MM

    Target Speech Diarization with Multimodal Prompts

    Authors: Yidi Jiang, Ruijie Tao, Zhengyang Chen, Yanmin Qian, Haizhou Li

    Abstract: Traditional speaker diarization seeks to detect ``who spoke when'' according to speaker characteristics. Extending to target speech diarization, we detect ``when target event occurs'' according to the semantic characteristics of speech. We propose a novel Multimodal Target Speech Diarization (MM-TSD) framework, which accommodates diverse and multi-modal prompts to specify target events in a flexib… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 13 pages, 7 figures

  18. arXiv:2406.05982  [pdf

    eess.IV cs.LG physics.med-ph

    Artificial Intelligence for Neuro MRI Acquisition: A Review

    Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

    Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potenti… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Submitted to MAGMA for review

  19. arXiv:2406.05551  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

    Authors: Zhijun Liu, Shuai Wang, Sho Inoue, Qibing Bai, Haizhou Li

    Abstract: Audio language models have recently emerged as a promising approach for various audio generation tasks, relying on audio tokenizers to encode waveforms into sequences of discrete symbols. Audio tokenization often poses a necessary compromise between code bitrate and reconstruction accuracy. When dealing with low-bitrate audio codes, language models are constrained to process only a subset of the i… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  20. arXiv:2406.02483  [pdf, other

    eess.AS cs.AI cs.SD

    How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?

    Authors: Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li

    Abstract: Partially manipulating a sentence can greatly change its meaning. Recent work shows that countermeasures (CMs) trained on partially spoofed audio can effectively detect such spoofing. However, the current understanding of the decision-making process of CMs is limited. We utilize Grad-CAM and introduce a quantitative analysis metric to interpret CMs' decisions. We find that CMs prioritize the artif… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  21. arXiv:2406.02430  [pdf, other

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu , et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  22. arXiv:2406.02247  [pdf, other

    physics.ins-det eess.SY

    A Study of the Latest Updates of the Readout System for the Hybird-Pixel Detector at HEPS

    Authors: Hangxu Li, Jie Zhang, Wei Wei, Zhenjie Li, Xiaolu Ji, Yan Zhang, Xuanzheng Yang, Shuihan Zhang, Xueke Ma, Peng Liu, Zheng Wang, Yuanbai Chen

    Abstract: The High Energy Photon Source (HEPS) represents a fourth-generation light source. This facility has made unprecedented advancements in accelerator technology, necessitating the development of new detectors to satisfy physical requirements such as single-photon resolution, large dynamic range, and high frame rates. Since 2016, the Institute of High Energy Physics has introduced the first user-exper… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  23. arXiv:2406.00516  [pdf, other

    eess.SY

    Deep Learning based Performance Testing for Analog Integrated Circuits

    Authors: Jiawei Cao, Chongtao Guo, Hao Li, Zhigang Wang, Houjun Wang, Geoffrey Ye Li

    Abstract: In this paper, we propose a deep learning based performance testing framework to minimize the number of required test modules while guaranteeing the accuracy requirement, where a test module corresponds to a combination of one circuit and one stimulus. First, we apply a deep neural network (DNN) to establish the map** from the response of the circuit under test (CUT) in each module to all specif… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  24. arXiv:2406.00485  [pdf

    eess.IV cs.RO

    TacShade A New 3D-printed Soft Optical Tactile Sensor Based on Light, Shadow and Greyscale for Shape Reconstruction

    Authors: Zhenyu Lu, Jialong Yang, Haoran Li, Yifan Li, Weiyong Si, Nathan Lepora, Chenguang Yang

    Abstract: In this paper, we present the TacShade a newly designed 3D-printed soft optical tactile sensor. The sensor is developed for shape reconstruction under the inspiration of sketch drawing that uses the density of sketch lines to draw light and shadow, resulting in the creation of a 3D-view effect. TacShade, building upon the strengths of the TacTip, a single-camera tactile sensor of large in-depth de… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by ICRA 2024

  25. arXiv:2406.00449  [pdf, other

    eess.IV cs.CV

    Dual Hyperspectral Mamba for Efficient Spectral Compressive Imaging

    Authors: Jiahua Dong, Hui Yin, Hongliu Li, Wenbo Li, Yulun Zhang, Salman Khan, Fahad Shahbaz Khan

    Abstract: Deep unfolding methods have made impressive progress in restoring 3D hyperspectral images (HSIs) from 2D measurements through convolution neural networks or Transformers in spectral compressive imaging. However, they cannot efficiently capture long-range dependencies using global receptive fields, which significantly limits their performance in HSI reconstruction. Moreover, these methods may suffe… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures

  26. arXiv:2405.20693  [pdf, other

    eess.IV cs.CV

    R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction

    Authors: Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, Hongdong Li

    Abstract: 3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R2-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a p… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  27. arXiv:2405.20595  [pdf, other

    eess.SP

    Multi-Beam Integrated Sensing and Communication: State-of-the-Art, Challenges and Opportunities

    Authors: Yinxiao Zhuo, Tianqi Mao, Hao** Li, Chen Sun, Zhaocheng Wang, Zhu Han, Sheng Chen

    Abstract: Integrated sensing and communication (ISAC) has been envisioned as a critical enabling technology for the next-generation wireless communication, which can realize location/motion detection of surroundings with communication devices. This additional sensing capability leads to a substantial network quality gain and expansion of the service scenarios. As the system evolves to millimeter wave (mmWav… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  28. arXiv:2405.18435  [pdf, other

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de… ▽ More

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  29. arXiv:2405.18356  [pdf, other

    eess.IV cs.CV

    Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

    Authors: Jie Liu, Yixiao Zhang, Kang Wang, Mehmet Can Yavuz, Xiaoxi Chen, Yixuan Yuan, Haoliang Li, Yang Yang, Alan Yuille, Yucheng Tang, Zongwei Zhou

    Abstract: The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to Medical Image Analysis

  30. arXiv:2405.15553  [pdf, other

    eess.SP

    Massive MIMO-ISAC System With 1-Bit ADCs/DACs

    Authors: Bowen Wang, Hongyu Li, Bin Liao, Ziyang Cheng

    Abstract: This paper investigates a hardware-efficient massive multiple-input multiple-output integrated sensing and communication (MIMO-ISAC) system with 1-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs). The proposed system, referred to as 1BitISAC, employs 1-bit DACs at the ISAC transmitter and 1-bit ADCs at the sensing receiver, achieving significant reductions in power consu… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  31. arXiv:2405.15093  [pdf, other

    eess.AS

    Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis

    Authors: Hui Li, Hongyu Wang, Zhi** Chen, Bohan Sun, Bo Li

    Abstract: Singing voice conversion is to convert the source sing voice into the target sing voice except for the content. Currently, flow-based models can complete the task of voice conversion, but they struggle to effectively extract latent variables in the more rhythmically rich and emotionally expressive task of singing voice conversion, while also facing issues with low efficiency in speech processing.… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 5 pages,4 figures

  32. arXiv:2405.12609  [pdf, other

    eess.AS cs.SD

    Mamba in Speech: Towards an Alternative to Self-Attention

    Authors: Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

    Abstract: Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and comp… ▽ More

    Submitted 23 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  33. arXiv:2405.12584  [pdf, other

    eess.IV cs.CV cs.LG

    Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?

    Authors: Ziqin Lin, Heng Li, Zinan Li, Huazhu Fu, Jiang Liu

    Abstract: Recent advancements in pre-trained large foundation models (LFM) have yielded significant breakthroughs across various domains, including natural language processing and computer vision. These models have been particularly impactful in the domain of medical diagnostic tasks. With abundant unlabeled data, an LFM has been developed for fundus images using the Vision Transformer (VIT) and a self-supe… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures

  34. arXiv:2405.09787  [pdf, other

    eess.IV cs.CV cs.LG

    Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

    Authors: Dominic LaBella, Ujjwal Baid, Omaditya Khanna, Shan McBurney-Lin, Ryan McLean, Pierre Nedelec, Arif Rashid, Nourel Hoda Tahon, Talissa Altes, Radhika Bhalerao, Yaseen Dhemesh, Devon Godfrey, Fathi Hilal, Scott Floyd, Anastasia Janas, Anahita Fathi Kazerooni, John Kirkpatrick, Collin Kent, Florian Kofler, Kevin Leu, Nazanin Maleki, Bjoern Menze, Maxence Pajot, Zachary J. Reitman, Jeffrey D. Rudie , et al. (96 additional authors not shown)

    Abstract: We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 11 tables, 10 figures, MICCAI

  35. arXiv:2405.09514  [pdf, other

    eess.SP cs.IT cs.LG

    Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck

    Authors: Hongru Li, Jiawei Shao, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: Task-oriented communication aims to extract and transmit task-relevant information to significantly reduce the communication overhead and transmission latency. However, the unpredictable distribution shifts between training and test data, including domain shift and semantic shift, can dramatically undermine the system performance. In order to tackle these challenges, it is crucial to ensure that t… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 13 pages, 8 figures, submitted to IEEE for potential publication

  36. arXiv:2405.09179  [pdf, other

    eess.SP

    Integrated Sensing and Communication Enabled Cooperative Passive Sensing Using Mobile Communication System

    Authors: Zhiqing Wei, Haotian Liu, Hujun Li, Wangjun Jiang, Zhiyong Feng, Huici Wu, ** Zhang

    Abstract: Integrated sensing and communication (ISAC) is a potential technology of the sixth-generation (6G) mobile communication system, which enables communication base station (BS) with sensing capability. However, the performance of single-BS sensing is limited, which can be overcome by multi-BS cooperative sensing. There are three types of multi-BS cooperative sensing, including cooperative active sens… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 11 figures, Submitted to IEEE Transactions on Mobile Computing

  37. Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis

    Authors: Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

    Abstract: It remains a challenge to effectively control the emotion rendering in text-to-speech (TTS) synthesis. Prior studies have primarily focused on learning a global prosodic representation at the utterance level, which strongly correlates with linguistic prosody. Our goal is to construct a hierarchical emotion distribution (ED) that effectively encapsulates intensity variations of emotions at various… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: This is accepted to IEEE ICASSP 2024

  38. arXiv:2405.07717  [pdf, other

    eess.IV

    On the Adversarial Robustness of Learning-based Image Compression Against Rate-Distortion Attacks

    Authors: Chenhao Wu, Qingbo Wu, Haoran Wei, Shuai Chen, Lei Wang, King Ngi Ngan, Fanman Meng, Hongliang Li

    Abstract: Despite demonstrating superior rate-distortion (RD) performance, learning-based image compression (LIC) algorithms have been found to be vulnerable to malicious perturbations in recent studies. Adversarial samples in these studies are designed to attack only one dimension of either bitrate or distortion, targeting a submodel with a specific compression ratio. However, adversaries in real-world sce… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  39. arXiv:2405.07297  [pdf, other

    eess.SP cs.IT

    Beyond Diagonal Reconfigurable Intelligent Surfaces in Wideband OFDM Communications: Circuit-Based Modeling and Optimization

    Authors: Hongyu Li, Matteo Nerini, Shanpu Shen, Bruno Clerckx

    Abstract: This work investigates the modeling and optimization of beyond diagonal reconfigurable intelligent surface (BD-RIS), which generalizes conventional RIS with diagonal phase shift matrices and provides additional flexibility for manipulating wireless channels, in wideband communication systems. Specifically, we start from the signal modeling of the BD-RIS-aided orthogonal frequency division multiple… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures, submitted to IEEE journal. arXiv admin note: text overlap with arXiv:2403.12893

  40. arXiv:2405.07216  [pdf, other

    eess.SY

    Magnetic-Guided Flexible Origami Robot toward Long-Term Phototherapy of H. pylori in the Stomach

    Authors: Sishen Yuan, Baijia Liang, Po Wa Wong, Ming**g Xu, Chi Hsuan Li, Zhen Li, Hongliang Ren

    Abstract: Helicobacter pylori, a pervasive bacterial infection associated with gastrointestinal disorders such as gastritis, peptic ulcer disease, and gastric cancer, impacts approximately 50% of the global population. The efficacy of standard clinical eradication therapies is diminishing due to the rise of antibiotic-resistant strains, necessitating alternative treatment strategies. Photodynamic therapy (P… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: IEEE ICRA 2024

  41. arXiv:2405.07202  [pdf, other

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Unified Video-Language Pre-training with Synchronized Audio

    Authors: Shentong Mo, Haofan Wang, Huaxia Li, Xu Tang

    Abstract: Video-language pre-training is a typical and challenging problem that aims at learning visual and textual representations from large-scale data in a self-supervised way. Existing pre-training approaches either captured the correspondence of image-text pairs or utilized temporal ordering of frames. However, they do not explicitly explore the natural synchronization between audio and the other two m… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  42. arXiv:2405.06175  [pdf, other

    eess.IV cs.CV

    Prior-guided Diffusion Model for Cell Segmentation in Quantitative Phase Imaging

    Authors: Zhuchen Shao, Mark A. Anastasio, Hua Li

    Abstract: Purpose: Quantitative phase imaging (QPI) is a label-free technique that provides high-contrast images of tissues and cells without the use of chemicals or dyes. Accurate semantic segmentation of cells in QPI is essential for various biomedical applications. While DM-based segmentation has demonstrated promising results, the requirement for multiple sampling steps reduces efficiency. This study ai… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  43. arXiv:2405.04821  [pdf, other

    cs.RO eess.SY

    ATDM:An Anthropomorphic Aerial Tendon-driven Manipulator with Low-Inertia and High-Stiffness

    Authors: Quman Xu, Zhan Li, Hai Li, Xinghu Yu, Yipeng Yang

    Abstract: Aerial Manipulator Systems (AMS) have garnered significant interest for their utility in aerial operations. Nonetheless, challenges related to the manipulator's limited stiffness and the coupling disturbance with manipulator movement persist. This paper introduces the Aerial Tendon-Driven Manipulator (ATDM), an innovative AMS that integrates a hexrotor Unmanned Aerial Vehicle (UAV) with a 4-degree… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  44. arXiv:2405.03848  [pdf, other

    cs.LG cs.CY eess.SY

    CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities

    Authors: Kingsley Nweye, Kathryn Kaspar, Giacomo Buscemi, Tiago Fonseca, Giuseppe Pinto, Dipanjan Ghose, Satvik Duddukuru, Pavani Pratapa, Han Li, Javad Mohammadi, Luis Lino Ferreira, Tianzhen Hong, Mohamed Ouf, Alfonso Capozzoli, Zoltan Nagy

    Abstract: As more distributed energy resources become part of the demand-side infrastructure, it is important to quantify the energy flexibility they provide on a community scale, particularly to understand the impact of geographic, climatic, and occupant behavioral differences on their effectiveness, as well as identify the best control strategies to accelerate their real-world adoption. CityLearn provides… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  45. arXiv:2405.01230  [pdf, other

    cs.CV eess.SP

    Evaluation of Video-Based rPPG in Challenging Environments: Artifact Mitigation and Network Resilience

    Authors: Nhi Nguyen, Le Nguyen, Honghan Li, Miguel Bordallo López, Constantino Álvarez Casado

    Abstract: Video-based remote photoplethysmography (rPPG) has emerged as a promising technology for non-contact vital sign monitoring, especially under controlled conditions. However, the accurate measurement of vital signs in real-world scenarios faces several challenges, including artifacts induced by videocodecs, low-light noise, degradation, low dynamic range, occlusions, and hardware and network constra… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 22 main article pages with 3 supplementary pages, journal

  46. arXiv:2404.19609  [pdf, other

    cs.CV eess.IV

    Seeing Through the Clouds: Cloud Gap Imputation with Prithvi Foundation Model

    Authors: Denys Godwin, Hanxi Li, Michael Cecil, Hamed Alemohammad

    Abstract: Filling cloudy pixels in multispectral satellite imagery is essential for accurate data analysis and downstream applications, especially for tasks which require time series data. To address this issue, we compare the performance of a foundational Vision Transformer (ViT) model with a baseline Conditional Generative Adversarial Network (CGAN) model for missing value imputation in time series of mul… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  47. arXiv:2404.18501  [pdf, other

    eess.AS cs.SD

    Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

    Authors: Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li

    Abstract: Audio-visual target speaker extraction (AV-TSE) aims to extract the specific person's speech from the audio mixture given auxiliary visual cues. Previous methods usually search for the target voice through speech-lip synchronization. However, this strategy mainly focuses on the existence of target speech, while ignoring the variations of the noise characteristics. That may result in extracting noi… ▽ More

    Submitted 8 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  48. arXiv:2404.17161  [pdf, other

    cs.SD eess.AS eess.SP

    An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder

    Authors: Yicheng Gu, Xueyao Zhang, Liumeng Xue, Haizhou Li, Zhizheng Wu

    Abstract: Generative Adversarial Network (GAN) based vocoders are superior in both inference speed and synthesis quality when reconstructing an audible waveform from an acoustic representation. This study focuses on improving the discriminator for GAN-based vocoders. Most existing Time-Frequency Representation (TFR)-based discriminators are rooted in Short-Time Fourier Transform (STFT), which owns a constan… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.14957

  49. arXiv:2404.16302  [pdf, other

    cs.CV cs.MM cs.RO eess.IV

    CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions

    Authors: Haoyuan Li, Qi Hu, You Yao, Kailun Yang, Peng Chen

    Abstract: Cross-modality images that integrate visible-infrared spectra cues can provide richer complementary information for object detection. Despite this, existing visible-infrared object detection methods severely degrade in severe weather conditions. This failure stems from the pronounced sensitivity of visible images to environmental perturbations, such as rain, haze, and snow, which frequently cause… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW

  50. arXiv:2404.15009  [pdf, other

    cs.CV eess.IV

    The Brain Tumor Segmentation in Pediatrics (BraTS-PEDs) Challenge: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs)

    Authors: Anahita Fathi Kazerooni, Nastaran Khalili, Deep Gandhi, Xinyang Liu, Zhifan Jiang, Syed Muhammed Anwar, Jake Albrecht, Maruf Adewole, Udunna Anazodo, Hannah Anderson, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Anurag Gottipati, Debanjan Haldar, Shuvanjan Haldar , et al. (51 additional authors not shown)

    Abstract: Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. The five-year survival rate for high-grade gliomas in children is less than 20%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. Here we pr… ▽ More

    Submitted 29 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2305.17033