Skip to main content

Showing 1–50 of 732 results for author: Li, Z

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.19328  [pdf, other

    cs.SD cs.LG eess.AS

    Subtractive Training for Music Stem Insertion using Latent Diffusion Models

    Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

    Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusi… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.18840  [pdf

    eess.IV

    Shorter SPECT Scans Using Self-supervised Coordinate Learning to Synthesize Skipped Projection Views

    Authors: Zongyu Li, Yixuan Jia, Xiaojian Xu, Jason Hu, Jeffrey A. Fessler, Yuni K. Dewaraja

    Abstract: Purpose: This study addresses the challenge of extended SPECT imaging duration under low-count conditions, as encountered in Lu-177 SPECT imaging, by develo** a self-supervised learning approach to synthesize skipped SPECT projection views, thus shortening scan times in clinical settings. Methods: We employed a self-supervised coordinate-based learning technique, adapting the neural radiance fie… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 25 pages, 5568 words

  3. arXiv:2406.18467  [pdf, other

    eess.SY math.OC

    Algebraic Connectivity Control and Maintenance in Multi-Agent Networks under Attack

    Authors: Wenjie Zhao, Diego Deplano, Zhiwu Li, Alessandro Giua, Mauro Franceschelli

    Abstract: This paper studies the problem of increasing the connectivity of an ad-hoc peer-to-peer network subject to cyber-attacks targeting the agents in the network. The adopted strategy involves the design of local interaction rules for the agents to locally modify the graph topology by adding and removing links with neighbors. Two distributed protocols are presented to boost the algebraic connectivity o… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.16981  [pdf

    eess.IV cs.AI cs.LG eess.SP

    Research on Feature Extraction Data Processing System For MRI of Brain Diseases Based on Computer Deep Learning

    Authors: Lingxi Xiao, **xin Hu, Yutian Yang, Yinqiu Feng, Zichao Li, Zexi Chen

    Abstract: Most of the existing wavelet image processing techniques are carried out in the form of single-scale reconstruction and multiple iterations. However, processing high-quality fMRI data presents problems such as mixed noise and excessive computation time. This project proposes the use of matrix operations by combining mixed noise elimination methods with wavelet analysis to replace traditional itera… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  5. arXiv:2406.15885  [pdf, other

    cs.SD cs.AI eess.AS

    The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

    Authors: Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, ** Wang, Hai Zhao

    Abstract: Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-rel… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-Findings 2024

  6. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  7. arXiv:2406.14485   

    cs.AI cs.HC cs.MM cs.SD eess.AS

    Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

    Authors: Nick Bryan-Kinns, Corey Ford, Shuoyang Zheng, Helen Kennedy, Alan Chamberlain, Makayla Lewis, Drew Hemment, Zi** Li, Qiong Wu, Lanxi Xiao, Gus Xia, Jeba Rezwana, Michael Clemens, Gabriel Vigliensoni

    Abstract: This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.

    Submitted 22 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  8. arXiv:2406.13705  [pdf, other

    eess.IV cs.AI cs.CV

    EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

    Authors: Long Bai, Qiaozhi Tan, Tong Chen, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, **lin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

  9. arXiv:2406.13674  [pdf, other

    eess.IV cs.CV

    Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases

    Authors: Xiangde Luo, Zihan Li, Shaoting Zhang, Wenjun Liao, Guotai Wang

    Abstract: Deep learning has enabled great strides in abdominal multi-organ segmentation, even surpassing junior oncologists on common cases or organs. However, robustness on corner cases and complex organs remains a challenging open problem for clinical adoption. To investigate model robustness, we collected and annotated the RAOS dataset comprising 413 CT scans ($\sim$80k 2D images, $\sim$8k 3D organ annot… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure, 6 tables, Early Accept to MICCAI 2024

  10. arXiv:2406.13526  [pdf, other

    eess.SP

    Using Geometrical information to Measure the Vibration of A Swaying Millimeter-wave Radar

    Authors: Chengyao Tang, Yongpeng Dai, Zhi Li, Tian **

    Abstract: This paper presents two new, simple yet effective approaches to measure the vibration of a swaying millimeter-wave radar (mmRadar) utilizing geometrical information. Specifically, for the planar vibrations, we firstly establish an equation based on the area difference between the swaying mmRadar and the reference objects at different moments, which enables the quantification of planar displacement… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures,submitted to the IEEE for publication

  11. arXiv:2406.13038  [pdf, other

    cs.AI eess.SP

    Traffic Prediction considering Multiple Levels of Spatial-temporal Information: A Multi-scale Graph Wavelet-based Approach

    Authors: Zilin Bian, **gqin Gao, Kaan Ozbay, Zhenning Li

    Abstract: Although traffic prediction has been receiving considerable attention with a number of successes in the context of intelligent transportation systems, the prediction of traffic states over a complex transportation network that contains different road types has remained a challenge. This study proposes a multi-scale graph wavelet temporal convolution network (MSGWTCN) to predict the traffic states… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  12. arXiv:2406.10514  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.SD

    GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

    Authors: Zehua Kcriss Li, Meiying Melissa Chen, Yi Zhong, Pinxin Liu, Zhiyao Duan

    Abstract: Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered by professional voice actors. Inspired by this, we explore expressive speech synthesis through the lens of articulatory phonetics. Specifically,… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  13. arXiv:2406.10160  [pdf, other

    cs.SD cs.AI eess.AS

    One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

    Authors: Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui **, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

    Abstract: We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrat… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  14. arXiv:2406.10152  [pdf, other

    cs.SD eess.AS

    Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

    Authors: Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui **, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu

    Abstract: This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and tightly integrated with the complete system training. Experiments conducted on LRS3-TED data simulated multichannel overlapped speech suggest that joint… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  15. arXiv:2406.10034  [pdf, other

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui **g, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  16. arXiv:2406.09846  [pdf, ps, other

    cs.IT eess.SP

    Multiple Intelligent Reflecting Surfaces Collaborative Wireless Localization System

    Authors: Ziheng Zhang, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, **gfeng Chen, Nan Cheng

    Abstract: This paper studies a multiple intelligent reflecting surfaces (IRSs) collaborative localization system where multiple semi-passive IRSs are deployed in the network to locate one or more targets based on time-of-arrival. It is assumed that each semi-passive IRS is equipped with reflective elements and sensors, which are used to establish the line-of-sight links from the base station (BS) to multipl… ▽ More

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 8 figures

  17. arXiv:2406.08835  [pdf, other

    cs.SD eess.AS

    A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

    Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, **g Xiao

    Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, ca… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  18. arXiv:2406.08300  [pdf, other

    eess.IV cs.CV

    From Chaos to Clarity: 3DGS in the Dark

    Authors: Zhihao Li, Yufei Wang, Alex Kot, Bihan Wen

    Abstract: Novel view synthesis from raw images provides superior high dynamic range (HDR) information compared to reconstructions from low dynamic range RGB images. However, the inherent noise in unprocessed raw images compromises the accuracy of 3D scene representation. Our study reveals that 3D Gaussian Splatting (3DGS) is particularly susceptible to this noise, leading to numerous elongated Gaussian shap… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  19. arXiv:2406.07857  [pdf, other

    eess.SY cs.LG cs.NI

    Toward Enhanced Reinforcement Learning-Based Resource Management via Digital Twin: Opportunities, Applications, and Challenges

    Authors: Nan Cheng, Xiucheng Wang, Zan Li, Zhisheng Yin, Tom Luan, Xuemin Shen

    Abstract: This article presents a digital twin (DT)-enhanced reinforcement learning (RL) framework aimed at optimizing performance and reliability in network resource management, since the traditional RL methods face several unified challenges when applied to physical networks, including limited exploration efficiency, slow convergence, poor long-term performance, and safety concerns during the exploration… ▽ More

    Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 7pages, 6figures

  20. arXiv:2406.07422  [pdf, other

    eess.AS

    Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

    Authors: Hanzhao Li, Liumeng Xue, Haohan Guo, Xinfa Zhu, Yuanjun Lv, Lei Xie, Yunlin Chen, Hao Yin, Zhifei Li

    Abstract: The multi-codebook speech codec enables the application of large language models (LLM) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook single-sequence codec, which employs a disentangled VQ-VAE to decouple speech into a time-invariant embedding and a phonetically-rich discrete sequence. Furthermor… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  21. arXiv:2406.05982  [pdf

    eess.IV cs.LG physics.med-ph

    Artificial Intelligence for Neuro MRI Acquisition: A Review

    Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

    Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potenti… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Submitted to MAGMA for review

  22. arXiv:2406.05576  [pdf, other

    cs.IT cs.NI eess.SP eess.SY math.OC

    Uplink resource allocation optimization for user-centric cell-free MIMO networks

    Authors: Zehua Li, Raviraj Adve

    Abstract: We examine the problem of optimizing resource allocation in the uplink for a user-centric, cell-free, multi-input multi-output network. We start by modeling and develo** resource allocation algorithms for two standard network operation modes. The centralized mode provides high data rates but suffers multiple issues, including scalability. On the other hand, the distributed mode has the opposite… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: To appear in IEEE Transactions on Wireless Communications, 16 pages, 9 figures

  23. arXiv:2406.05499  [pdf, other

    eess.SP

    A Pixel-based Reconfigurable Antenna Design for Fluid Antenna Systems

    Authors: Jichen Zhang, Junhui Rao, Zhaoyang Ming, Zan Li, Chi-Yuk Chiu, Kai-Kit Wong, Kin-Fai Tong, Ross Murch

    Abstract: Fluid Antenna Systems (FASs) have recently been proposed for enhancing the performance of wireless communication. Previous antenna designs to meet the requirements of FAS have been based on mechanically movable or liquid antennas and therefore have limited reconfiguration speeds. In this paper, we propose a design for a pixel-based reconfigurable antenna (PRA) that meets the requirements of FAS an… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: 13 pages, 16 figures, Submitted to IEEE Transations on Antennas and Propagation

  24. arXiv:2406.04951  [pdf, other

    eess.AS

    The Database and Benchmark for Source Speaker Verification Against Voice Conversion

    Authors: Ze Li, Yuke Lin, Tian Yao, Hongbin Suo, Ming Li

    Abstract: Voice conversion systems can transform audio to mimic another speaker's voice, thereby attacking speaker verification systems. However, ongoing studies on source speaker verification are hindered by limited data availability and methodological constraints. In this paper, we generate a large-scale converted speech database and train a batch of baseline systems based on the MFA-Conformer architectur… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  25. arXiv:2406.04791  [pdf, other

    cs.SD eess.AS

    Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

    Authors: Shaojun Li, Daimeng Wei, Jiaxin Guo, ZongYao Li, Zhanglin Wu, Zhiqiang Rao, Yuanchang Luo, Xianghui He, Hao Yang

    Abstract: Despite recent improvements in End-to-End Automatic Speech Recognition (E2E ASR) systems, the performance can degrade due to vocal characteristic mismatches between training and testing data, particularly with limited target speaker adaptation data. We propose a novel speaker adaptation approach Speaker-Smoothed kNN that leverages k-Nearest Neighbors (kNN) retrieval techniques to improve model out… ▽ More

    Submitted 11 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  26. arXiv:2406.03098  [pdf, ps, other

    cs.IT eess.SP

    A Data and Model-Driven Deep Learning Approach to Robust Downlink Beamforming Optimization

    Authors: Kai Liang, Gan Zheng, Zan Li, Kai-Kit Wong, Chan-Byoung Chae

    Abstract: This paper investigates the optimization of the long-standing probabilistically robust transmit beamforming problem with channel uncertainties in the multiuser multiple-input single-output (MISO) downlink transmission. This problem poses significant analytical and computational challenges. Currently, the state-of-the-art optimization method relies on convex restrictions as tractable approximations… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for publication in the IEEE Journal on Selected Areas in Communications, Special Issue on Advanced Optimization Theory and Algorithms for Next Generation Wireless Communication Networks

  27. arXiv:2406.02975  [pdf, other

    eess.SP

    A Shared-Aperture Dual-Band sub-6 GHz and mmWave Reconfigurable Intelligent Surface With Independent Operation

    Authors: Junhui Rao, Yujie Zhang, Shiwen Tang, Zan Li, Zhaoyang Ming, Jichen Zhang, Chi Yuk Chiu, Ross Murch

    Abstract: A novel dual-band reconfigurable intelligent surface (DBI-RIS) design that combines the functionalities of millimeter-wave (mmWave) and sub-6 GHz bands within a single aperture is proposed. This design aims to bridge the gap between current single-band reconfigurable intelligent surfaces (RISs) and wireless systems utilizing sub-6 GHz and mmWave bands that require RIS with independently reconfigur… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  28. arXiv:2406.02247  [pdf, other

    physics.ins-det eess.SY

    A Study of the Latest Updates of the Readout System for the Hybird-Pixel Detector at HEPS

    Authors: Hangxu Li, Jie Zhang, Wei Wei, Zhenjie Li, Xiaolu Ji, Yan Zhang, Xuanzheng Yang, Shuihan Zhang, Xueke Ma, Peng Liu, Zheng Wang, Yuanbai Chen

    Abstract: The High Energy Photon Source (HEPS) represents a fourth-generation light source. This facility has made unprecedented advancements in accelerator technology, necessitating the development of new detectors to satisfy physical requirements such as single-photon resolution, large dynamic range, and high frame rates. Since 2016, the Institute of High Energy Physics has introduced the first user-exper… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  29. arXiv:2405.20260  [pdf, other

    eess.SY

    Ancillary Services Provision by Cross-Voltage-Level Power Flow Control using Flexibility Regions

    Authors: Christian Holger Nerowski, Zongjun Li, Christian Rehtanz

    Abstract: The large-scale integration of distributed renewable energy sources into the electricity grid requires the investigation of new methods to ensure stability. For example, Active Distribution Networks (ADNs) can be used at (sub-) transmission levels for emergency operation, provided robust and efficient control is available. This paper investigates the use of Feasible Operating Regions (FORs) and Fl… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: preprint

  30. arXiv:2405.19653  [pdf, other

    cs.LG cs.CL eess.SY

    SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems

    Authors: Patrick Emami, Zhaonan Li, Saumya Sinha, Truc Nguyen

    Abstract: Data-driven simulation surrogates help computational scientists study complex systems. They can also help inform impactful policy decisions. We introduce a learning framework for surrogate modeling where language is used to interface with the underlying system being simulated. We call a language description of a system a "system caption", or SysCap. To address the lack of datasets of paired natura… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 17 pages. Under review

  31. arXiv:2405.18690  [pdf, other

    eess.SY

    Differentially-Private Distributed Model Predictive Control of Linear Discrete-Time Systems with Global Constraints

    Authors: Kaixiang Zhang, Yongqiang Wang, Ziyou Song, Zhaojian Li

    Abstract: Distributed model predictive control (DMPC) has attracted extensive attention as it can explicitly handle system constraints and achieve optimal control in a decentralized manner. However, the deployment of DMPC strategies generally requires the sharing of sensitive data among subsystems, which may violate the privacy of participating systems. In this paper, we propose a differentially-private DMP… ▽ More

    Submitted 4 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 11 pages, 3 figures

  32. arXiv:2405.16850  [pdf, other

    eess.IV cs.CV cs.LG

    UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation

    Authors: Runzhao Yang, Yinda Chen, Zhihong Zhang, Xiaoyu Liu, Zongren Li, Kunlun He, Zhiwei Xiong, **li Suo, Qionghai Dai

    Abstract: In the field of medical image compression, Implicit Neural Representation (INR) networks have shown remarkable versatility due to their flexible compression ratios, yet they are constrained by a one-to-one fitting approach that results in lengthy encoding times. Our novel method, ``\textbf{UniCompress}'', innovatively extends the compression capabilities of INR by being the first to compress multi… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  33. arXiv:2405.16090  [pdf, other

    cs.HC eess.SP

    EEG-DBNet: A Dual-Branch Network for Temporal-Spectral Decoding in Motor-Imagery Brain-Computer Interfaces

    Authors: Xicheng Lou, Xinwei Li, Hongying Meng, Jun Hu, Meili Xu, Yue Zhao, Jiazhang Yang, Zhangyong Li

    Abstract: Motor imagery electroencephalogram (EEG)-based brain-computer interfaces (BCIs) offer significant advantages for individuals with restricted limb mobility. However, challenges such as low signal-to-noise ratio and limited spatial resolution impede accurate feature extraction from EEG signals, thereby affecting the classification accuracy of different actions. To address these challenges, this stud… ▽ More

    Submitted 19 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  34. arXiv:2405.12584  [pdf, other

    eess.IV cs.CV cs.LG

    Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?

    Authors: Ziqin Lin, Heng Li, Zinan Li, Huazhu Fu, Jiang Liu

    Abstract: Recent advancements in pre-trained large foundation models (LFM) have yielded significant breakthroughs across various domains, including natural language processing and computer vision. These models have been particularly impactful in the domain of medical diagnostic tasks. With abundant unlabeled data, an LFM has been developed for fundus images using the Vision Transformer (VIT) and a self-supe… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures

  35. arXiv:2405.09582  [pdf

    cs.CV eess.IV

    AD-Aligning: Emulating Human-like Generalization for Cognitive Domain Adaptation in Deep Learning

    Authors: Zhuoying Li, Bohua Wan, Cong Mu, Ruzhang Zhao, Shushan Qiu, Chao Yan

    Abstract: Domain adaptation is pivotal for enabling deep learning models to generalize across diverse domains, a task complicated by variations in presentation and cognitive nuances. In this paper, we introduce AD-Aligning, a novel approach that combines adversarial training with source-target domain alignment to enhance generalization capabilities. By pretraining with Coral loss and standard loss, AD-Align… ▽ More

    Submitted 21 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by 2024 5th International Conference on Electronic Communication and Artificial Intelligence

  36. arXiv:2405.09053  [pdf, ps, other

    eess.SP

    Deep Learning-Based CSI Feedback for XL-MIMO Systems in the Near-Field Domain

    Authors: Zhangjie Peng, Rui**g Liu, Zhaotian Li, Cunhua Pan, Jiangzhou Wang

    Abstract: In this paper, we consider an extremely large-scale massive multiple-input-multiple-output (XL-MIMO) system. As the scale of antenna arrays increases, the range of near-field communications also expands. In this case, the signals no longer exhibit planar wave characteristics but spherical wave characteristics in the near-field channel, which makes the channel state information (CSI) highly complex… ▽ More

    Submitted 22 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  37. arXiv:2405.08838  [pdf, other

    cs.SD cs.AI eess.AS

    PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset

    Authors: Yang Hou, Haitao Fu, Chuankai Chen, Zida Li, Haoyu Zhang, Jianjun Zhao

    Abstract: With the rapid advancement of generative AI, multimodal deepfakes, which manipulate both audio and visual modalities, have drawn increasing public concern. Currently, deepfake detection has emerged as a crucial strategy in countering these growing threats. However, as a key factor in training and validating deepfake detectors, most existing deepfake datasets primarily focus on the visual modal, an… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 13 page, 4 figures

    MSC Class: 68T45 ACM Class: I.4.9

  38. arXiv:2405.07218  [pdf, other

    physics.med-ph eess.SY

    Chained Flexible Capsule Endoscope: Unraveling the Conundrum of Size Limitations and Functional Integration for Gastrointestinal Transitivity

    Authors: Sishen Yuan, Guang Li, Baijia Liang, Lailu Li, Qingzhuo Zheng, Shuang Song, Zhen Li, Hongliang Ren

    Abstract: Capsule endoscopes, predominantly serving diagnostic functions, provide lucid internal imagery but are devoid of surgical or therapeutic capabilities. Consequently, despite lesion detection, physicians frequently resort to traditional endoscopic or open surgical procedures for treatment, resulting in more complex, potentially risky interventions. To surmount these limitations, this study introduce… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  39. arXiv:2405.07216  [pdf, other

    eess.SY

    Magnetic-Guided Flexible Origami Robot toward Long-Term Phototherapy of H. pylori in the Stomach

    Authors: Sishen Yuan, Baijia Liang, Po Wa Wong, Ming**g Xu, Chi Hsuan Li, Zhen Li, Hongliang Ren

    Abstract: Helicobacter pylori, a pervasive bacterial infection associated with gastrointestinal disorders such as gastritis, peptic ulcer disease, and gastric cancer, impacts approximately 50% of the global population. The efficacy of standard clinical eradication therapies is diminishing due to the rise of antibiotic-resistant strains, necessitating alternative treatment strategies. Photodynamic therapy (P… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: IEEE ICRA 2024

  40. arXiv:2405.05518  [pdf, other

    cs.CV cs.RO eess.IV

    DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction

    Authors: Siyu Li, Jiacheng Lin, Hao Shi, Jiaming Zhang, Song Wang, You Yao, Zhiyong Li, Kailun Yang

    Abstract: Temporal information plays a pivotal role in Bird's-Eye-View (BEV) driving scene understanding, which can alleviate the visual information sparsity. However, the indiscriminate temporal fusion method will cause the barrier of feature redundancy when constructing vectorized High-Definition (HD) maps. In this paper, we revisit the temporal fusion of vectorized HD maps, focusing on temporal instance… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: The source code will be made publicly available at https://github.com/lynn-yu/DTCLMapper

  41. arXiv:2405.05133  [pdf, other

    cs.CV eess.IV

    Identifying every building's function in large-scale urban areas with multi-modality remote-sensing data

    Authors: Zhuohong Li, Wei He, Jiepan Li, Hongyan Zhang

    Abstract: Buildings, as fundamental man-made structures in urban environments, serve as crucial indicators for understanding various city function zones. Rapid urbanization has raised an urgent need for efficiently surveying building footprints and functions. In this study, we proposed a semi-supervised framework to identify every building's function in large-scale urban areas with multi-modality remote-sen… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 5 pages, 7 figures, accepted by IGARSS 2024

  42. arXiv:2405.04821  [pdf, other

    cs.RO eess.SY

    ATDM:An Anthropomorphic Aerial Tendon-driven Manipulator with Low-Inertia and High-Stiffness

    Authors: Quman Xu, Zhan Li, Hai Li, Xinghu Yu, Yipeng Yang

    Abstract: Aerial Manipulator Systems (AMS) have garnered significant interest for their utility in aerial operations. Nonetheless, challenges related to the manipulator's limited stiffness and the coupling disturbance with manipulator movement persist. This paper introduces the Aerial Tendon-Driven Manipulator (ATDM), an innovative AMS that integrates a hexrotor Unmanned Aerial Vehicle (UAV) with a 4-degree… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  43. arXiv:2405.03935  [pdf, other

    eess.SY

    Roadside Units Assisted Localized Automated Vehicle Maneuvering: An Offline Reinforcement Learning Approach

    Authors: Kui Wang, Changyang She, Zongdian Li, Tao Yu, Yonghui Li, Kei Sakaguchi

    Abstract: Traffic intersections present significant challenges for the safe and efficient maneuvering of connected and automated vehicles (CAVs). This research proposes an innovative roadside unit (RSU)-assisted cooperative maneuvering system aimed at enhancing road safety and traveling efficiency at intersections for CAVs. We utilize RSUs for real-time traffic data acquisition and train an offline reinforc… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures

  44. arXiv:2404.19265  [pdf, other

    cs.CV eess.IV

    Map** New Realities: Ground Truth Image Creation with Pix2Pix Image-to-Image Translation

    Authors: Zhenglin Li, Bo Guan, Yuanzhou Wei, Yiming Zhou, **gyu Zhang, **xin Xu

    Abstract: Generative Adversarial Networks (GANs) have significantly advanced image processing, with Pix2Pix being a notable framework for image-to-image translation. This paper explores a novel application of Pix2Pix to transform abstract map images into realistic ground truth images, addressing the scarcity of such images crucial for domains like urban planning and autonomous vehicle training. We detail th… ▽ More

    Submitted 30 April, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  45. arXiv:2404.18820  [pdf, other

    eess.IV cs.CV

    Towards Extreme Image Compression with Latent Feature Guidance and Diffusion Prior

    Authors: Zhiyuan Li, Yanhui Zhou, Hao Wei, Chenyang Ge, **gwen Jiang

    Abstract: Image compression at extremely low bitrates (below 0.1 bits per pixel (bpp)) is a significant challenge due to substantial information loss. In this work, we propose a novel two-stage extreme image compression framework that exploits the powerful generative capability of pre-trained diffusion models to achieve realistic image reconstruction at extremely low bitrates. In the first stage, we treat t… ▽ More

    Submitted 13 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE TCSVT

  46. arXiv:2404.17484  [pdf, other

    cs.CV eess.IV

    Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model

    Authors: Zhenghong Li, Jiaxiang Ren, Wensheng Cheng, Congwu Du, Yingtian Pan, Haibin Ling

    Abstract: Optical Doppler Tomography (ODT) is a blood flow imaging technique popularly used in bioengineering applications. The fundamental unit of ODT is the 1D frequency response along the A-line (depth), named raw A-scan. A 2D ODT image (B-scan) is obtained by first sensing raw A-scans along the B-line (width), and then constructing the B-scan from these raw A-scans via magnitude-phase analysis and post-… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 19 pages, 5 figures

  47. arXiv:2404.16727  [pdf, other

    eess.SY

    Learning-Based Efficient Approximation of Data-enabled Predictive Control

    Authors: Yihan Zhou, Yiwen Lu, Zishuo Li, Jiaqi Yan, Yilin Mo

    Abstract: Data-Enabled Predictive Control (DeePC) bypasses the need for system identification by directly leveraging raw data to formulate optimal control policies. However, the size of the optimization problem in DeePC grows linearly with respect to the data size, which prohibits its application due to high computational costs. In this paper, we propose an efficient approximation of DeePC, whose size is in… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  48. arXiv:2404.16484  [pdf, other

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, **shan Pan, Jiangxin Dong, **hui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  49. arXiv:2404.15348  [pdf

    eess.SP physics.optics

    High-Linearity PAM-4 Silicon Micro-ring Transmitter Architecture with Electronic-Photonic Hybrid DAC

    Authors: Zheng Li, Chengyang Lv, Min Tan

    Abstract: This paper presents a high linearity PAM-4 transmitter (TX) architecture, consisting of a three-segment micro-ring modulator (MRM) and a matched CMOS driver. This architecture can drive a high-linearity 4-level pulse amplitude (PAM-4) modulation signal, thereby extending the tunable operating wavelength range for achieving linear PAM-4 output. We use the three-segment MRM to increase design flexib… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 14 pages, 11 figures

  50. arXiv:2404.13929  [pdf, other

    eess.IV cs.CV

    Exploring Kinetic Curves Features for the Classification of Benign and Malignant Breast Lesions in DCE-MRI

    Authors: Zixian Li, Yuming Zhong, Yi Wang

    Abstract: Breast cancer is the most common malignant tumor among women and the second cause of cancer-related death. Early diagnosis in clinical practice is crucial for timely treatment and prognosis. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has revealed great usability in the preoperative diagnosis and assessing therapy effects thanks to its capability to reflect the morphology and dy… ▽ More

    Submitted 10 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 6 pages, 8 figures, conference