Showing 1–50 of 349 results for author: Li, L

Searching in archive eess.
  1. arXiv:2406.14118  [pdf, other]

    eess.IV cs.CV

    Prediction and Reference Quality Adaptation for Learned Video Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li

    Abstract: Temporal prediction is one of the most important technologies for video compression. Various prediction coding modes are designed in traditional video codecs. Traditional video codecs adaptively decide the optimal coding mode according to the prediction quality and reference quality. Recently, learned video codecs have made great progress. However, they ignore the prediction and reference…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.14067  [pdf]

    physics.optics eess.SP

    A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

    Authors: Taixia Shi, Dingding Liang, Lu Wang, Lin Li, Shaogang Guo, Jiawei Gao, Xiaowei Li, Chulun Lin, Lei Shi, Baogang Ding, Shiyang Liu, Fangyi Yang, Chi Jiang, Yang Chen

    Abstract: In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz.…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 12 figures, 1 table

  3. arXiv:2406.13982  [pdf, other]

    cs.SD eess.AS

    Improved Remixing Process for Domain Adaptation-Based Speech Enhancement by Mitigating Data Imbalance in Signal-to-Noise Ratio

    Authors: Li Li, Shogo Seki

    Abstract: RemixIT and Remixed2Remixed are domain adaptation-based speech enhancement (DASE) methods that use a teacher model trained in full supervision to generate pseudo-paired data by remixing the outputs of the teacher model. The student model for enhancing real-world recorded signals is trained using the pseudo-paired data without ground truth. Since the noisy signals are recorded in natural environmen…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech2024

  4. arXiv:2406.12650  [pdf, other]

    eess.IV

    Weakly Supervised Learning of Cortical Surface Reconstruction from Segmentations

    Authors: Qiang Ma, Liu Li, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert

    Abstract: Existing learning-based cortical surface reconstruction approaches heavily rely on the supervision of pseudo ground truth (pGT) cortical surfaces for training. Such pGT surfaces are generated by traditional neuroimage processing pipelines, which are time consuming and difficult to generalize well to low-resolution brain MRI, e.g., from fetuses and neonates. In this work, we present CoSeg, a learni…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024)

  5. arXiv:2406.10956  [pdf, other]

    cs.SD cs.LG eess.AS

    Robust Channel Learning for Large-Scale Radio Speaker Verification

    Authors: Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li, Xugang Lu

    Abstract: Recent research in speaker verification has increasingly focused on achieving robust and reliable recognition under challenging channel conditions and noisy environments. Identifying speakers in radio communications is particularly difficult due to inherent limitations such as constrained bandwidth and pervasive noise interference. To address this issue, we present a Channel Robust Speaker Learnin…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages, 11 figures

  6. arXiv:2406.08203  [pdf, other]

    eess.AS cs.SD

    LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation

    Authors: Wenhao Guan, Kaidi Wang, Wangjin Zhou, Yang Wang, Feng Deng, Hui Wang, Lin Li, Qingyang Hong, Yong Qin

    Abstract: Recently, the application of diffusion models has facilitated the significant development of speech and audio generation. Nevertheless, the quality of samples generated by diffusion models still needs improvement. And the effectiveness of the method is accompanied by the extensive number of sampling steps, leading to an extended synthesis time necessary for generating high-quality audio. Previous…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech2024

  7. arXiv:2406.07854  [pdf, other]

    cs.SD cs.MM eess.AS

    Zero-Shot Fake Video Detection by Audio-Visual Consistency

    Authors: Xiaolou Li, Zehua Liu, Chen Chen, Lantian Li, Li Guo, Dong Wang

    Abstract: Recent studies have advocated the detection of fake videos as a one-class detection task, predicated on the hypothesis that the consistency between audio and visual modalities of genuine data is more significant than that of fake data. This methodology, which solely relies on genuine audio-visual data while negating the need for forged counterparts, is thus delineated as a 'zero-shot' detection pa…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  8. arXiv:2406.07832  [pdf, other]

    cs.SD eess.AS

    SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition

    Authors: Tianhao Wang, Lantian Li, Dong Wang

    Abstract: Deploying a well-optimized pre-trained speaker recognition model in a new domain often leads to a significant decline in performance. While fine-tuning is a commonly employed solution, it demands ample adaptation data and suffers from parameter inefficiency, rendering it impractical for real-world applications with limited data available for model adaptation. Drawing inspiration from the success o…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  9. arXiv:2406.07421  [pdf, other]

    cs.SD eess.AS

    A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

    Authors: Zhenyu Zhou, Shibiao Xu, Shi Yin, Lantian Li, Dong Wang

    Abstract: Data augmentation (DA) has played a pivotal role in the success of deep speaker recognition. Current DA techniques primarily focus on speaker-preserving augmentation, which does not change the speaker trait of the speech and does not create new speakers. Recent research has shed light on the potential of speaker augmentation, which generates new speakers to enrich the training dataset. In this stu…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  10. arXiv:2405.16765  [pdf, ps, other]

    cs.LG eess.SP

    Study of Robust Direction Finding Based on Joint Sparse Representation

    Authors: Y. Li, W. Xiao, L. Zhao, Z. Huang, Q. Li, L. Li, R. C. de Lamare

    Abstract: Standard Direction of Arrival (DOA) estimation methods are typically derived based on the Gaussian noise assumption, making them highly sensitive to outliers. Therefore, in the presence of impulsive noise, the performance of these methods may significantly deteriorate. In this paper, we model impulsive noise as Gaussian noise mixed with sparse outliers. By exploiting their statistical differences,…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures

  11. arXiv:2405.16047  [pdf, ps, other]

    cs.IT eess.SP

    Unified Timing Analysis for Closed-Loop Goal-Oriented Wireless Communication

    Authors: Lintao Li, Anders E. Kalør, Petar Popovski, Wei Chen

    Abstract: Goal-oriented communication has become one of the focal concepts in sixth-generation communication systems owing to its potential to provide intelligent, immersive, and real-time mobile services. The emerging paradigms of goal-oriented communication constitute closed loops integrating communication, computation, and sensing. However, challenges arise for closed-loop timing analysis due to multiple…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: submitted to IEEE Trans. Wireless Commun

  12. arXiv:2405.08783  [pdf, other]

    eess.IV

    The Developing Human Connectome Project: A Fast Deep Learning-based Pipeline for Neonatal Cortical Surface Reconstruction

    Authors: Qiang Ma, Kaili Liang, Liu Li, Saga Masui, Yourong Guo, Chiara Nosarti, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert

    Abstract: The Developing Human Connectome Project (dHCP) aims to explore developmental patterns of the human brain during the perinatal period. An automated processing pipeline has been developed to extract high-quality cortical surfaces from structural brain magnetic resonance (MR) images for the dHCP neonatal dataset. However, the current implementation of the pipeline requires more than 6.5 hours to proc…

    Submitted 14 May, 2024; originally announced May 2024.

  13. arXiv:2405.07218  [pdf, other]

    physics.med-ph eess.SY

    Chained Flexible Capsule Endoscope: Unraveling the Conundrum of Size Limitations and Functional Integration for Gastrointestinal Transitivity

    Authors: Sishen Yuan, Guang Li, Baijia Liang, Lailu Li, Qingzhuo Zheng, Shuang Song, Zhen Li, Hongliang Ren

    Abstract: Capsule endoscopes, predominantly serving diagnostic functions, provide lucid internal imagery but are devoid of surgical or therapeutic capabilities. Consequently, despite lesion detection, physicians frequently resort to traditional endoscopic or open surgical procedures for treatment, resulting in more complex, potentially risky interventions. To surmount these limitations, this study introduce…

    Submitted 12 May, 2024; originally announced May 2024.

  14. arXiv:2405.04167  [pdf, other]

    cs.CV eess.IV

    Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment

    Authors: Aobo Li, Jinjian Wu, Yongxu Liu, Leida Li

    Abstract: The annotation of blind image quality assessment (BIQA) is labor-intensive and time-consuming, especially for authentic images. Training on synthetic data is expected to be beneficial, but synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that introducing more distortion types in the synthetic dataset may…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  15. arXiv:2405.02132  [pdf, other]

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu…

    Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  16. arXiv:2405.00885  [pdf, other]

    cs.LG cs.NI eess.IV

    WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

    Authors: Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Xin Fu, Miao Pan

    Abstract: As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing and communication heterogeneity. Some pioneering research efforts proposed to extract subnetworks from the global model, and assign as large a subnetwork as possible to the device for local training b…

    Submitted 1 May, 2024; originally announced May 2024.

  17. arXiv:2404.16289  [pdf, other]

    cs.IT eess.SP

    Deep Joint CSI Feedback and Multiuser Precoding for MIMO OFDM Systems

    Authors: Yiran Guo, Wei Chen, Jialong Xu, Lun Li, Bo Ai

    Abstract: The design of precoding plays a crucial role in achieving a high downlink sum-rate in multiuser multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems. In this correspondence, we propose a deep learning based joint CSI feedback and multiuser precoding method in frequency division duplex systems, aiming at maximizing the downlink sum-rate performance in an e…

    Submitted 24 April, 2024; originally announced April 2024.

  18. arXiv:2404.16223  [pdf, other]

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  19. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., the Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  20. arXiv:2404.08010  [pdf, other]

    cs.LG eess.IV

    Differentiable Search for Finding Optimal Quantization Strategy

    Authors: Lianqiang Li, Chenqian Yan, Yefei Chen

    Abstract: To accelerate and compress deep neural networks (DNNs), many network quantization algorithms have been proposed. Although the quantization strategy of any state-of-the-art algorithm may outperform others on some network architectures, it is hard to prove that the strategy is always better than others, or even to judge whether it is always the best choice for all layers in a networ…

    Submitted 15 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  21. arXiv:2404.06682  [pdf, other]

    cs.SD eess.AS

    Learning Multidimensional Disentangled Representations of Instrumental Sounds for Musical Similarity Assessment

    Authors: Yuka Hashizume, Li Li, Atsushi Miyashita, Tomoki Toda

    Abstract: To achieve a flexible recommendation and retrieval system, it is desirable to calculate music similarity by focusing on multiple partial elements of musical pieces and allowing the users to select the element they want to focus on. A previous study proposed using multiple individual networks for calculating music similarity based on each instrumental sound, but it is impractical to use each signal…

    Submitted 9 April, 2024; originally announced April 2024.

  22. DiffDet4SAR: Diffusion-based Aircraft Target Detection Network for SAR Images

    Authors: Zhou Jie, Xiao Chao, Peng Bo, Liu Zhen, Liu Li, Liu Yongxiang, Li Xiang

    Abstract: Aircraft target detection in SAR images is a challenging task due to the discrete scattering points and severe background clutter interference. Currently, methods with convolution-based or transformer-based paradigms cannot adequately address these issues. In this letter, we explore diffusion models for SAR image aircraft target detection for the first time and propose a novel \underline{Diff}usio…

    Submitted 17 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: accepted by IEEE GRSL

    Journal ref: IEEE Geoscience and Remote Sensing Letters, vol. 21, pp. 1-5, 2024, Art no. 4007905

  23. arXiv:2403.11694  [pdf, other]

    eess.IV cs.CV

    Object Segmentation-Assisted Inter Prediction for Versatile Video Coding

    Authors: Zhuoyuan Li, Zikun Yuan, Li Li, Dong Liu, Xiaohu Tang, Feng Wu

    Abstract: In modern video coding standards, block-based inter prediction is widely adopted, which brings high compression efficiency. However, in natural videos, there are usually multiple moving objects of arbitrary shapes, resulting in complex motion fields that are difficult to compactly represent. This problem has been tackled by more flexible block partitioning methods in the Versatile Video Coding (VV…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 22 pages, 15 figures

  24. arXiv:2403.10786  [pdf, other]

    eess.IV cs.CV cs.LG

    ContourDiff: Unpaired Image Translation with Contour-Guided Diffusion Models

    Authors: Yuwen Chen, Nicholas Konz, Hanxue Gu, Haoyu Dong, Yaqian Chen, Lin Li, Jisoo Lee, Maciej A. Mazurowski

    Abstract: Accurately translating medical images across different modalities (e.g., CT to MRI) has numerous downstream clinical and machine learning applications. While several methods have been proposed to achieve this, they often prioritize perceptual quality with respect to output domain features over preserving anatomical fidelity. However, maintaining anatomy during translation is essential for many tas…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Code will be released on GitHub

  25. arXiv:2403.10581  [pdf, other]

    q-bio.QM cs.AI cs.CL cs.LG eess.SP

    Large Language Model-informed ECG Dual Attention Network for Heart Failure Risk Prediction

    Authors: Chen Chen, Lei Li, Marcel Beetz, Abhirup Banerjee, Ramneek Gupta, Vicente Grau

    Abstract: Heart failure (HF) poses a significant public health challenge, with a rising global mortality rate. Early detection and prevention of HF could significantly reduce its impact. We introduce a novel methodology for predicting HF risk using 12-lead electrocardiograms (ECGs). We present a novel, lightweight dual-attention ECG network designed to capture complex ECG features essential for early HF ris…

    Submitted 22 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Under journal revision

  26. arXiv:2403.05937  [pdf, other]

    cs.CV eess.IV

    Wavelet-Like Transform-Based Technology in Response to the Call for Proposals on Neural Network-Based Image Coding

    Authors: Cunhui Dong, Haichuan Ma, Haotian Zhang, Changsheng Gao, Li Li, Dong Liu

    Abstract: Neural network-based image coding has been developing rapidly since its birth. As of 2022, its performance has surpassed that of the best-performing traditional image coding framework -- H.266/VVC. Witnessing such success, the IEEE 1857.11 working subgroup initializes a neural network-based image coding standard project and issues a corresponding call for proposals (CfP). In response to the CfP, t…

    Submitted 9 March, 2024; originally announced March 2024.

  27. arXiv:2403.04626  [pdf, other]

    eess.IV cs.CL cs.CV cs.LG

    MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder

    Authors: Lei Li, Tianfang Zhang, Xinglin Zhang, Jiaqi Liu, Bingqi Ma, Yan Luo, Tao Chen

    Abstract: Within the domain of medical analysis, extensive research has explored the potential of mutual learning between Masked Autoencoders (MAEs) and multimodal data. However, the impact of MAEs on intermodality remains a key challenge. We introduce MedFLIP, a Fast Language-Image Pre-training method for Medical analysis. We explore MAEs for zero-shot learning with crossed domains, which enhances the model…

    Submitted 30 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  28. arXiv:2402.17550  [pdf, other]

    cs.NI cs.AI eess.SP

    Emergency Caching: Coded Caching-based Reliable Map Transmission in Emergency Networks

    Authors: Zeyu Tian, Lianming Xu, Liang Li, Li Wang, Aiguo Fei

    Abstract: Many rescue missions demand effective perception and real-time decision making, which highly rely on effective data collection and processing. In this study, we propose a three-layer architecture of emergency caching networks focusing on data collection and reliable transmission, by leveraging efficient perception and edge caching technologies. Based on this architecture, we propose a disaster map…

    Submitted 27 February, 2024; originally announced February 2024.

  29. arXiv:2402.14401  [pdf, other]

    cs.CV cs.LG eess.IV

    Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment

    Authors: Zhaoyang Wang, Bo Hu, Mingyang Zhang, Jie Li, Leida Li, Maoguo Gong, Xinbo Gao

    Abstract: Existing free-energy guided No-Reference Image Quality Assessment (NR-IQA) methods still struggle to find a balance between learning feature information at the pixel level of the image and capturing high-level feature information, and the efficient utilization of the obtained high-level feature information remains a challenge. As a novel class of state-of-the-art (SOTA) generative models, the dif…

    Submitted 22 February, 2024; originally announced February 2024.

  30. arXiv:2402.09463  [pdf]

    eess.IV

    Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results

    Authors: Kelly Payette, Céline Steger, Roxane Licandro, Priscille de Dumast, Hongwei Bran Li, Matthew Barkovich, Liu Li, Maik Dannecker, Chen Chen, Cheng Ouyang, Niccolò McConnell, Alina Miron, Yongmin Li, Alena Uus, Irina Grigorescu, Paula Ramirez Gilliland, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Haoyu Wang, Ziyan Huang, Jin Ye, Mireia Alenyà, Valentin Comte, Oscar Camara, et al. (42 additional authors not shown)

    Abstract: Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, and the generalizability of algorithms across dif…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Results from FeTA Challenge 2022, held at MICCAI; Manuscript submitted. Supplementary Info (including submission methods descriptions) available here: https://zenodo.org/records/10628648

  31. arXiv:2402.05079  [pdf, other]

    eess.IV cs.CV

    Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

    Authors: Ziyang Wang, Jian-Qing Zheng, Yichi Zhang, Ge Cui, Lei Li

    Abstract: In recent advancements in medical image analysis, Convolutional Neural Networks (CNN) and Vision Transformers (ViT) have set significant benchmarks. While the former excels in capturing local features through its convolution operations, the latter achieves remarkable global context understanding by leveraging self-attention mechanisms. However, both architectures exhibit limitations in efficiently…

    Submitted 30 March, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  32. arXiv:2402.04157  [pdf, other]

    eess.SY math.DS math.OC

    Controller synthesis for input-state data with measurement errors

    Authors: Andrea Bisoffi, Lidong Li, Claudio De Persis, Nima Monshizadeh

    Abstract: We consider the problem of designing a state-feedback controller for a linear system, based only on noisy input-state data. We focus on input-state data corrupted by measurement errors, which, albeit less investigated, are as relevant as process disturbances in applications. For energy and instantaneous bounds on these measurement errors, we derive linear matrix inequalities for controller design…

    Submitted 14 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  33. arXiv:2402.02730  [pdf, ps, other]

    cs.SD eess.AS

    How phonemes contribute to deep speaker models?

    Authors: Pengqi Li, Tianhao Wang, Lantian Li, Askar Hamdulla, Dong Wang

    Abstract: Which phonemes convey more speaker traits is a long-standing question, and various perception experiments were conducted with human subjects. For speaker recognition, studies were conducted with the conventional statistical models and the drawn conclusions are more or less consistent with the perception results. However, which phonemes are more important with modern deep neural models is still une…

    Submitted 5 February, 2024; originally announced February 2024.

  34. arXiv:2402.02699  [pdf, other]

    cs.SD cs.LG eess.AS

    Adversarial Data Augmentation for Robust Speaker Verification

    Authors: Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

    Abstract: Data augmentation (DA) has gained widespread popularity in deep speaker models due to its ease of implementation and significant effectiveness. It enriches training data by simulating real-life acoustic variations, enabling deep neural networks to learn speaker-related representations while disregarding irrelevant acoustic variations, thereby improving robustness and generalization. However, a pot…

    Submitted 4 February, 2024; originally announced February 2024.

  35. arXiv:2402.02588  [pdf, other]

    eess.SY cs.LG

    Controller Synthesis from Noisy-Input Noisy-Output Data

    Authors: Lidong Li, Andrea Bisoffi, Claudio De Persis, Nima Monshizadeh

    Abstract: We consider the problem of synthesizing a dynamic output-feedback controller for a linear system, using solely input-output data corrupted by measurement noise. To handle input-output data, an auxiliary representation of the original system is introduced. By exploiting the structure of the auxiliary system, we design a controller that robustly stabilizes all possible systems consistent with data.…

    Submitted 4 February, 2024; originally announced February 2024.

  36. arXiv:2402.02349  [pdf]

    eess.IV cs.CV

    Vision Transformer-based Multimodal Feature Fusion Network for Lymphoma Segmentation on PET/CT Images

    Authors: Huan Huang, Liheng Qiu, Shenmiao Yang, Longxi Li, Jiaofen Nan, Yanting Li, Chuang Han, Fubao Zhu, Chen Zhao, Weihua Zhou

    Abstract: Background: Diffuse large B-cell lymphoma (DLBCL) segmentation is a challenge in medical image analysis. Traditional segmentation methods for lymphoma struggle with the complex patterns and the presence of DLBCL lesions. Objective: We aim to develop an accurate method for lymphoma segmentation with 18F-Fluorodeoxyglucose positron emission tomography (PET) and computed tomography (CT) images. Metho…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 14 pages, 6 figures; reference added

  37. arXiv:2402.01104  [pdf, other]

    eess.SY

    Simulation Framework for Vehicle and Electric Scooter Interaction

    Authors: Zhitong He, Lingxi Li

    Abstract: The number of shared micro-mobility services such as electric scooters (e-scooters) has an increasing trend due to the advantages of high efficiency and low cost in short-range travel in urban areas. However, due to the unique characteristics of moving behavior, it is commonly seen that e-scooters may share the road with other motor vehicles. The lack of protection may lead to severe injury for e-…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: The paper has been accepted by 26th IEEE International Conference on Intelligent Transportation Systems ITSC 2023

  38. arXiv:2401.15864  [pdf, other]

    cs.CV eess.IV

    Spatial Decomposition and Temporal Fusion based Inter Prediction for Learned Video Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li

    Abstract: Video compression performance is closely related to the accuracy of inter prediction. It tends to be difficult to obtain accurate inter prediction for the local video regions with inconsistent motion and occlusion. Traditional video coding standards propose various technologies to handle motion inconsistency and occlusion, such as recursive partitions, geometric partitions, and long-term reference…

    Submitted 28 January, 2024; originally announced January 2024.

  39. arXiv:2401.12974  [pdf, other]

    eess.IV cs.CV q-bio.QM

    SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI

    Authors: Hanxue Gu, Roy Colglazier, Haoyu Dong, Jikai Zhang, Yaqian Chen, Zafer Yildiz, Yuwen Chen, Lin Li, Jichen Yang, Jay Willhite, Alex M. Meyer, Brian Guo, Yashvi Atul Shah, Emily Luo, Shipra Rajput, Sally Kuehn, Clark Bulleit, Kevin A. Wu, Jisoo Lee, Brandon Ramirez, Darui Lu, Jay M. Levin, Maciej A. Mazurowski

    Abstract: Magnetic Resonance Imaging (MRI) is pivotal in radiology, offering non-invasive and high-quality insights into the human body. Precise segmentation of MRIs into different organs and tissues would be highly beneficial since it would allow for a higher level of understanding of the image content and enable important measurements, which are essential for accurate diagnosis and effective treatment pla…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 15 pages, 15 figures

  40. arXiv:2401.06780  [pdf, other]

    eess.IV cs.AI cs.CV

    HA-HI: Synergising fMRI and DTI through Hierarchical Alignments and Hierarchical Interactions for Mild Cognitive Impairment Diagnosis

    Authors: Xiongri Shen, Zhenxi Song, Linling Li, Min Zhang, Lingyan Liang, Honghai Liu, Demao Deng, Zhiguo Zhang

    Abstract: Early diagnosis of mild cognitive impairment (MCI) and subjective cognitive decline (SCD) utilizing multi-modal magnetic resonance imaging (MRI) is a pivotal area of research. While various regional and connectivity features from functional MRI (fMRI) and diffusion tensor imaging (DTI) have been employed to develop diagnosis models, most studies integrate these features without adequately addressi…

    Submitted 2 January, 2024; originally announced January 2024.

  41. arXiv:2401.06435  [pdf, other]

    cs.IT eess.SP

    Swin Transformer-Based CSI Feedback for Massive MIMO

    Authors: Jiaming Cheng, Wei Chen, Jialong Xu, Yiran Guo, Lun Li, Bo Ai

    Abstract: For massive multiple-input multiple-output systems in the frequency division duplex (FDD) mode, accurate downlink channel state information (CSI) is required at the base station (BS). However, the increasing number of transmit antennas aggravates the feedback overhead of CSI. Recently, deep learning (DL) has shown considerable potential to reduce CSI feedback overhead. In this paper, we propose a…

    Submitted 12 January, 2024; originally announced January 2024.

  42. arXiv:2401.03829  [pdf, ps, other]

    physics.optics eess.IV

    Sub-Rayleigh ghost imaging via structured illumination

    Authors: Liming Li

    Abstract: The structured illumination adopted widely in super-resolution microscopy imaging works for the ghost imaging scheme also. Here, we studied the ghost imaging scheme with sinusoidal-structured speckle illumination, whose spatial resolution can surpass the Rayleigh resolution limit by a factor of 2. In addition, even if the bucket intensity signal originated from the diffraction of the object plane…

    Submitted 8 January, 2024; originally announced January 2024.

  43. arXiv:2312.17446  [pdf, other]

    cs.LG cs.AI eess.SP

    ClST: A Convolutional Transformer Framework for Automatic Modulation Recognition by Knowledge Distillation

    Authors: Dongbin Hou, Lixin Li, Wensheng Lin, Junli Liang, Zhu Han

    Abstract: With the rapid development of deep learning (DL) in recent years, automatic modulation recognition (AMR) with DL has achieved high accuracy. However, insufficient training signal data in complicated channel environments and large-scale DL models are critical factors that make DL methods difficult to deploy in practice. To address these problems, we propose a novel neural network named convolution-l…

    Submitted 28 December, 2023; originally announced December 2023.

  44. arXiv:2312.16836  [pdf, other]

    cs.SD eess.AS

    Remixed2Remixed: Domain adaptation for speech enhancement by Noise2Noise learning with Remixing

    Authors: Li Li, Shogo Seki

    Abstract: This paper proposes Remixed2Remixed, a domain adaptation method for speech enhancement, which adopts Noise2Noise (N2N) learning to adapt models trained on artificially generated (out-of-domain: OOD) noisy-clean pair data to better separate real-world recorded (in-domain) noisy data. The proposed method uses a teacher model trained on OOD data to acquire pseudo-in-domain speech and noise signals, w…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP2024

  45. arXiv:2312.15197  [pdf, other]

    cs.SD cs.CL cs.CV eess.AS

    TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

    Authors: Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, changpeng yang, Zhou Zhao

    Abstract: Direct speech-to-speech translation achieves high-quality results through the introduction of discrete units obtained from self-supervised learning. This approach circumvents delays and cascading errors associated with model cascading. However, talking head translation, converting audio-visual speech (i.e., talking head video) from one language into another, still confronts several challenges comp…

    Submitted 23 December, 2023; originally announced December 2023.

  46. arXiv:2312.13310  [pdf, other]

    eess.IV cs.CV

    Computational Spectral Imaging with Unified Encoding Model: A Comparative Study and Beyond

    Authors: Xinyuan Liu, Lizhi Wang, Lingen Li, Chang Chen, Xue Hu, Fenglong Song, Youliang Yan

    Abstract: Computational spectral imaging is drawing increasing attention owing to the snapshot advantage, and amplitude, phase, and wavelength encoding systems are three types of representative implementations. Fairly comparing and understanding the performance of these systems is essential, but challenging due to the heterogeneity in encoding design. To overcome this limitation, we propose the unified enco…

    Submitted 20 December, 2023; originally announced December 2023.

  47. arXiv:2312.13127  [pdf, other]

    eess.IV cs.CV

    Pixel-to-Abundance Translation: Conditional Generative Adversarial Networks Based on Patch Transformer for Hyperspectral Unmixing

    Authors: Li Wang, Xiaohua Zhang, Longfei Li, Hongyun Meng, Xianghai Cao

    Abstract: Spectral unmixing is a significant challenge in hyperspectral image processing. Existing unmixing methods utilize prior knowledge about the abundance distribution to solve the regularization optimization problem, where the difficulty lies in choosing appropriate prior knowledge and solving the complex regularization optimization problem. To solve these problems, we propose a hyperspectral conditio…

    Submitted 20 December, 2023; originally announced December 2023.

  48. arXiv:2312.10687  [pdf, other]

    eess.AS cs.SD

    MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis

    Authors: Wenhao Guan, Yishuang Li, Tao Li, Hukai Huang, Feng Wang, Jiayan Lin, Lingyan Huang, Lin Li, Qingyang Hong

    Abstract: The style transfer task in Text-to-Speech refers to the process of transferring style information into text content to generate corresponding speech with a specific style. However, most existing style transfer approaches are either based on fixed emotional labels or reference speech clips, which cannot achieve flexible style transfer. Recently, some methods have adopted text descriptions to guide…

    Submitted 31 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI2024

  49. arXiv:2312.07226  [pdf, other]

    eess.IV cs.CV

    Super-Resolution on Rotationally Scanned Photoacoustic Microscopy Images Incorporating Scanning Prior

    Authors: Kai Pan, Linyang Li, Li Lin, Pujin Cheng, Junyan Lyu, Lei Xi, Xiaoyin Tang

    Abstract: Photoacoustic Microscopy (PAM) images integrating the advantages of optical contrast and acoustic resolution have been widely used in brain studies. However, there exists a trade-off between scanning speed and image resolution. Compared with traditional raster scanning, rotational scanning provides good opportunities for fast PAM imaging by optimizing the scanning mechanism. Recently, there is a t…

    Submitted 12 December, 2023; originally announced December 2023.

  50. arXiv:2312.06101  [pdf, other]

    eess.IV cs.CV

    Hundred-Kilobyte Lookup Tables for Efficient Single-Image Super-Resolution

    Authors: Binxiao Huang, Jason Chun Lok Li, Jie Ran, Boyu Li, Jiajun Zhou, Dahai Yu, Ngai Wong

    Abstract: Conventional super-resolution (SR) schemes make heavy use of convolutional neural networks (CNNs), which involve intensive multiply-accumulate (MAC) operations, and require specialized hardware such as graphics processing units. This contradicts the regime of edge AI that often runs on devices strained by power, computing, and storage resources. Such a challenge has motivated a series of lookup ta…

    Submitted 8 May, 2024; v1 submitted 10 December, 2023; originally announced December 2023.