Showing 1–50 of 550 results for author: Li, S

Searching in archive eess.

  1. arXiv:2407.01336  [pdf, other]

    cs.IT eess.SP

    Compressed Sensing Inspired User Acquisition for Downlink Integrated Sensing and Communication Transmissions

    Authors: Yi Song, Fernando Pedraza, Shuangyang Li, Siyao Li, Han Yu, Giuseppe Caire

    Abstract: This paper investigates radar-assisted user acquisition for downlink multi-user multiple-input multiple-output (MIMO) transmission using Orthogonal Frequency Division Multiplexing (OFDM) signals. Specifically, we formulate a concise mathematical model for the user acquisition problem, where each user is characterized by its delay and beamspace response. Therefore, we propose a two-stage method for…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.18301  [pdf, other]

    eess.AS cs.CL cs.SD

    MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

    Authors: Song Li, Yongbin You, Xuezhi Wang, Zhengkun Tian, Ke Ding, Guanglu Wan

    Abstract: Recently, multilingual artificial intelligence assistants, exemplified by ChatGPT, have gained immense popularity. As a crucial gateway to human-computer interaction, multilingual automatic speech recognition (ASR) has also garnered significant attention, as evidenced by systems like Whisper. However, the proprietary nature of the training data has impeded researchers' efforts to study multilingua…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  3. arXiv:2406.14816  [pdf]

    physics.med-ph eess.SP physics.bio-ph

    A novel optical assay system for bilirubin concentration measurement in whole blood

    Authors: Jean Pierre Ndabakuranye, Anushi E. Rajapaksa, Genia Burchall, Shiqiang Li, Steven Prawer, Arman Ahnood

    Abstract: As a biomarker for liver disease, bilirubin has been utilized in prognostic scoring systems for cirrhosis. While laboratory-based methods are used to determine bilirubin levels in clinical settings, they do not readily lend themselves to applications outside of hospitals. Consequently, bilirubin monitoring for cirrhotic patients is often performed only intermittently; thus, episodes requiring clin…

    Submitted 20 June, 2024; originally announced June 2024.

    Journal ref: IEEE Transactions on Biomedical Engineering 69.2 (2021): 983-990

  4. arXiv:2406.14696  [pdf, other]

    eess.SY cs.AI

    Physically Analyzable AI-Based Nonlinear Platoon Dynamics Modeling During Traffic Oscillation: A Koopman Approach

    Authors: Kexin Tian, Haotian Shi, Yang Zhou, Sixu Li

    Abstract: Given the complexity and nonlinearity inherent in traffic dynamics within vehicular platoons, there exists a critical need for a modeling methodology with high accuracy while concurrently achieving physical analyzability. Currently, there are two predominant approaches: the physics model-based approach and the Artificial Intelligence (AI)--based approach. Knowing the facts that the physical-based…

    Submitted 20 June, 2024; originally announced June 2024.

  5. Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation

    Authors: Jingyuan Xia, Zhixiong Yang, Shengxi Li, Shuanghui Zhang, Yaowen Fu, Deniz Gündüz, Xiang Li

    Abstract: Learning-based approaches have witnessed great successes in blind single image super-resolution (SISR) tasks, however, handcrafted kernel priors and learning based kernel priors are typically required. In this paper, we propose a Meta-learning and Markov Chain Monte Carlo (MCMC) based SISR approach to learn kernel priors from organized randomness. In concrete, a lightweight network is adopted as k…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

  6. arXiv:2406.04791  [pdf, other]

    cs.SD eess.AS

    Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

    Authors: Shaojun Li, Daimeng Wei, Jiaxin Guo, ZongYao Li, Zhanglin Wu, Zhiqiang Rao, Yuanchang Luo, Xianghui He, Hao Yang

    Abstract: Despite recent improvements in End-to-End Automatic Speech Recognition (E2E ASR) systems, the performance can degrade due to vocal characteristic mismatches between training and testing data, particularly with limited target speaker adaptation data. We propose a novel speaker adaptation approach Speaker-Smoothed kNN that leverages k-Nearest Neighbors (kNN) retrieval techniques to improve model out…

    Submitted 11 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  7. arXiv:2406.04776  [pdf, ps, other]

    eess.SP cs.AI

    OFDM-Standard Compatible SC-NOFS Waveforms for Low-Latency and Jitter-Tolerance Industrial IoT Communications

    Authors: Tongyang Xu, Shuangyang Li, Jinhong Yuan

    Abstract: Traditional communications focus on regular and orthogonal signal waveforms for simplified signal processing and improved spectral efficiency. In contrast, the next-generation communications would aim for irregular and non-orthogonal signal waveforms to introduce new capabilities. This work proposes a spectrally efficient irregular Sinc (irSinc) shaping technique, revisiting the traditional Sinc b…

    Submitted 7 June, 2024; originally announced June 2024.

  8. arXiv:2405.18558  [pdf, other]

    cs.RO eess.SY

    "Golden Ratio Yoshimura" for Meta-Stable and Massively Reconfigurable Deployment

    Authors: Vishrut Deshpande, Yogesh Phalak, Ziyang Zhou, Ian Walker, Suyi Li

    Abstract: Yoshimura origami is a classical folding pattern that has inspired many deployable structure designs. Its applications span from space exploration, kinetic architectures, and soft robots to even everyday household items. However, despite its wide usage, Yoshimura has been fixated on a set of design constraints to ensure its flat-foldability. Through extensive kinematic analysis and prototype tests…

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  9. arXiv:2405.12408  [pdf, other]

    cs.RO eess.SY

    Flexible Active Safety Motion Control for Robotic Obstacle Avoidance: A CBF-Guided MPC Approach

    Authors: Jinhao Liu, Jun Yang, Jianliang Mao, Tianqi Zhu, Qihang Xie, Yimeng Li, Xiangyu Wang, Shihua Li

    Abstract: A flexible active safety motion (FASM) control approach is proposed for the avoidance of dynamic obstacles and the reference tracking in robot manipulators. The distinctive feature of the proposed method lies in its utilization of control barrier functions (CBF) to design flexible CBF-guided safety criteria (CBFSC) with dynamically optimized decay rates, thereby offering flexibility and active saf…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 11 pages, 11 figures

  10. arXiv:2405.11006  [pdf, other]

    eess.SY nlin.AO

    Self-Triggered Distributed Model Predictive Control with Synchronization Parameters Interaction

    Authors: Qianqian Chen, Shaoyuan Li

    Abstract: This paper investigates an aperiodic distributed model predictive control approach for multi-agent systems (MASs) in which parameterized synchronization constraints is considered and an innovative self-triggered criterion is constructed. Different from existing coordination methodology, the proposed strategy achieves the cooperation of agents through the synchronization of one-dimensional paramete…

    Submitted 17 May, 2024; originally announced May 2024.

  11. arXiv:2405.11005  [pdf, other]

    eess.SY nlin.AO

    Distributed Model Predictive Control for Asynchronous Multi-agent Systems with Self-Triggered Coordinator

    Authors: Qianqian Chen, Shaoyuan Li

    Abstract: This paper investigates the distributed model predictive control for an asynchronous nonlinear multi-agent system with external interference via a self-triggered generator and a prediction horizon regulator. First, a shrinking constraint related to the error between the actual state and the predicted state is introduced into the optimal control problem to enable the robustness of the system. Then,…

    Submitted 17 May, 2024; originally announced May 2024.

  12. arXiv:2405.09557  [pdf, other]

    eess.SP cs.LG

    Machine Learning in Short-Reach Optical Systems: A Comprehensive Survey

    Authors: Chen Shao, Elias Giacoumidis, Syed Moktacim Billah, Shi Li, Jialei Li, Prashasti Sahu, Andre Richter, Tobias Kaefer, Michael Faerber

    Abstract: In recent years, extensive research has been conducted to explore the utilization of machine learning algorithms in various direct-detected and self-coherent short-reach communication applications. These applications encompass a wide range of tasks, including bandwidth request prediction, signal quality monitoring, fault detection, traffic prediction, and digital signal processing (DSP)-based equa…

    Submitted 29 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 23 pages, 2 figures, 3 tables. Accepted in the MDPI Photonics journal Special Issue "Machine Learning Applied to Optical Communication Systems"

  13. arXiv:2405.09317  [pdf, other]

    eess.SY

    Controllability Test for Nonlinear Datatic Systems

    Authors: Yujie Yang, Letian Tao, Likun Wang, Shengbo Eben Li

    Abstract: Controllability is a fundamental property of control systems, serving as the prerequisite for controller design. While controllability test is well established in modelic (i.e., model-driven) control systems, extending it to datatic (i.e., data-driven) control systems is still a challenging task due to the absence of system models. In this study, we propose a general controllability test method fo…

    Submitted 15 May, 2024; originally announced May 2024.

  14. arXiv:2405.07685  [pdf, other]

    eess.SY

    Comprehensive Analysis of Access Control Models in Edge Computing: Challenges, Solutions, and Future Directions

    Authors: Tao Xue, Ying Zhang, Yanbin Wang, Wenbo Wang, Shuailou Li, Haibin Zhang

    Abstract: Many contemporary applications, including smart homes and autonomous vehicles, rely on the Internet of Things technology. While cloud computing provides a multitude of valuable services for these applications, it generally imposes constraints on latency-sensitive applications due to the significant propagation delays. As a complementary technique to cloud computing, edge computing situates computi…

    Submitted 22 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  15. arXiv:2405.05518  [pdf, other]

    cs.CV cs.RO eess.IV

    DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction

    Authors: Siyu Li, Jiacheng Lin, Hao Shi, Jiaming Zhang, Song Wang, You Yao, Zhiyong Li, Kailun Yang

    Abstract: Temporal information plays a pivotal role in Bird's-Eye-View (BEV) driving scene understanding, which can alleviate the visual information sparsity. However, the indiscriminate temporal fusion method will cause the barrier of feature redundancy when constructing vectorized High-Definition (HD) maps. In this paper, we revisit the temporal fusion of vectorized HD maps, focusing on temporal instance…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: The source code will be made publicly available at https://github.com/lynn-yu/DTCLMapper

  16. arXiv:2405.03597  [pdf, other]

    eess.SP

    Improving the Ranging Performance of Random ISAC Signals Through Pulse Shaping Design

    Authors: Zihan Liao, Fan Liu, Shuangyang Li, Yifeng Xiong, Weijie Yuan, Marco Lops

    Abstract: In this paper, we propose a novel pulse shaping design for single-carrier integrated sensing and communication (ISAC) transmission. Due to the communication information embedded in the ISAC signal, the resulting auto-correlation function (ACF) is determined by both the information-conveying random symbol sequence and the signaling pulse, where the former leads to random fluctuations in the sidelob…

    Submitted 6 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  17. arXiv:2405.00720  [pdf, other]

    eess.SP cs.LG

    A Novel Machine Learning-based Equalizer for a Downstream 100G PAM-4 PON

    Authors: Chen Shao, Elias Giacoumidis, Shi Li, Jialei Li, Michael Faerber, Tobias Kaefer, Andre Richter

    Abstract: A frequency-calibrated SCINet (FC-SCINet) equalizer is proposed for down-stream 100G PON with 28.7 dB path loss. At 5 km, FC-SCINet improves the BER by 88.87% compared to FFE and a 3-layer DNN with 10.57% lower complexity.

    Submitted 25 April, 2024; originally announced May 2024.

    Comments: 3 pages, 6 figures, accepted by Optical Fiber Communications Conference and Exhibition 2024

  18. arXiv:2404.18096  [pdf, other]

    eess.IV cs.CV

    Snake with Shifted Window: Learning to Adapt Vessel Pattern for OCTA Segmentation

    Authors: Xinrun Chen, Mei Shen, Haojian Ning, Mengzhan Zhang, Chengliang Wang, Shiying Li

    Abstract: Segmenting specific targets or structures in optical coherence tomography angiography (OCTA) images is fundamental for conducting further pathological studies. The retinal vascular layers are rich and intricate, and such vascular with complex shapes can be captured by the widely-studied OCTA images. In this paper, we thus study how to use OCTA images with projection vascular layers to segment reti…

    Submitted 28 April, 2024; originally announced April 2024.

  19. arXiv:2404.15620  [pdf, other]

    eess.IV

    A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution

    Authors: Zhixiong Yang, Jingyuan Xia, Shengxi Li, Xinghua Huang, Shuanghui Zhang, Zhen Liu, Yaowen Fu, Yongxiang Liu

    Abstract: Deep learning-based methods have achieved significant successes on solving the blind super-resolution (BSR) problem. However, most of them request supervised pre-training on labelled datasets. This paper proposes an unsupervised kernel estimation model, named dynamic kernel prior (DKP), to realize an unsupervised and pre-training-free learning-based algorithm for solving the BSR problem. DKP can a…

    Submitted 25 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted for publication in CVPR 2024

  20. arXiv:2404.13388  [pdf]

    eess.IV cs.CV cs.LG

    Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning

    Authors: Yong Liu, Mengtian Kang, Shuo Gao, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Arokia Nathan, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Luigi Occhipinti

    Abstract: Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To addres…

    Submitted 23 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  21. arXiv:2404.13386  [pdf]

    eess.IV cs.CV cs.LG

    SSVT: Self-Supervised Vision Transformer For Eye Disease Diagnosis Based On Fundus Images

    Authors: Jiaqi Wang, Mengtian Kang, Yong Liu, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Shuo Gao, Luigi G. Occhipinti

    Abstract: Machine learning-based fundus image diagnosis technologies trigger worldwide interest owing to their benefits such as reducing medical resource power and providing objective evaluation results. However, current methods are commonly based on supervised methods, bringing in a heavy workload to biomedical staff and hence suffering in expanding effective databases. To address this issue, in this artic…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: ISBI 2024

  22. arXiv:2404.13159  [pdf, other]

    cs.CV cs.LG eess.IV

    Equivariant Imaging for Self-supervised Hyperspectral Image Inpainting

    Authors: Shuo Li, Mike Davies, Mehrdad Yaghoobi

    Abstract: Hyperspectral imaging (HSI) is a key technology for earth observation, surveillance, medical imaging and diagnostics, astronomy and space exploration. The conventional technology for HSI in remote sensing applications is based on the push-broom scanning approach in which the camera records the spectral image of a stripe of the scene at a time, while the image is generated by the aggregation of mea…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 5 Pages, 4 Figures, 2 Tables

  23. arXiv:2404.13153  [pdf, other]

    eess.IV cs.CV

    Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring

    Authors: Chengxu Liu, Xuan Wang, Xiangyu Xu, Ruhao Tian, Shuai Li, Xueming Qian, Ming-Hsuan Yang

    Abstract: Eliminating image blur produced by various kinds of motion has been a challenging problem. Dominant approaches rely heavily on model capacity to remove blurring by reconstructing residual from blurry observation in feature space. These practices not only prevent the capture of spatially variable motion in the real world but also ignore the tailored handling of various motions in image space. In th…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  24. arXiv:2404.12794  [pdf, other]

    cs.CV cs.MM cs.RO eess.IV

    MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

    Authors: Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang

    Abstract: LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in point clouds of the current scan using motion information from previous scans. Despite the promising results achieved by previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose a novel LiDAR-based 3D Moving Ob…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: The source code will be made publicly available at https://github.com/Terminal-K/MambaMOS

  25. arXiv:2404.10714  [pdf, other]

    eess.IV cs.CV

    AV-GAN: Attention-Based Varifocal Generative Adversarial Network for Uneven Medical Image Translation

    Authors: Zexin Li, Yiyang Lin, Zijie Fang, Shuyan Li, Xiu Li

    Abstract: Different types of staining highlight different structures in organs, thereby assisting in diagnosis. However, due to the impossibility of repeated staining, we cannot obtain different types of stained slides of the same tissue area. Translating the slide that is easy to obtain (e.g., H&E) to slides of staining types difficult to obtain (e.g., MT, PAS) is a promising way to solve this problem. How…

    Submitted 16 April, 2024; originally announced April 2024.

  26. arXiv:2404.10064  [pdf, other]

    eess.SY

    The Feasibility of Constrained Reinforcement Learning Algorithms: A Tutorial Study

    Authors: Yujie Yang, Zhilong Zheng, Shengbo Eben Li, Masayoshi Tomizuka, Changliu Liu

    Abstract: Satisfying safety constraints is a priority concern when solving optimal control problems (OCPs). Due to the existence of infeasibility phenomenon, where a constraint-satisfying solution cannot be found, it is necessary to identify a feasible region before implementing a policy. Existing feasibility theories built for model predictive control (MPC) only consider the feasibility of optimal policy.…

    Submitted 15 April, 2024; originally announced April 2024.

  27. arXiv:2404.09385  [pdf, other]

    eess.AS cs.CL eess.SP

    A Large-Scale Evaluation of Speech Foundation Models

    Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

    Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,…

    Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The Arxiv version is preferred

  28. arXiv:2404.05600  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechAlign: Aligning Speech Generation to Human Preferences

    Authors: Dong Zhang, Zhaowei Li, Shimin Li, Xin Zhang, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

    Abstract: Speech language models have significantly advanced in generating realistic speech, with neural codec language models standing out. However, the integration of human feedback to align speech outputs to human preferences is often neglected. This paper addresses this gap by first analyzing the distribution gap in codec language models, highlighting how it leads to discrepancies between the training a…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Work in progress

  29. arXiv:2404.02920  [pdf, other]

    eess.SY

    Advanced Algorithms for Autonomous Guidance of Solar-powered UAVs

    Authors: Siyuan Li

    Abstract: Unmanned aerial vehicle (UAV) techniques have developed rapidly within the past few decades. Using UAVs provides benefits in numerous applications such as site surveying, communication systems, parcel delivery, target tracking, etc. The high manoeuvrability of the drone and its ability to replace a certain amount of labour cost are the reasons why it can be widely chosen. There will be more applic…

    Submitted 28 March, 2024; originally announced April 2024.

    Comments: 31 Pages, master degree thesis

  30. arXiv:2404.02185  [pdf, other]

    cs.CV cs.GR eess.IV

    NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation

    Authors: Sicheng Li, Hao Li, Yiyi Liao, Lu Yu

    Abstract: The emergence of Neural Radiance Fields (NeRF) has greatly impacted 3D scene modeling and novel-view synthesis. As a kind of visual media for 3D scene representation, compression with high rate-distortion performance is an eternal target. Motivated by advances in neural compression and neural field representation, we propose NeRFCodec, an end-to-end NeRF compression framework that integrates non-l…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR2024. The source code will be released

  31. arXiv:2404.00481  [pdf, other]

    stat.ML cs.LG eess.SY

    Convolutional Bayesian Filtering

    Authors: Wenhan Cao, Shiqi Liu, Chang Liu, Zeyu He, Stephen S. -T. Yau, Shengbo Eben Li

    Abstract: Bayesian filtering serves as the mainstream framework of state estimation in dynamic systems. Its standard version utilizes total probability rule and Bayes' law alternatively, where how to define and compute conditional probability is critical to state distribution inference. Previously, the conditional probability is assumed to be exactly known, which represents a measure of the occurrence proba…

    Submitted 30 March, 2024; originally announced April 2024.

  32. arXiv:2403.18632  [pdf, other]

    eess.SY

    Optimal Control Synthesis of Markov Decision Processes for Efficiency with Surveillance Tasks

    Authors: Yu Chen, Xuanyuan Yin, Shaoyuan Li, Xiang Yin

    Abstract: We investigate the problem of optimal control synthesis for Markov Decision Processes (MDPs), addressing both qualitative and quantitative objectives. Specifically, we require the system to fulfill a qualitative surveillance task in the sense that a specific region of interest can be visited infinitely often with probability one. Furthermore, to quantify the performance of the system, we consider…

    Submitted 27 March, 2024; originally announced March 2024.

  33. arXiv:2403.17704   

    eess.SY cs.MA

    Prioritize Team Actions: Multi-Agent Temporal Logic Task Planning with Ordering Constraints

    Authors: Bowen Ye, Jianing Zhao, Shaoyuan Li, Xiang Yin

    Abstract: In this paper, we investigate the problem of linear temporal logic (LTL) path planning for multi-agent systems, introducing the new concept of \emph{ordering constraints}. Specifically, we consider a generic objective function that is defined for the path of each individual agent. The primary objective is to find a global plan for the team of agents, ensuring they collectively meet the specified L…

    Submitted 8 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: This article is withdrawn due to errors in the methodology section, specifically concerning the insufficient explanation of the data collection process. Upon review, it's clear that the data sampling methods were not adequately described, potentially leading to misinterpretations of the results

  34. arXiv:2403.16973  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

    Authors: Puyuan Peng, Po-Yao Huang, Shang-Wen Li, Abdelrahman Mohamed, David Harwath

    Abstract: We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. VoiceCraft employs a Transformer decoder architecture and introduces a token rearrangement procedure that combines causal masking and delayed stacking to enable generation within an…

    Submitted 13 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: ACL 2024. Data, code, and model weights are available at https://github.com/jasonppy/VoiceCraft

  35. arXiv:2403.16212  [pdf, other]

    eess.IV cs.CV cs.LG

    Leveraging Deep Learning and Xception Architecture for High-Accuracy MRI Classification in Alzheimer Diagnosis

    Authors: Shaojie Li, Haichen Qu, Xinqi Dong, Bo Dang, Hengyi Zang, Yulu Gong

    Abstract: Exploring the application of deep learning technologies in the field of medical diagnostics, Magnetic Resonance Imaging (MRI) provides a unique perspective for observing and diagnosing complex neurodegenerative diseases such as Alzheimer Disease (AD). With advancements in deep learning, particularly in Convolutional Neural Networks (CNNs) and the Xception network architecture, we are now able to a…

    Submitted 24 March, 2024; originally announced March 2024.

  36. arXiv:2403.14192  [pdf, ps, other]

    cs.IT eess.SP

    Fundamentals of Delay-Doppler Communications: Practical Implementation and Extensions to OTFS

    Authors: Shuangyang Li, Peter Jung, Weijie Yuan, Zhiqiang Wei, Jinhong Yuan, Baoming Bai, Giuseppe Caire

    Abstract: The recently proposed orthogonal time frequency space (OTFS) modulation, which is a typical Delay-Doppler (DD) communication scheme, has attracted significant attention thanks to its appealing performance over doubly-selective channels. In this paper, we present the fundamentals of general DD communications from the viewpoint of the Zak transform. We start our study by constructing DD domain basis…

    Submitted 21 March, 2024; originally announced March 2024.

  37. arXiv:2403.13225  [pdf, other]

    eess.IV

    Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation

    Authors: Linshan Wu, Zhun Zhong, Jiayi Ma, Yunchao Wei, Hao Chen, Leyuan Fang, Shutao Li

    Abstract: Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models by weak labels, which is receiving significant attention due to its low annotation cost. Existing approaches focus on generating pseudo labels for supervision while largely ignoring to leverage the inherent semantic correlation among different pseudo labels. We observe that pseudo-labeled pixels that are close to each…

    Submitted 19 March, 2024; originally announced March 2024.

  38. arXiv:2403.10805  [pdf, other]

    cs.SD cs.AI cs.CV cs.GR cs.HC eess.AS

    Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference

    Authors: Fan Zhang, Zhaohan Wang, Xin Lyu, Siyuan Zhao, Mengjian Li, Weidong Geng, Naye Ji, Hui Du, Fuxing Gao, Hao Wu, Shunman Li

    Abstract: Speech-driven gesture generation is an emerging field within virtual human creation. However, a significant challenge lies in accurately determining and processing the multitude of input features (such as acoustic, semantic, emotional, personality, and even subtle unknown features). Traditional approaches, reliant on various explicit feature inputs and complex multimodal processing, constrain the…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 12 pages

  39. arXiv:2403.07721  [pdf, other]

    cs.HC eess.SP q-bio.NC

    Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion

    Authors: Dongyang Li, Chen Wei, Shiying Li, Jiachen Zou, Quanying Liu

    Abstract: How to decode human vision through neural signals has attracted a long-standing interest in neuroscience and machine learning. Modern contrastive learning and generative models improved the performance of fMRI-based visual decoding and reconstruction. However, the high cost and low temporal resolution of fMRI limit their applications in brain-computer interfaces (BCIs), prompting a high need for E…

    Submitted 4 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  40. arXiv:2403.06798  [pdf, other]

    eess.IV cs.CV cs.LG

    Dynamic Perturbation-Adaptive Adversarial Training on Medical Image Classification

    Authors: Shuai Li, Xiaoguang Ma, Shancheng Jiang, Lu Meng

    Abstract: Remarkable successes were made in Medical Image Classification (MIC) recently, mainly due to wide applications of convolutional neural networks (CNNs). However, adversarial examples (AEs) exhibited imperceptible similarity with raw data, raising serious concerns on network robustness. Although adversarial training (AT), in responding to malevolent AEs, was recognized as an effective approach to im…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures, 2 tables

  41. arXiv:2403.06788  [pdf, other]

    eess.SY

    Efficient dual-scale generalized Radon-Fourier transform detector family for long time coherent integration

    Authors: Suqi Li, Yihan Wang, Bailu Wang, Giorgio Battistelli, Luigi Chisci, Guolong Cui

    Abstract: Long Time Coherent Integration (LTCI) aims to accumulate target energy through long time integration, which is an effective method for the detection of a weak target. However, for a moving target, defocusing can occur due to range migration (RM) and Doppler frequency migration (DFM). To address this issue, RM and DFM corrections are required in order to achieve a well-focused image for the subsequ…

    Submitted 11 March, 2024; originally announced March 2024.

  42. arXiv:2403.04232  [pdf, other]

    cs.RO cs.AI cs.LG cs.MA eess.SY

    Generalizing Cooperative Eco-driving via Multi-residual Task Learning

    Authors: Vindula Jayawardana, Sirui Li, Cathy Wu, Yashar Farid, Kentaro Oguchi

    Abstract: Conventional control, such as model-based control, is commonly utilized in autonomous driving due to its efficiency and reliability. However, real-world autonomous driving contends with a multitude of diverse traffic scenarios that are challenging for these planning algorithms. Model-free Deep Reinforcement Learning (DRL) presents a promising avenue in this direction, but learning DRL control poli…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted for publication at ICRA 2024

  43. arXiv:2403.01768  [pdf, other

    eess.SY cs.AI

    Canonical Form of Datatic Description in Control Systems

    Authors: Guojian Zhan, Ziang Zheng, Shengbo Eben Li

    Abstract: The design of feedback controllers is undergoing a paradigm shift from modelic (i.e., model-driven) control to datatic (i.e., data-driven) control. Canonical form of state space model is an important concept in modelic control systems, exemplified by Jordan form, controllable form and observable form, whose purpose is to facilitate system analysis and controller synthesis. In the realm of datatic…

    Submitted 4 March, 2024; originally announced March 2024.

  44. arXiv:2402.17200  [pdf, other

    cs.CV eess.IV

    Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain

    Authors: Qunliang Xing, Mai Xu, Shengxi Li, Xin Deng, Meisong Zheng, Huaida Liu, Ying Chen

    Abstract: Existing quality enhancement methods for compressed images focus on aligning the enhancement domain with the raw domain to yield realistic images. However, these methods exhibit a pervasive enhancement bias towards the compression domain, inadvertently regarding it as more realistic than the raw domain. This bias makes enhanced images closely resemble their compressed counterparts, thus degrading…

    Submitted 19 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024

  45. arXiv:2402.15944  [pdf, other

    cs.IT eess.SP

    On A Class of Greedy Sparse Recovery Algorithms -- A High Dimensional Approach

    Authors: Gang Li, Qiuwei Li, Shuang Li, Wu Angela Li

    Abstract: Sparse signal recovery deals with finding the sparsest solution of an under-determined linear system $x = Qs$. In this paper, we propose a novel greedy approach to addressing the challenges from such a problem. Such an approach is based on a characterization of solutions to the system, which allows us to work on the sparse recovery in the $s$-space directly with a given measure. With $l_2$-based me…

    Submitted 24 February, 2024; originally announced February 2024.

  46. arXiv:2402.15744  [pdf, other

    eess.IV cs.CV

    Traditional Transformation Theory Guided Model for Learned Image Compression

    Authors: Zhiyuan Li, Chenyang Ge, Shun Li

    Abstract: Recently, many deep image compression methods have been proposed and achieved remarkable performance. However, these methods are dedicated to optimizing the compression performance and speed at medium and high bitrates, while research on ultra low bitrates is limited. In this work, we propose an ultra low bitrates enhanced invertible encoding network guided by traditional transformation theory, exp…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 6 pages, 8 figures, accepted by ICCE 2024

  47. arXiv:2402.15047  [pdf

    cs.IT eess.SP

    Networked Collaborative Sensing using Multi-domain Measurements: Architectures, Performance Limits and Algorithms

    Authors: Yihua Ma, Shuqiang Xia, Chen Bai, Yuxin Wang, Zhongbin Wang, Songqian Li

    Abstract: As a promising 6G technology, integrated sensing and communication (ISAC) gains growing interest. ISAC provides integration gain via sharing spectrum, hardware, and software. However, concerns exist regarding its sensing performance when compared to dedicated radar systems. To address this issue, the advantages of widely deployed networks should be utilized, and this paper proposes networked colla…

    Submitted 22 February, 2024; originally announced February 2024.

  48. arXiv:2402.13763  [pdf, other

    cs.SD eess.AS

    Music Style Transfer with Time-Varying Inversion of Diffusion Models

    Authors: Sifei Li, Yuxin Zhang, Fan Tang, Chongyang Ma, Weiming Dong, Changsheng Xu

    Abstract: With the development of diffusion models, text-guided image style transfer has demonstrated high-quality controllable synthesis results. However, the utilization of text for diverse music style transfer poses significant challenges, primarily due to the limited availability of matched audio-text datasets. Music, being an abstract and complex art form, exhibits variations and intricacies even withi…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 7 pages, 4 figures, AAAI 2024

  49. arXiv:2402.13548  [pdf, other

    cs.LG eess.SP

    DiffPLF: A Conditional Diffusion Model for Probabilistic Forecasting of EV Charging Load

    Authors: Siyang Li, Hui Xiong, Yize Chen

    Abstract: Due to the vast electric vehicle (EV) penetration to distribution grid, charging load forecasting is essential to promote charging station operation and demand-side management. However, the stochastic charging behaviors and associated exogenous factors render future charging load patterns quite volatile and hard to predict. Accordingly, we devise a novel Diffusion model termed DiffPLF for Probabili…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted to the 23rd Power Systems Computation Conference (PSCC). Code is released at https://github.com/LSY-Cython/DiffPLF

  50. arXiv:2402.11419  [pdf, other

    eess.SP

    A Self-Healing Magnetic-Array-Type Current Sensor with Data-Driven Identification of Abnormal Magnetic Measurement Units

    Authors: Xiaohu Liu, Wei Zhao, Kang Ma, Jian Liu, Lisha Peng, Songling Huang, Shisong Li

    Abstract: Magnetic-array-type current sensors have garnered increasing popularity owing to their notable advantages, including broadband functionality, a large dynamic range, cost-effectiveness, and compact dimensions. However, the susceptibility of the measurement error of one or more magnetic measurement units (MMUs) within the current sensor to drift significantly from the nominal value due to environmen…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 11 pages, 10 figures