Showing 1–50 of 186 results for author: Li, K

Searching in archive eess.
  1. arXiv:2406.14474  [pdf, ps, other]

    eess.SY

    Spatio-temporal Patterns between ENSO and Weather-related Power Outages in the Continental United States

    Authors: Long Huo, Xin Chen, Kaiwen Li, Fengying Cai, Jürgen Kurths

    Abstract: El Niño-Southern Oscillation (ENSO) exhibits significant impacts on the frequency of extreme weather events and its socio-economic implications prevail on a global scale. However, a fundamental gap still exists in understanding the relationship between the ENSO and weather-related power outages in the continental United States. Through 24-year (2000-2023) composite and statistical analysis, our st…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.11546  [pdf, other]

    eess.AS cs.CL cs.SD

    GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

    Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

    Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired spee…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  3. arXiv:2406.10514  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

    Authors: Zehua Kcriss Li, Meiying Melissa Chen, Yi Zhong, Pinxin Liu, Zhiyao Duan

    Abstract: Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered by professional voice actors. Inspired by this, we explore expressive speech synthesis through the lens of articulatory phonetics. Specifically,…

    Submitted 15 June, 2024; originally announced June 2024.

  4. arXiv:2406.08782  [pdf, other]

    eess.IV cs.CV

    Hybrid Spatial-spectral Neural Network for Hyperspectral Image Denoising

    Authors: Hao Liang, Chengjie, Kun Li, Xin Tian

    Abstract: Hyperspectral image (HSI) denoising is an essential procedure for HSI applications. Unfortunately, the existing Transformer-based methods mainly focus on non-local modeling, neglecting the importance of locality in image denoising. Moreover, deep learning methods employ complex spectral learning mechanisms, thus introducing large computation costs. To address these problems, we propose a hybrid…

    Submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2405.04476  [pdf, other]

    eess.AS cs.SD

    BERP: A Blind Estimator of Room Acoustic and Physical Parameters for Single-Channel Noisy Speech Signals

    Authors: Lijun Wang, Yixian Lu, Ziyan Gao, Kai Li, Jianqiang Huang, Yuntao Kong, Shogo Okada

    Abstract: Room acoustic parameters (RAPs) and room physical parameters (RPPs) are essential metrics for parameterizing the room acoustical characteristics (RAC) of a sound field around a listener's local environment, offering comprehensive indications for various applications. The current RAPs and RPPs estimation methods either fall short of covering broad real-world acoustic environments in the context of…

    Submitted 16 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 13-page, Submitted to IEEE/ACM Transaction on Audio Speech and Language Processing (TASLP)

  6. arXiv:2405.00056  [pdf, other]

    eess.SY cs.GT

    Age of Information Minimization using Multi-agent UAVs based on AI-Enhanced Mean Field Resource Allocation

    Authors: Yousef Emami, Hao Gao, Kai Li, Luis Almeida, Eduardo Tovar, Zhu Han

    Abstract: Unmanned Aerial Vehicle (UAV) swarms play an effective role in timely data collection from ground sensors in remote and hostile areas. Optimizing the collective behavior of swarms can improve data collection performance. This paper puts forth a new mean field flight resource allocation optimization to minimize age of information (AoI) of sensory data, where balancing the trade-off between the UAVs…

    Submitted 2 May, 2024; v1 submitted 24 April, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2312.09953

    MSC Class: 00; ACM Class: C.2

  7. arXiv:2404.02063  [pdf, other]

    cs.SD cs.AI eess.AS

    SPMamba: State-space model is all you need in speech separation

    Authors: Kai Li, Guo Chen

    Abstract: In speech separation, both CNN- and Transformer-based models have demonstrated robust separation capabilities, garnering significant attention within the research community. However, CNN-based methods have limited modelling capability for long-sequence audio, leading to suboptimal separation performance. Conversely, Transformer-based methods are limited in practical applications due to their high…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Technical Report. Work in progress. Code is available at https://github.com/JusperLee/SPMamba

  8. arXiv:2403.12115  [pdf, other]

    eess.IV cs.CV cs.LG

    Deep learning automates Cobb angle measurement compared with multi-expert observers

    Authors: Keyu Li, Hanxue Gu, Roy Colglazier, Robert Lark, Elizabeth Hubbard, Robert French, Denise Smith, Jikai Zhang, Erin McCrum, Anthony Catanzano, Joseph Cao, Leah Waldman, Maciej A. Mazurowski, Benjamin Alman

    Abstract: Scoliosis, a prevalent condition characterized by abnormal spinal curvature leading to deformity, requires precise assessment methods for effective diagnosis and management. The Cobb angle is a widely used scoliosis quantification method that measures the degree of curvature between the tilted vertebrae. Yet, manual measuring of Cobb angles is time-consuming and labor-intensive, fraught with signi…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 17 pages, 5 figures

  9. arXiv:2403.08200  [pdf, ps, other]

    eess.SY eess.SP

    Prototyping and Experimental Results for Environment-Aware Millimeter Wave Beam Alignment via Channel Knowledge Map

    Authors: Zhuoyin Dai, Di Wu, Zhenjun Dong, Kun Li, Dingyang Ding, Sihan Wang, Yong Zeng

    Abstract: Channel knowledge map (CKM), which aims to directly reflect the intrinsic channel properties of the local wireless environment, is a novel technique for achieving environment-aware communication. In this paper, to alleviate the large training overhead in millimeter wave (mmWave) beam alignment, an environment-aware and training-free beam alignment prototype is established based on a typical CKM, te…

    Submitted 12 March, 2024; originally announced March 2024.

  10. arXiv:2403.07271  [pdf, other]

    math.OC cs.AI cs.LG eess.SP

    Anderson acceleration for iteratively reweighted $\ell_1$ algorithm

    Authors: Kexin Li

    Abstract: Iteratively reweighted L1 (IRL1) algorithm is a common algorithm for solving sparse optimization problems with nonconvex and nonsmooth regularization. The development of its acceleration algorithm, often employing Nesterov acceleration, has sparked significant interest. Nevertheless, the convergence and complexity analysis of these acceleration algorithms consistently poses substantial challenges.…

    Submitted 11 March, 2024; originally announced March 2024.

  11. arXiv:2403.06066  [pdf]

    eess.IV cs.CV cs.LG

    CausalCellSegmenter: Causal Inference inspired Diversified Aggregation Convolution for Pathology Image Segmentation

    Authors: Dawei Fan, Yifan Gao, Jiaming Yu, Yanping Chen, Wencheng Li, Chuancong Lin, Kaibin Li, Changcai Yang, Riqing Chen, Lifang Wei

    Abstract: Deep learning models have shown promising performance for cell nucleus segmentation in the field of pathology image analysis. However, training a robust model from multiple domains remains a great challenge for cell nucleus segmentation. Additionally, the shortcomings of background noise, highly overlapping between cell nucleus, and blurred edges often lead to poor performance. To address these ch…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 10 pages, 5 figures, 2 tables, MICCAI

  12. arXiv:2403.05256  [pdf, other]

    eess.IV cs.CV cs.LG

    DuDoUniNeXt: Dual-domain unified hybrid model for single and multi-contrast undersampled MRI reconstruction

    Authors: Ziqi Gao, Yue Zhang, Xinwen Liu, Kaiyan Li, S. Kevin Zhou

    Abstract: Multi-contrast (MC) Magnetic Resonance Imaging (MRI) reconstruction aims to incorporate a reference image of auxiliary modality to guide the reconstruction process of the target modality. Known MC reconstruction methods perform well with a fully sampled reference image, but usually exhibit inferior performance, compared to single-contrast (SC) methods, when the reference image is missing or of low…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures, 2 tables

  13. arXiv:2402.16129  [pdf, other]

    eess.SP

    Localization in Reconfigurable Intelligent Surface Aided mmWave Systems: A Multiple Measurement Vector Based Channel Estimation Method

    Authors: Kunlun Li, Jiguang He, Mohammed El-Hajjar, Lie-Liang Yang

    Abstract: The sparsity of millimeter wave (mmWave) channels in the angular and temporal domains is beneficial to channel estimation, while the associated channel parameters can be utilized for localization. However, line-of-sight (LoS) blockage poses a significant challenge on the localization in mmWave systems, potentially leading to substantial positioning errors. A promising solution is to employ reconfi…

    Submitted 25 February, 2024; originally announced February 2024.

  14. arXiv:2402.09430  [pdf, other]

    eess.SP cs.AI cs.CV cs.MM

    WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing

    Authors: Shuokang Huang, Kaihan Li, Di You, Yichong Chen, Arvin Lin, Siying Liu, Xiaohui Li, Julie A. McCann

    Abstract: WiFi-based human sensing has exhibited remarkable potential to analyze user behaviors in a non-intrusive and device-free manner, benefiting applications as diverse as smart homes and healthcare. However, most previous works focus on single-user sensing, which has limited practicability in scenarios involving multiple users. Although recent studies have begun to investigate WiFi-based multi-user se…

    Submitted 12 March, 2024; v1 submitted 24 January, 2024; originally announced February 2024.

    Comments: We present WiMANS, to our knowledge, the first dataset for multi-user activity sensing based on WiFi

  15. arXiv:2402.04448  [pdf, other]

    eess.SY

    Failure Analysis in Next-Generation Critical Cellular Communication Infrastructures

    Authors: Siguo Bi, Xin Yuan, Shuyan Hu, Kai Li, Wei Ni, Ekram Hossain, Xin Wang

    Abstract: The advent of communication technologies marks a transformative phase in critical infrastructure construction, where the meticulous analysis of failures becomes paramount in achieving the fundamental objectives of continuity, security, and availability. This survey enriches the discourse on failures, failure analysis, and countermeasures in the context of the next-generation critical communication…

    Submitted 6 February, 2024; originally announced February 2024.

  16. arXiv:2402.03897  [pdf, other]

    eess.SY

    Robust Data-EnablEd Predictive Leading Cruise Control via Reachability Analysis

    Authors: Shuai Li, Chaoyi Chen, Haotian Zheng, Jiawei Wang, Qing Xu, Keqiang Li

    Abstract: Data-driven predictive control promises model-free wave-dampening strategies for Connected and Autonomous Vehicles (CAVs) in mixed traffic flow. However, its performance relies on data quality, which suffers from unknown noise and disturbances. This paper introduces a Robust Data-EnablEd Predictive Leading Cruise Control (RDeeP-LCC) method based on reachability analysis, aiming to achieve safe and…

    Submitted 14 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: 8 pages, 4 figures

  17. arXiv:2402.03497  [pdf, other]

    eess.SP

    An Analytic Solution for Kernel Adaptive Filtering

    Authors: Benjamin Colburn, Luis G. Sanchez Giraldo, Kan Li, Jose C. Principe

    Abstract: Conventional kernel adaptive filtering (KAF) uses a prescribed, positive definite, nonlinear function to define the Reproducing Kernel Hilbert Space (RKHS), where the optimal solution for mean square error estimation is approximated using search techniques. Instead, this paper proposes to embed the full statistics of the input data in the kernel definition, obtaining the first analytical solution…

    Submitted 5 February, 2024; originally announced February 2024.

  18. arXiv:2402.03390  [pdf, other]

    eess.IV cs.AI cs.CV cs.NI

    PixelGen: Rethinking Embedded Camera Systems

    Authors: Kunjun Li, Manoj Gulati, Steven Waskito, Dhairya Shah, Shantanu Chakrabarty, Ambuj Varshney

    Abstract: Embedded camera systems are ubiquitous, representing the most widely deployed example of a wireless embedded system. They capture a representation of the world - the surroundings illuminated by visible or infrared light. Despite their widespread usage, the architecture of embedded camera systems has remained unchanged, which leads to limitations. They visualize only a tiny portion of the world. Ad…

    Submitted 4 February, 2024; originally announced February 2024.

  19. TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion

    Authors: Samuel Pegg, Kai Li, Xiaolin Hu

    Abstract: Audio-visual speech separation has gained significant traction in recent years due to its potential applications in various fields such as speech recognition, diarization, scene analysis and assistive technologies. Designing a lightweight audio-visual speech separation network is important for low-latency applications, but existing methods often require higher computational costs and more paramete…

    Submitted 25 January, 2024; originally announced January 2024.

    Journal ref: 2023 13th International Conference on Information Science and Technology (ICIST), Cairo, Egypt, 2023, pp. 243-252

  20. arXiv:2401.03150  [pdf, other]

    eess.IV

    O-PRESS: Boosting OCT axial resolution with Prior guidance, Recurrence, and Equivariant Self-Supervision

    Authors: Kaiyan Li, Jingyuan Yang, Wenxuan Liang, Xingde Li, Chenxi Zhang, Lulu Chen, Chan Wu, Xiao Zhang, Zhiyan Xu, Yuelin Wang, Lihui Meng, Yue Zhang, Youxin Chen, S. Kevin Zhou

    Abstract: Optical coherence tomography (OCT) is a noninvasive technology that enables real-time imaging of tissue microanatomies. The axial resolution of OCT is intrinsically constrained by the spectral bandwidth of the employed light source while maintaining a fixed center wavelength for a specific application. Physically extending this bandwidth faces strong limitations and requires a substantial cost. We…

    Submitted 6 January, 2024; originally announced January 2024.

  21. arXiv:2312.06337  [pdf, other]

    cs.SD cs.CL eess.AS

    Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations

    Authors: Tao Meng, Yuntao Shou, Wei Ai, Nan Yin, Keqin Li

    Abstract: The main task of Multimodal Emotion Recognition in Conversations (MERC) is to identify the emotions in modalities, e.g., text, audio, image and video, which is a significant development direction for realizing machine intelligence. However, many data in MERC naturally exhibit an imbalanced distribution of emotion categories, and researchers ignore the negative impact of imbalanced data on emotion…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 16 pages, 9 figures

  22. arXiv:2312.03787  [pdf, other]

    eess.SY

    Detection and Mitigation of Position Spoofing Attacks on Cooperative UAV Swarm Formations

    Authors: Siguo Bi, Kai Li, Shuyan Hu, Wei Ni, Cong Wang, Xin Wang

    Abstract: Detecting spoofing attacks on the positions of unmanned aerial vehicles (UAVs) within a swarm is challenging. Traditional methods relying solely on individually reported positions and pairwise distance measurements are ineffective in identifying the misbehavior of malicious UAVs. This paper presents a novel systematic structure designed to detect and mitigate spoofing attacks in UAV swarms. We for…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: accepted by IEEE TIFS in Dec. 2023

  23. arXiv:2312.03464  [pdf, other]

    cs.LG cs.SD eess.AS

    Subnetwork-to-go: Elastic Neural Network with Dynamic Training and Customizable Inference

    Authors: Kai Li, Yi Luo

    Abstract: Deploying neural networks to different devices or platforms is in general challenging, especially when the model size is large or model complexity is high. Although there exist ways for model pruning or distillation, it is typically required to perform a full round of model training or finetuning procedure in order to obtain a smaller model that satisfies the model size or complexity constraints.…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures

  24. arXiv:2311.12083  [pdf, other]

    cs.CV eess.IV

    PanBench: Towards High-Resolution and High-Performance Pansharpening

    Authors: Shiying Wang, Xuechao Zou, Kai Li, Junliang Xing, Pin Tao

    Abstract: Pansharpening, a pivotal task in remote sensing, involves integrating low-resolution multispectral images with high-resolution panchromatic images to synthesize an image that is both high-resolution and retains multispectral information. These pansharpened images enhance precision in land cover classification, change detection, and environmental monitoring within remote sensing data analysis. Whil…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures

  25. arXiv:2309.17189  [pdf, other]

    cs.SD cs.CV eess.AS

    RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation

    Authors: Samuel Pegg, Kai Li, Xiaolin Hu

    Abstract: Audio-visual speech separation methods aim to integrate different modalities to generate high-quality separated speech, thereby enhancing the performance of downstream tasks such as speech recognition. Most existing state-of-the-art (SOTA) models operate in the time domain. However, their overly simplistic approach to modeling acoustic features often necessitates larger and more computationally in…

    Submitted 21 March, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted by The Twelfth International Conference on Learning Representations (ICLR) 2024, see https://openreview.net/forum?id=PEuDO2EiDr

  26. arXiv:2309.14474  [pdf]

    eess.IV cs.CV

    Gastro-Intestinal Tract Segmentation Using an Explainable 3D Unet

    Authors: Kai Li, Jonathan Chan

    Abstract: In treating gastrointestinal cancer using radiotherapy, the role of the radiation oncologist is to administer high doses of radiation, through x-ray beams, toward the tumor while avoiding the stomach and intestines. With the advent of precise radiation treatment technology such as the MR-Linac, oncologists can visualize the daily positions of the tumors and intestines, which may vary day to day. B…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: 5 pages, 8 figures, 13th Joint Symposium on Computational Intelligence (JSCI13)

  27. arXiv:2309.13018  [pdf, other]

    eess.AS cs.CL cs.SD

    Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

    Authors: Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

    Abstract: Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training needed to be run for each language. In this work, we propose the use of an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently, each resulting in…

    Submitted 11 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

  28. arXiv:2309.09734  [pdf, other]

    eess.SY

    Learning Optimal Robust Control of Connected Vehicles in Mixed Traffic Flow

    Authors: Jie Li, Jiawei Wang, Shengbo Eben Li, Keqiang Li

    Abstract: Connected and automated vehicles (CAVs) technologies promise to attenuate undesired traffic disturbances. However, in mixed traffic where human-driven vehicles (HDVs) also exist, the nonlinear human-driving behavior has brought critical challenges for effective CAV control. This paper employs the policy iteration method to learn the optimal robust controller for nonlinear mixed traffic systems. Pr…

    Submitted 18 September, 2023; originally announced September 2023.

  29. arXiv:2309.03686  [pdf, other]

    eess.IV cs.CV

    MS-UNet-v2: Adaptive Denoising Method and Training Strategy for Medical Image Segmentation with Small Training Data

    Authors: Haoyuan Chen, Yufei Han, Pin Xu, Yanyi Li, Kuan Li, Jianping Yin

    Abstract: Models based on U-like structures have improved the performance of medical image segmentation. However, the single-layer decoder structure of U-Net is too "thin" to exploit enough information, resulting in large semantic differences between the encoder and decoder parts. Things get worse if the number of training sets of data is not sufficiently large, which is common in medical image processing t…

    Submitted 7 September, 2023; originally announced September 2023.

  30. arXiv:2309.01994  [pdf, other]

    eess.SY

    Cloud Control of Connected Vehicle under Bi-directional Time-varying delay: An Application of Predictor-observer Structured Controller

    Authors: Ji-An Pan, Qing Xu, Keqiang Li, Chunying Yang, Jianqiang Wang

    Abstract: This article is devoted to addressing the cloud control of connected vehicles, specifically focusing on analyzing the effect of bi-directional communication-induced delays. To mitigate the adverse effects of such delays, a novel predictor-observer structured controller is proposed which compensates for both measurable output delays and unmeasurable, yet bounded, input delays simultaneously. The stu…

    Submitted 9 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

  31. arXiv:2309.01625  [pdf, other]

    eess.SY

    Information Flow Topology in Mixed Traffic: A Comparative Study between "Looking Ahead" and "Looking Behind"

    Authors: Shuai Li, Haotian Zheng, Jiawei Wang, Chaoyi Chen, Qing Xu, Jianqiang Wang, Keqiang Li

    Abstract: The emergence of connected and automated vehicles (CAVs) promises smoother traffic flow. In mixed traffic where human-driven vehicles (HDVs) also exist, existing research mostly focuses on "looking ahead" (i.e., the CAVs receive information from preceding vehicles) strategies for CAVs, while recent work reveals that "looking behind" (i.e., the CAVs receive information from their rear vehicles) str…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: This paper has been accepted by 26th IEEE International Conference on Intelligent Transportation Systems ITSC 2023

  32. arXiv:2308.13421  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Exploiting Diverse Feature for Multimodal Sentiment Analysis

    Authors: Jia Li, Wei Qian, Kun Li, Qi Li, Dan Guo, Meng Wang

    Abstract: In this paper, we present our solution to the MuSe-Personalisation sub-challenge in the MuSe 2023 Multimodal Sentiment Analysis Challenge. The task of MuSe-Personalisation aims to predict the continuous arousal and valence values of a participant based on their audio-visual, language, and physiological signal modalities data. Considering different people have personal characteristics, the main cha…

    Submitted 25 August, 2023; originally announced August 2023.

  33. arXiv:2308.09223  [pdf, other]

    eess.IV cs.CV cs.LG

    DMCVR: Morphology-Guided Diffusion Model for 3D Cardiac Volume Reconstruction

    Authors: Xiaoxiao He, Chaowei Tan, Ligong Han, Bo Liu, Leon Axel, Kang Li, Dimitris N. Metaxas

    Abstract: Accurate 3D cardiac reconstruction from cine magnetic resonance imaging (cMRI) is crucial for improved cardiovascular disease diagnosis and understanding of the heart's motion. However, current cardiac MRI-based reconstruction technology used in clinical settings is 2D with limited through-plane resolution, resulting in low-quality reconstructed cardiac volumes. To better reconstruct 3D cardiac vo…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted in MICCAI 2023

  34. arXiv:2308.08143  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation

    Authors: Kai Li, Runxuan Yang, Fuchun Sun, Xiaolin Hu

    Abstract: Recent research has made significant progress in designing fusion modules for audio-visual speech separation. However, they predominantly focus on multi-modal fusion at a single temporal scale of auditory and visual features without employing selective attention mechanisms, which is in sharp contrast with the brain. To address this issue, we propose a novel model called Intra- and Inter-Attention…

    Submitted 2 February, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 18 pages, 6 figures

  35. arXiv:2308.06981  [pdf, other]

    eess.AS cs.SD

    The Sound Demixing Challenge 2023 – Cinematic Demixing Track

    Authors: Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji

    Abstract: This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We provide a comprehensive summary of the challenge setup, detailing the structure of the competition and the datasets used. Especially, we detail CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most succes…

    Submitted 18 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted for Transactions of the International Society for Music Information Retrieval

  36. arXiv:2308.04743  [pdf]

    eess.SY cs.RO math.DS

    Missile guidance law design based on free-time convergent error dynamics

    Authors: Yuanhe Liu, Nianhao Xie, Kebo Li, Yangang Liang

    Abstract: The design of guidance law can be considered a kind of finite-time error-tracking problem. A unified free-time convergent guidance law design approach based on the error dynamics and the free-time convergence method is proposed in this paper. Firstly, the desired free-time convergent error dynamics approach is proposed, and its convergent time can be set freely, which is independent of the initial…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 13 pages, 6 figures, accepted by Journal of Systems Engineering and Electronics

  37. arXiv:2308.04417  [pdf, other]

    cs.CV cs.LG eess.IV

    DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images

    Authors: Xuechao Zou, Kai Li, Junliang Xing, Yu Zhang, Shiying Wang, Lei Jin, Pin Tao

    Abstract: Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. While recent advancements in cloud removal primarily rely on generative adversarial networks, which may yield suboptimal image qual…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 13 pages, 7 figures

  38. arXiv:2307.11795  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    Prompting Large Language Models with Speech Recognition Abilities

    Authors: Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

    Abstract: Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching a small audio encoder allowing it to perform speech recognition. By directly prepending a sequence of audial embeddings to the text token embeddings,…

    Submitted 21 July, 2023; originally announced July 2023.

  39. arXiv:2307.09729  [pdf, other]

    cs.CV cs.MM eess.IV

    NTIRE 2023 Quality Assessment of Video Enhancement Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, et al. (47 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual…

    Submitted 18 July, 2023; originally announced July 2023.

  40. arXiv:2307.07829  [pdf, other]

    eess.IV cs.CV

    HQG-Net: Unpaired Medical Image Enhancement with High-Quality Guidance

    Authors: Chunming He, Kai Li, Guoxia Xu, Jiangpeng Yan, Longxiang Tang, Yulun Zhang, Xiu Li, Yaowei Wang

    Abstract: Unpaired Medical Image Enhancement (UMIE) aims to transform a low-quality (LQ) medical image into a high-quality (HQ) one without relying on paired images for training. While most existing approaches are based on Pix2Pix/CycleGAN and are effective to some extent, they fail to explicitly use HQ information to guide the enhancement process, which can lead to undesired artifacts and structural distor…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: 14 pages, 10 figures

  41. arXiv:2307.00828  [pdf, other]

    eess.SY cs.LG math.OC

    Model-Assisted Probabilistic Safe Adaptive Control With Meta-Bayesian Learning

    Authors: Shengbo Wang, Ke Li, Yin Yang, Yuting Cao, Tingwen Huang, Shiping Wen

    Abstract: Breaking safety constraints in control systems can lead to potential risks, resulting in unexpected costs or catastrophic damage. Nevertheless, uncertainty is ubiquitous, even among similar tasks. In this paper, we develop a novel adaptive safe control framework that integrates meta learning, Bayesian models, and control barrier function (CBF) method. Specifically, with the help of CBF method, we…

    Submitted 13 July, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

  42. arXiv:2307.00637  [pdf, other]

    eess.SY

    On Embedding B-Splines in Recursive State Estimation

    Authors: Kailai Li

    Abstract: We present a principled study on establishing a novel probabilistic framework for state estimation. B-splines are embedded in the state-space modeling as a continuous-time intermediate between the states of recurrent control points and asynchronous sensor measurements. Based thereon, the spline-embedded recursive estimation scheme is established w.r.t. common sensor fusion tasks, and the correspon…

    Submitted 2 July, 2023; originally announced July 2023.

  43. arXiv:2306.10311  [pdf, other]

    eess.IV cs.CV

    Efficient HDR Reconstruction from Real-World Raw Images

    Authors: Qirui Yang, Yihao Liu, Qihua Chen, Huanjing Yue, Kun Li, Jingyu Yang

    Abstract: The widespread usage of high-definition screens on edge devices stimulates a strong demand for efficient high dynamic range (HDR) algorithms. However, many existing HDR methods either deliver unsatisfactory results or consume too much computational and memory resources, hindering their application to high-resolution images (usually with more than 12 megapixels) in practice. In addition, existing H…

    Submitted 5 June, 2024; v1 submitted 17 June, 2023; originally announced June 2023.

  44. arXiv:2306.04242  [pdf, other]

    eess.SP cs.RO

    4D Millimeter-Wave Radar in Autonomous Driving: A Survey

    Authors: Zeyu Han, Jiahao Wang, Zikun Xu, Shuocheng Yang, Lei He, Shaobing Xu, Jianqiang Wang, Keqiang Li

    Abstract: The 4D millimeter-wave (mmWave) radar, proficient in measuring the range, azimuth, elevation, and velocity of targets, has attracted considerable interest within the autonomous driving community. This is attributed to its robustness in extreme environments and the velocity and elevation measurement capabilities. However, despite the rapid advancement in research related to its sensing theory and a…

    Submitted 26 April, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

  45. arXiv:2306.02017  [pdf, other]

    eess.SY

    Resilient Distributed Parameter Estimation in Sensor Networks

    Authors: Jiaqi Yan, Kuo Li, Hideaki Ishii

    Abstract: In this paper, we study the problem of parameter estimation in a sensor network, where the measurements and updates of some sensors might be arbitrarily manipulated by adversaries. Despite the presence of such misbehaviors, normally behaving sensors make successive observations of an unknown $d$-dimensional vector parameter and aim to infer its true value by cooperating with their neighbors over a…

    Submitted 3 June, 2023; originally announced June 2023.

  46. arXiv:2306.00160  [pdf, other]

    eess.AS cs.LG cs.SD

    Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model

    Authors: Héctor Martel, Julius Richter, Kai Li, Xiaolin Hu, Timo Gerkmann

    Abstract: We propose Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments. To this end, we adopt the Asynchronous Fully Recurrent Convolutional Neural Network (A-FRCNN), which has shown successful results in audio-only speech separation. Our architecture consists of an…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: Accepted by Interspeech 2023

  47. arXiv:2305.20006  [pdf, other]

    eess.IV cs.CV

    Physics-Informed Ensemble Representation for Light-Field Image Super-Resolution

    Authors: Manchang Jin, Gaosheng Liu, Kunshu Hu, Xin Luo, Kun Li, Jingyu Yang

    Abstract: Recent learning-based approaches have achieved significant progress in light field (LF) image super-resolution (SR) by exploring convolution-based or transformer-based network structures. However, LF imaging has many intrinsic physical priors that have not been fully exploited. In this paper, we analyze the coordinate transformation of the LF imaging process to reveal the geometric relationship in…

    Submitted 31 May, 2023; originally announced May 2023.

  48. arXiv:2305.16932  [pdf, other]

    cs.SD cs.CL eess.AS

    A Neural State-Space Model Approach to Efficient Speech Separation

    Authors: Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng

    Abstract: In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM). Motivated by linear time-invariant systems for sequence modeling, our SSM-based approach can efficiently model input signals into a format of linear ordinary differential equations (ODEs) for representation learning. To extend the SSM technique into speech separation tasks, we firs…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted by InterSpeech 2023

  49. arXiv:2305.16105  [pdf, ps, other]

    eess.SP

    Joint Uplink and Downlink Resource Allocation Towards Energy-efficient Transmission for URLLC

    Authors: Kang Li, Pengcheng Zhu, Yan Wang, Fu-Chun Zheng, Xiaohu You

    Abstract: Ultra-reliable and low-latency communications (URLLC) is firstly proposed in 5G networks, and expected to support applications with the most stringent quality-of-service (QoS). However, since the wireless channels vary dynamically, the transmit power for ensuring the QoS requirements of URLLC may be very high, which conflicts with the power limitation of a real system. To fulfill the successful UR…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 16 pages, 11 figures

  50. arXiv:2305.02260  [pdf, other]

    physics.med-ph cs.LG eess.IV

    Standardized Benchmark Dataset for Localized Exposure to a Realistic Source at 10$-$90 GHz

    Authors: Ante Kapetanovic, Dragan Poljak, Kun Li

    Abstract: The lack of freely available standardized datasets represents an aggravating factor during the development and testing the performance of novel computational techniques in exposure assessment and dosimetry research. This hinders progress as researchers are required to generate numerical data (field, power and temperature distribution) anew using simulation software for each exposure scenario. Othe…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 6 pages, 3 figures, in proceedings of BioEM2023