Skip to main content

Showing 1–50 of 1,330 results for author: Li, Y

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.19043  [pdf

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Ya**g Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, **g Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  2. arXiv:2406.18345  [pdf, other

    cs.LG eess.SP

    EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition

    Authors: Yi Ding, Chengxuan Tong, Shuailei Zhang, Muyun Jiang, Yong Li, Kevin Lim Jun Liang, Cuntai Guan

    Abstract: Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  3. arXiv:2406.18069  [pdf, other

    eess.SP cs.AI cs.CL

    Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals

    Authors: Zengding Liu, Chen Chen, Jiannong Cao, Minglei Pan, Jikui Liu, Nan Li, Fen Miao, Ye Li

    Abstract: Large language models (LLMs) have captured significant interest from both academia and industry due to their impressive performance across various textual tasks. However, the potential of LLMs to analyze physiological time-series data remains an emerging research field. Particularly, there is a notable gap in the utilization of LLMs for analyzing wearable biosignals to achieve cuffless blood press… ▽ More

    Submitted 26 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.17257  [pdf, other

    cs.CL cs.SD eess.AS

    Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation

    Authors: Yingting Li, Ambuj Mehrish, Bryan Chew, Bo Cheng, Soujanya Poria

    Abstract: Different languages have distinct phonetic systems and vary in their prosodic features making it challenging to develop a Text-to-Speech (TTS) model that can effectively synthesise speech in multilingual settings. Furthermore, TTS architecture needs to be both efficient enough to capture nuances in multiple languages and efficient enough to be practical for deployment. The standard approach is to… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.14664  [pdf, ps, other

    eess.SP

    Experimental Validation of Cooperative RSS-based Localization with Unknown Transmit Power, Path Loss Exponent, and Precise Anchor Location

    Authors: Yingquan Li, Bodhibrata Mukhopadhyay, Jiajie Xu, Mohamed-Slim Alouini

    Abstract: Received signal strength (RSS)--based cooperative localization has gained significant attention due to its straightforward system architectures and cost-effectiveness. In this paper, we propose Cooperative Localization Techniques (with Unknown Parameters), referred to as CTUP(s), which consider uncertainty in anchor nodes' locations and assume the transmit power and \textcolor{blue}{path loss expo… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.13705  [pdf, other

    eess.IV cs.AI cs.CV

    EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

    Authors: Long Bai, Qiaozhi Tan, Tong Chen, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, **lin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

  7. arXiv:2406.12699  [pdf, other

    cs.SD eess.AS eess.SP

    Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition

    Authors: Kuan-Chen Wang, You-** Li, Wei-Lun Chen, Yu-Wen Chen, Yi-Ching Wang, **-Cheng Yeh, Chao Zhang, Yu Tsao

    Abstract: Noise robustness is critical when applying automatic speech recognition (ASR) in real-world scenarios. One solution involves the used of speech enhancement (SE) models as the front end of ASR. However, neural network-based (NN-based) SE often introduces artifacts into the enhanced signals and harms ASR performance, particularly when SE and ASR are independently trained. Therefore, this study intro… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.12646  [pdf, other

    eess.IV cs.AI cs.CV

    An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

    Authors: Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Ya**g Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

    Abstract: The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potenti… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to MICCAI-2024

  9. arXiv:2406.11219  [pdf, other

    cs.RO eess.SY

    A Swift and Omnidirectional Formation Approach based on Hierarchical Reorganization

    Authors: Yuzhu Li, Wei Dong

    Abstract: Current formations commonly rely on invariant hierarchical structures, such as predetermined leaders or enumerated formation shapes. These structures could be unidirectional and sluggish, constraining their adaptability and agility when encountering cluttered environments. To surmount these constraints, this work proposes an omnidirectional affine formation approach with hierarchical reorganizatio… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2406.10298  [pdf

    eess.SY

    Enhancing Resilience of Power Systems against Typhoon Threats: A Hybrid Data-Model Driven Approach

    Authors: Yang Li

    Abstract: This chapter addresses the increasing vulnerability of coastal regions to typhoons and the consequent power outages, emphasizing the critical role of power transmission systems in disaster resilience. It introduces a framework for assessing and enhancing the resilience of these systems against typhoon impacts. The approach integrates a hybrid-driven model for system failure analysis and resilience… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted in 'Design for Reliability and Resilience of Power Systems,' Elsevier

  11. arXiv:2406.10236  [pdf, other

    eess.IV cs.AI

    Lightening Anything in Medical Images

    Authors: Ben Fei, Yixuan Li, Weidong Yang, Hengjun Gao, **gyi Xu, Lipeng Ma, Yatian Yang, **hong Zhou

    Abstract: The development of medical imaging techniques has made a significant contribution to clinical decision-making. However, the existence of suboptimal imaging quality, as indicated by irregular illumination or imbalanced intensity, presents significant obstacles in automating disease screening, analysis, and diagnosis. Existing approaches for natural image enhancement are mostly trained with numerous… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 23 pages, 6 figures

  12. arXiv:2406.09695  [pdf, other

    eess.SP

    Machine learning-based Near-field Emitter Localization via Grouped Hybrid Analog and Digital Massive MIMO Receive Array

    Authors: Yifan Li, Feng Shu, Jiatong Bai, Cunhua Pan, Yongpeng Wu, Yaoliang Song, Jiangzhou Wang

    Abstract: A fully-digital massive MIMO receive array is promising to meet the high-resolution requirement of near-field (NF) emitter localization, but it also results in the significantly increasing of hardware costs and algorithm complexity. In order to meet the future demand for green communication while maintaining high performance, the grouped hybrid analog and digital (HAD) structure is proposed for NF… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  13. arXiv:2406.09238  [pdf, other

    cs.IT eess.SP

    Near-Field Multiuser Communications based on Sparse Arrays

    Authors: Kangjian Chen, Chenhao Qi, Geoffrey Ye Li, Octavia A. Dobre

    Abstract: This paper considers near-field multiuser communications based on sparse arrays (SAs). First, for the uniform SAs (USAs), we analyze the beam gains of channel steering vectors, which shows that increasing the antenna spacings can effectively improve the spatial resolution of the antenna arrays to enhance the sum rate of multiuser communications. Then, we investigate nonuniform SAs (NSAs) to mitiga… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  14. arXiv:2406.08393  [pdf, other

    eess.AS cs.SD

    SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

    Authors: Yue Li, Xinsheng Wang, Li Zhang, Lei Xie

    Abstract: Speaker Change Detection (SCD) is to identify boundaries among speakers in a conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task, a further investigation of self-supervised learning (SSL) features for SCD is conducted in this work. Specifically, an SCD model, named SCDNet, is proposed. With this model, various state-of-the-art SSL models, including Hubert, wav… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  15. arXiv:2406.08353  [pdf, other

    eess.AS cs.CL cs.MM cs.SD

    Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques

    Authors: Yuanchao Li, Peter Bell, Catherine Lai

    Abstract: Text data is commonly utilized as a primary input to enhance Speech Emotion Recognition (SER) performance and reliability. However, the reliance on human-transcribed text in most studies impedes the development of practical SER systems, creating a gap between in-lab research and real-world scenarios where Automatic Speech Recognition (ASR) serves as the text source. Hence, this study benchmarks SE… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  16. arXiv:2406.08122  [pdf

    eess.AS cs.SD

    Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor

    Authors: Yongjie Si, Yanxiong Li, Jialong Li, Jiaxin Tan, Qianhua He

    Abstract: It's assumed that training data is sufficient in base session of few-shot class-incremental audio classification. However, it's difficult to collect abundant samples for model training in base session in some practical scenarios due to the data scarcity of some classes. This paper explores a new problem of fully few-shot class-incremental audio classification with few training samples in all sessi… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted for publication on Interspeech 2024. 5 pages, 3 figures, 5 tables

  17. arXiv:2406.08119  [pdf

    eess.AS cs.SD

    Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

    Authors: Yanxiong Li, Jiaxin Tan, Guoqing Chen, Jialong Li, Yongjie Si, Qianhua He

    Abstract: This work is an improved system that we submitted to task 1 of DCASE2023 challenge. We propose a method of low-complexity acoustic scene classification by a parallel attention-convolution network which consists of four modules, including pre-processing, fusion, global and local contextual information extraction. The proposed network is computationally efficient to capture global and local contextu… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted for publication on Interspeech 2024. 5 pages, 4 figures, 3 tables

  18. arXiv:2406.08112  [pdf, other

    cs.SD cs.AI eess.AS

    Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

    Authors: Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi

    Abstract: With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024. arXiv admin note: substantial text overlap with arXiv:2405.04880

  19. arXiv:2406.05700  [pdf, other

    cs.CV eess.IV

    HDMba: Hyperspectral Remote Sensing Imagery Dehazing with State Space Model

    Authors: Hang Fu, Genyun Sun, Yinhe Li, **chang Ren, Aizhu Zhang, Cheng **g, Pedram Ghamisi

    Abstract: Haze contamination in hyperspectral remote sensing images (HSI) can lead to spatial visibility degradation and spectral distortion. Haze in HSI exhibits spatial irregularity and inhomogeneous spectral distribution, with few dehazing networks available. Current CNN and Transformer-based dehazing methods fail to balance global scene recovery, local detail retention, and computational efficiency. Ins… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  20. arXiv:2406.05692  [pdf, other

    cs.SD cs.AI eess.AS

    SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion

    Authors: Bingsong Bai, Feng** Wang, Yingming Gao, Ya Li

    Abstract: Diffusion-based singing voice conversion (SVC) models have shown better synthesis quality compared to traditional methods. However, in cross-domain SVC scenarios, where there is a significant disparity in pitch between the source and target voice domains, the models tend to generate audios with hoarseness, posing challenges in achieving high-quality vocal outputs. Therefore, in this paper, we prop… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  21. arXiv:2406.05452  [pdf, other

    eess.SP cs.IT

    Near-Field Channel Estimation for Extremely Large-Scale Terahertz Communications

    Authors: Songjie Yang, Yizhou Peng, Wanting Lyu, Ya Li, Hongjun He, Zhongpei Zhang, Chau Yuen

    Abstract: Future Terahertz communications exhibit significant potential in accommodating ultra-high-rate services. Employing extremely large-scale array antennas is a key approach to realize this potential, as they can harness substantial beamforming gains to overcome the severe path loss and leverage the electromagnetic advantages in the near field. This paper proposes novel estimation methods designed to… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  22. arXiv:2406.05170  [pdf

    q-bio.OT cs.CV eess.IV

    Research on Tumors Segmentation based on Image Enhancement Method

    Authors: Danyi Huang, Ziang Liu, Yizhou Li

    Abstract: One of the most effective ways to treat liver cancer is to perform precise liver resection surgery, the key step of which includes precise digital image segmentation of the liver and its tumor. However, traditional liver parenchymal segmentation techniques often face several challenges in performing liver segmentation: lack of precision, slow processing speed, and computational burden. These short… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  23. arXiv:2406.04683  [pdf, other

    cs.SD eess.AS

    PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

    Authors: Shuchen Shi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Tao Wang, Chunyu Qiang, Yi Lu, Xin Qi, Xuefei Liu, Yukun Liu, Yongwei Li, Zhiyong Wang, Xiaopeng Wang

    Abstract: Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich variations and diversity, resulting in a drop in TTA model performance when faced with complex text. To address this issue, we propose a method called Portable Plug-in Prompt Refiner, which utilizes rich knowledge abo… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: accepted by INTERSPEECH2024

  24. arXiv:2406.04324  [pdf, other

    cs.CV eess.IV

    SF-V: Single Forward Video Generation Model

    Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

    Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune p… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/SF-V

  25. arXiv:2406.03714  [pdf, other

    cs.SD eess.AS

    Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

    Authors: **long Xue, Yayue Deng, Yingming Gao, Ya Li

    Abstract: Recent prompt-based text-to-speech (TTS) models can clone an unseen speaker using only a short speech prompt. They leverage a strong in-context ability to mimic the speech prompts, including speaker style, prosody, and emotion. Therefore, the selection of a speech prompt greatly influences the generated speech, akin to the importance of a prompt in large language models (LLMs). However, current pr… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  26. arXiv:2406.03706  [pdf, other

    cs.SD cs.CL eess.AS

    Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

    Authors: **long Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li

    Abstract: Recent advances in large language models (LLMs) and development of audio codecs greatly propel the zero-shot TTS. They can synthesize personalized speech with only a 3-second speech of an unseen speaker as acoustic prompt. However, they only support short speech prompts and cannot leverage longer context information, as required in audiobook and conversational TTS scenarios. In this paper, we intr… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  27. arXiv:2406.03247  [pdf, other

    cs.SD eess.AS

    Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection

    Authors: Xiaopeng Wang, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Yuankun Xie, Yukun Liu, Jianhua Tao, Xuefei Liu, Yongwei Li, Xin Qi, Yi Lu, Shuchen Shi

    Abstract: The generalization of Fake Audio Detection (FAD) is critical due to the emergence of new spoofing techniques. Traditional FAD methods often focus solely on distinguishing between genuine and known spoofed audio. We propose a Genuine-Focused Learning (GFL) framework guided, aiming for highly generalized FAD, called GFL-FAD. This method incorporates a Counterfactual Reasoning Enhanced Representation… ▽ More

    Submitted 9 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  28. arXiv:2406.03237  [pdf, other

    cs.SD eess.AS

    Generalized Fake Audio Detection via Deep Stable Learning

    Authors: Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Yuankun Xie, Yukun Liu, Xiaopeng Wang, Xuefei Liu, Yongwei Li, Jianhua Tao, Yi Lu, Xin Qi, Shuchen Shi

    Abstract: Although current fake audio detection approaches have achieved remarkable success on specific datasets, they often fail when evaluated with datasets from different distributions. Previous studies typically address distribution shift by focusing on using extra data or applying extra loss restrictions during training. However, these methods either require a substantial amount of data or complicate t… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: accepted by INTERSPEECH2024

  29. arXiv:2406.02410  [pdf, ps, other

    eess.SP

    Optimization of Rate-Splitting Multiple Access with Integrated Sensing and Backscatter Communication

    Authors: Diluka Galappaththige, Shayan Zargari, Chintha Tellambura, Geoffrey Ye Li

    Abstract: An integrated sensing and backscatter communication (ISABC) system is introduced herein. This system features a full-duplex (FD) base station (BS) that seamlessly merges sensing with backscatter communication and supports multiple users. Multiple access (MA) for the user is provided by employing rate-splitting multiple access (RSMA). RSMA, unlike other classical orthogonal and non-orthogonal MA sc… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 13 pages, 8 figures, Journal paper

  30. arXiv:2406.02126  [pdf, other

    eess.SY cs.AI cs.LG cs.MA

    CityLight: A Universal Model Towards Real-world City-scale Traffic Signal Control Coordination

    Authors: **wei Zeng, Chao Yu, Xinyi Yang, Wenxuan Ao, Jian Yuan, Yong Li, Yu Wang, Huazhong Yang

    Abstract: Traffic signal control (TSC) is a promising low-cost measure to enhance transportation efficiency without affecting existing road infrastructure. While various reinforcement learning-based TSC methods have been proposed and experimentally outperform conventional rule-based methods, none of them has been deployed in the real world. An essential gap lies in the oversimplification of the scenarios in… ▽ More

    Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  31. arXiv:2406.00516  [pdf, other

    eess.SY

    Deep Learning based Performance Testing for Analog Integrated Circuits

    Authors: Jiawei Cao, Chongtao Guo, Hao Li, Zhigang Wang, Houjun Wang, Geoffrey Ye Li

    Abstract: In this paper, we propose a deep learning based performance testing framework to minimize the number of required test modules while guaranteeing the accuracy requirement, where a test module corresponds to a combination of one circuit and one stimulus. First, we apply a deep neural network (DNN) to establish the map** from the response of the circuit under test (CUT) in each module to all specif… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  32. arXiv:2406.00485  [pdf

    eess.IV cs.RO

    TacShade A New 3D-printed Soft Optical Tactile Sensor Based on Light, Shadow and Greyscale for Shape Reconstruction

    Authors: Zhenyu Lu, Jialong Yang, Haoran Li, Yifan Li, Weiyong Si, Nathan Lepora, Chenguang Yang

    Abstract: In this paper, we present the TacShade a newly designed 3D-printed soft optical tactile sensor. The sensor is developed for shape reconstruction under the inspiration of sketch drawing that uses the density of sketch lines to draw light and shadow, resulting in the creation of a 3D-view effect. TacShade, building upon the strengths of the TacTip, a single-camera tactile sensor of large in-depth de… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by ICRA 2024

  33. arXiv:2406.00416  [pdf, other

    stat.ML cs.LG eess.SP

    Representation and De-interleaving of Mixtures of Hidden Markov Processes

    Authors: Jiadi Bao, Mengtao Zhu, Yunjie Li, Shafei Wang

    Abstract: De-interleaving of the mixtures of Hidden Markov Processes (HMPs) generally depends on its representation model. Existing representation models consider Markov chain mixtures rather than hidden Markov, resulting in the lack of robustness to non-ideal situations such as observation noise or missing observations. Besides, de-interleaving methods utilize a search-based strategy, which is time-consumi… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 13 pages, 9 figures, submitted to IEEE transactions on Signal Processing

  34. arXiv:2406.00211  [pdf

    cs.RO eess.SY

    Navigating Autonomous Vehicle on Unmarked Roads with Diffusion-Based Motion Prediction and Active Inference

    Authors: Yufei Huang, Yulin Li, Andrea Matta, Mohsen Jafari

    Abstract: This paper presents a novel approach to improving autonomous vehicle control in environments lacking clear road markings by integrating a diffusion-based motion predictor within an Active Inference Framework (AIF). Using a simulated parking lot environment as a parallel to unmarked roads, we develop and test our model to predict and guide vehicle movements effectively. The diffusion-based motion p… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

  35. arXiv:2405.20723  [pdf, other

    physics.optics eess.SP

    Beaconless Auto-Alignment for Single-Wavelength 5 Tbit/s Mode-Division Multiplexing Free-Space Optical Communications

    Authors: Yiming Li, Gil Fernandes, David Benton, Antonin Billaud, Mohammed Patel, Andrew Ellis

    Abstract: Mode-division multiplexing has shown its ability to significantly increase the capacity of free-space optical communications. An accurate alignment is crucial to enable such links due to possible performance degradation induced by mode crosstalk and narrow beam divergence. Conventionally, a beacon beam is necessary for system alignment due to multiple local maximums in the mode-division multiplexe… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  36. arXiv:2405.20064  [pdf, other

    eess.AS cs.SD

    1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

    Authors: Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain

    Abstract: Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for t… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  37. arXiv:2405.19298  [pdf, other

    cs.CV eess.IV

    Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

    Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

    Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score-an all-around LMM-based no-reference IQA (NR-IQA)… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  38. arXiv:2405.16942  [pdf, other

    eess.IV cs.CV

    PASTA: Pathology-Aware MRI to PET Cross-Modal Translation with Diffusion Models

    Authors: Yitong Li, Igor Yakushev, Dennis M. Hedderich, Christian Wachinger

    Abstract: Positron emission tomography (PET) is a well-established functional imaging technique for diagnosing brain disorders. However, PET's high costs and radiation exposure limit its widespread use. In contrast, magnetic resonance imaging (MRI) does not have these limitations. Although it also captures neurodegenerative changes, MRI is a less sensitive diagnostic tool than PET. To close this gap, we aim… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  39. arXiv:2405.16765  [pdf, ps, other

    cs.LG eess.SP

    Study of Robust Direction Finding Based on Joint Sparse Representation

    Authors: Y. Li, W. Xiao, L. Zhao, Z. Huang, Q. Li, L. Li, R. C. de Lamare

    Abstract: Standard Direction of Arrival (DOA) estimation methods are typically derived based on the Gaussian noise assumption, making them highly sensitive to outliers. Therefore, in the presence of impulsive noise, the performance of these methods may significantly deteriorate. In this paper, we model impulsive noise as Gaussian noise mixed with sparse outliers. By exploiting their statistical differences,… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures

  40. arXiv:2405.16677  [pdf, other

    eess.AS cs.CL cs.SD

    Crossmodal ASR Error Correction with Discrete Speech Units

    Authors: Yuanchao Li, Pinzhen Chen, Peter Bell, Catherine Lai

    Abstract: ASR remains unsatisfactory in scenarios where the speaking style diverges from that used to train ASR systems, resulting in erroneous transcripts. To address this, ASR Error Correction (AEC), a post-ASR processing approach, is required. In this work, we tackle an understudied issue: the Low-Resource Out-of-Domain (LROOD) problem, by investigating crossmodal AEC on very limited downstream data with… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  41. arXiv:2405.15705  [pdf, other

    cs.AR eess.SY

    Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates

    Authors: **bo Peng, Zhe Chen, Zheng Lin, Haoxuan Yuan, Zihan Fang, Lingzhong Bao, Zihang Song, Ying Li, **g Ren, Yue Gao

    Abstract: Due to sophisticated deployments of all kinds of wireless networks (e.g., 5G, Wi-Fi, Bluetooth, LEO satellite, etc.), multiband signals distribute in a large bandwidth (e.g., from 70 MHz to 8 GHz). Consequently, for network monitoring and spectrum sharing applications, a sniffer for extracting physical layer information, such as structure of packet, with low sampling rate (especially, sub-Nyquist… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 12 pages, 9 figures

  42. arXiv:2405.13477  [pdf, other

    cs.HC cs.SD eess.AS

    A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction

    Authors: Yue Li, Florian A. Kunneman, Koen V. Hindriks

    Abstract: With current state-of-the-art automatic speech recognition (ASR) systems, it is not possible to transcribe overlap** speech audio streams separately. Consequently, when these ASR systems are used as part of a social robot like Pepper for interaction with a human, it is common practice to close the robot's microphone while it is talking itself. This prevents the human users to interrupt the robot… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 8 pages,16 figures, Under review by RoMan 2024 conference

  43. arXiv:2405.13339  [pdf, other

    eess.SP

    Floor-Plan-aided Indoor Localization: Zero-Shot Learning Framework, Data Sets, and Prototype

    Authors: Haiyao Yu, Changyang She, Yunkai Hu, Geng Wang, Rui Wang, Branka Vucetic, Yonghui Li

    Abstract: Machine learning has been considered a promising approach for indoor localization. Nevertheless, the sample efficiency, scalability, and generalization ability remain open issues of implementing learning-based algorithms in practical systems. In this paper, we establish a zero-shot learning framework that does not need real-world measurements in a new communication environment. Specifically, a gra… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  44. arXiv:2405.12408  [pdf, other

    cs.RO eess.SY

    Flexible Active Safety Motion Control for Robotic Obstacle Avoidance: A CBF-Guided MPC Approach

    Authors: **hao Liu, Jun Yang, Jianliang Mao, Tianqi Zhu, Qihang Xie, Yimeng Li, Xiangyu Wang, Shihua Li

    Abstract: A flexible active safety motion (FASM) control approach is proposed for the avoidance of dynamic obstacles and the reference tracking in robot manipulators. The distinctive feature of the proposed method lies in its utilization of control barrier functions (CBF) to design flexible CBF-guided safety criteria (CBFSC) with dynamically optimized decay rates, thereby offering flexibility and active saf… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 11 pages, 11 figures

  45. arXiv:2405.12064  [pdf, ps, other

    eess.SP

    Approximating Multi-Dimensional and Multiband Signals

    Authors: Yuhan Li, Tianyao Huang, Yimin Liu, Xiqin Wang

    Abstract: We study the problem of representing a discrete tensor that comes from finite uniform samplings of a multi-dimensional and multiband analog signal. Particularly, we consider two typical cases in which the shape of the subbands is cubic or parallelepipedic. For the cubic case, by examining the spectrum of its corresponding time- and band-limited operators, we obtain a low-dimensional optimal dictio… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  46. arXiv:2405.11306  [pdf, other

    eess.SP

    Meta Reinforcement Learning for Resource Allocation in Multi-Antenna UAV Network with Rate Splitting Multiple Access

    Authors: Hosein Zarini, Maryam Farajzadeh Dehkordi, Armin Farhadi, Mohammad Robat Mili, Ali Movaghar, Mehdi Rasti, Yonghui Li, Kai-Kit Wong

    Abstract: Unmanned aerial vehicles (UAVs) with multiple antennas have recently been explored to improve capacity in wireless networks. However, the strict energy constraint of UAVs, given their simultaneous flying and communication tasks, renders the exploration of energy-efficient multi-antenna techniques indispensable for UAVs. Meanwhile, lens antenna subarray (LAS) emerges as a promising energy-efficient… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

  47. arXiv:2405.09851  [pdf, other

    eess.IV cs.CV q-bio.QM

    Region of Interest Detection in Melanocytic Skin Tumor Whole Slide Images -- Nevus & Melanoma

    Authors: Yi Cui, Yao Li, Jayson R. Miedema, Sharon N. Edmiston, Sherif Farag, J. S. Marron, Nancy E. Thomas

    Abstract: Automated region of interest detection in histopathological image analysis is a challenging and important topic with tremendous potential impact on clinical practice. The deep-learning methods used in computational pathology may help us to reduce costs and increase the speed and accuracy of cancer diagnosis. We started with the UNC Melanocytic Tumor Dataset cohort that contains 160 hematoxylin and… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 5 figures, NeurIPS 2022 Workshop

  48. arXiv:2405.05336  [pdf, other

    eess.IV cs.AI cs.CV

    Joint semi-supervised and contrastive learning enables zero-shot domain-adaptation and multi-domain segmentation

    Authors: Alvaro Gomariz, Yusuke Kikuchi, Yun Yvonna Li, Thomas Albrecht, Andreas Maunz, Daniela Ferrara, Huanxiang Lu, Orcun Goksel

    Abstract: Despite their effectiveness, current deep learning models face challenges with images coming from different domains with varying appearance and content. We introduce SegCLR, a versatile framework designed to segment volumetric images across different domains, employing supervised and contrastive learning simultaneously to effectively learn from both labeled and unlabeled data. We demonstrate the s… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  49. arXiv:2405.05252  [pdf, other

    cs.CV cs.AI cs.LG eess.IV eess.SP

    Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

    Authors: Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu

    Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  50. arXiv:2405.04962  [pdf, other

    eess.SP

    Bistatic OFDM-based ISAC with Over-the-Air Synchronization: System Concept and Performance Analysis

    Authors: David Brunner, Lucas Giroto de Oliveira, Charlotte Muth, Silvio Mandelli, Marcus Henninger, Axel Diewald, Yueheng Li, Mohamad Basim Alabd, Laurent Schmalen, Thomas Zwick, Benjamin Nuss

    Abstract: Integrated sensing and communication (ISAC) has been defined as one goal for 6G mobile communication systems. In this context, this article introduces a bistatic ISAC system based on orthogonal frequency-division multiplexing (OFDM). While the bistatic architecture brings advantages such as not demanding full duplex operation with respect to the monostatic one, the need for synchronizing transmitt… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.