Showing 1–50 of 248 results for author: Yu, J

  1. arXiv:2407.00995  [pdf, other]

    cs.CY eess.SY physics.app-ph

    Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense

    Authors: Yi Yu, Shengyue Yao, Tianchen Zhou, Yexuan Fu, Jingru Yu, Ding Wang, Xuhong Wang, Cen Chen, Yilun Lin

    Abstract: In the digital era, data has become a pivotal asset, advancing technologies such as autonomous driving. Despite this, data trading faces challenges like the absence of robust pricing methods and the lack of trustworthy trading mechanisms. To address these challenges, we introduce a traffic-oriented data trading platform named Data on The Move (DTM), integrating traffic simulation, data trading, an…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.17578  [pdf, other]

    eess.IV

    Sparse-view Signal-domain Photoacoustic Tomography Reconstruction Method Based on Neural Representation

    Authors: Bowei Yao, Yi Zeng, Haizhao Dai, Qing Wu, Youshen Xiao, Fei Gao, Yuyao Zhang, Jingyi Yu, Xiran Cai

    Abstract: Photoacoustic tomography is a hybrid biomedical technology, which combines the advantages of acoustic and optical imaging. However, for the conventional image reconstruction method, the image quality is affected obviously by artifacts under the condition of sparse sampling. In this paper, a novel model-based sparse reconstruction method via implicit neural representation was proposed for improving…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.14264  [pdf, other]

    eess.IV cs.CV

    Zero-Shot Image Denoising for High-Resolution Electron Microscopy

    Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, Jingyi Yu, Yuyao Zhang

    Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 12 figures

  4. arXiv:2406.08268  [pdf, other]

    eess.SY

    Multi-Static ISAC based on Network-Assisted Full-Duplex Cell-Free Networks: Performance Analysis and Duplex Mode Optimization

    Authors: Fan Zeng, Ruoyun Liu, Xiaoyu Sun, Jingxuan Yu, Jiamin Li, Pengchen Zhu, Dongming Wang, Xiaohu You

    Abstract: Multi-static integrated sensing and communication (ISAC) technology, which can achieve a wider coverage range and avoid self-interference, is an important trend for the future development of ISAC. Existing multi-static ISAC designs are unable to support the asymmetric uplink (UL)/downlink (DL) communication requirements in the scenario while simultaneously achieving optimal sensing performance. Th…

    Submitted 12 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2406.02640  [pdf, other]

    eess.IV physics.med-ph physics.optics

    Ghost imaging-based Non-contact Heart Rate Detection

    Authors: Jianming Yu, Yuchen He, Bin Li, Hui Chen, Huaibin Zheng, Jianbin Liu, Zhuo Xu

    Abstract: Remote heart rate measurement is an increasingly concerned research field, usually using remote photoplethysmography (rPPG) to collect heart rate information through video data collection. However, in certain specific scenarios (such as low light conditions, intense lighting, and non-line-of-sight situations), traditional imaging methods fail to capture image information effectively, that may lead…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 4 pages, 6 figures

  6. arXiv:2405.17028  [pdf, other]

    cs.SD eess.AS

    RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

    Authors: Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao

    Abstract: Although current Text-To-Speech (TTS) models are able to generate high-quality speech samples, there are still challenges in developing emotion intensity controllable TTS. Most existing TTS models achieve emotion intensity control by extracting intensity information from reference speeches. Unfortunately, limited by the lack of modeling for intra-class emotion intensity and the model's information…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by the 8th APWeb-WAIM International Joint Conference on Web and Big Data

  7. I$^3$Net: Inter-Intra-slice Interpolation Network for Medical Slice Synthesis

    Authors: Haofei Song, Xintian Mao, Jing Yu, Qingli Li, Yan Wang

    Abstract: Medical imaging is limited by acquisition time and scanning equipment. CT and MR volumes, reconstructed with thicker slices, are anisotropic with high in-plane resolution and low through-plane resolution. We reveal an intriguing phenomenon that due to the mentioned nature of data, performing slice-wise interpolation from the axial view can yield greater benefits than performing super-resolution fr…

    Submitted 5 May, 2024; originally announced May 2024.

  8. arXiv:2404.17890  [pdf, other]

    eess.IV cs.AI cs.CV

    DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction

    Authors: Chenhe Du, Xiyue Lin, Qing Wu, Xuanyu Tian, Ying Su, Zhe Luo, Hongjiang Wei, S. Kevin Zhou, Jingyi Yu, Yuyao Zhang

    Abstract: Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, resulting in diverse artifacts in the reconstructed CT images. Emerging implicit neural representation (INR) techniques, such as NeRF, NeAT, and NeRP, have shown promise in under-determined CT imaging recon…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 15 pages, 10 figures

    ACM Class: I.2.10; I.4.5

  9. arXiv:2404.16223  [pdf, other]

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  10. arXiv:2404.11278  [pdf, other]

    physics.ins-det eess.IV

    Study on the static detection of ICF target based on muonic X-ray sphere encoded imaging

    Authors: Dikai Li, Jian Yu, Qian Chen, Chunhui Zhang, Xiangyu Wan, Leifeng Cao

    Abstract: Muon Induced X-ray Emission (MIXE) was discovered by Chinese physicist Zhang Wenyu as early as 1947, and it can conduct non-destructive elemental analysis inside samples. Research has shown that MIXE can retain the high efficiency of direct imaging while benefiting from the low noise of pinhole imaging through encoding holes. The related technology significantly improves the counting rate while ma…

    Submitted 17 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  11. arXiv:2404.10640  [pdf, other]

    eess.IV

    Adapting SAM for Surgical Instrument Tracking and Segmentation in Endoscopic Submucosal Dissection Videos

    Authors: Jieming Yu, Long Bai, Guankun Wang, An Wang, Xiaoxiao Yang, Huxin Gao, Hongliang Ren

    Abstract: The precise tracking and segmentation of surgical instruments have led to a remarkable enhancement in the efficiency of surgical procedures. However, the challenge lies in achieving accurate segmentation of surgical instruments while minimizing the need for manual annotation and reducing the time required for the segmentation process. To tackle this, we propose a novel framework for surgical instr…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: To appear in IEEE ICRA 2024 C4SR+ Workshop

  12. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  13. arXiv:2404.04947  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Gull: A Generative Multifunctional Audio Codec

    Authors: Yi Luo, Jianwei Yu, Hangting Chen, Rongzhi Gu, Chao Weng

    Abstract: We introduce Gull, a generative multifunctional audio codec. Gull is a general purpose neural audio compression and decompression model which can be applied to a wide range of tasks and applications such as real-time communication, audio super-resolution, and codec language models. The key components of Gull include (1) universal-sample-rate modeling via subband modeling schemes motivated by recen…

    Submitted 7 June, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Demo page: https://yluo42.github.io/Gull/

  14. arXiv:2404.03869  [pdf, other]

    cs.LG cs.AI cs.MA cs.RO eess.SY

    Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

    Authors: Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan

    Abstract: The rise of multi-agent systems, especially the success of multi-agent reinforcement learning (MARL), is reshaping our future across diverse domains like autonomous vehicle networks. However, MARL still faces significant challenges, particularly in achieving zero-shot scalability, which allows trained MARL models to be directly applied to unseen tasks with varying numbers of agents. In addition, r…

    Submitted 4 April, 2024; originally announced April 2024.

  15. arXiv:2403.12425  [pdf, other]

    cs.CV cs.SD eess.AS

    Multimodal Fusion Method with Spatiotemporal Sequences and Relationship Learning for Valence-Arousal Estimation

    Authors: Jun Yu, Gongpeng Zhao, Yongqi Wang, Zhihong Wei, Yang Zheng, Zerui Zhang, Zhongpeng Cai, Guochen Xie, Jichao Zhu, Wangyuan Zhu

    Abstract: This paper presents our approach for the VA (Valence-Arousal) estimation task in the ABAW6 competition. We devised a comprehensive model by preprocessing video frames and audio segments to extract visual and audio features. Through the utilization of Temporal Convolutional Network (TCN) modules, we effectively captured the temporal and spatial correlations between these features. Subsequently, we…

    Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

  16. arXiv:2403.11757  [pdf, other]

    cs.MM cs.LG cs.SD eess.AS

    Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation

    Authors: Jun Yu, Wangyuan Zhu, Jichao Zhu

    Abstract: In this paper, we present the solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition. The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration", "Amusement", "Determination", "Empathic Pain"…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  17. arXiv:2403.06066  [pdf]

    eess.IV cs.CV cs.LG

    CausalCellSegmenter: Causal Inference inspired Diversified Aggregation Convolution for Pathology Image Segmentation

    Authors: Dawei Fan, Yifan Gao, Jiaming Yu, Yan** Chen, Wencheng Li, Chuancong Lin, Kaibin Li, Changcai Yang, Riqing Chen, Lifang Wei

    Abstract: Deep learning models have shown promising performance for cell nucleus segmentation in the field of pathology image analysis. However, training a robust model from multiple domains remains a great challenge for cell nucleus segmentation. Additionally, the shortcomings of background noise, highly overlapping between cell nucleus, and blurred edges often lead to poor performance. To address these ch…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 10 pages, 5 figures, 2 tables, MICCAI

  18. arXiv:2403.05808  [pdf, other]

    cs.CV eess.IV

    Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution

    Authors: Junxiong Lin, Yan Wang, Zeng Tao, Boyang Wang, Qing Zhao, Haorang Wang, Xuan Tong, Xinji Mai, Yuxuan Lin, Wei Song, Jiawen Yu, Shaoqi Yan, Wenqiang Zhang

    Abstract: Pre-trained diffusion models utilized for image generation encapsulate a substantial reservoir of a priori knowledge pertaining to intricate textures. Harnessing the potential of leveraging this a priori knowledge in the context of image super-resolution presents a compelling avenue. Nonetheless, prevailing diffusion-based methodologies presently overlook the constraints imposed by degradation inf…

    Submitted 9 March, 2024; originally announced March 2024.

  19. arXiv:2403.01428  [pdf, other]

    cs.RO eess.SP

    Localization matters too: How localization error affects UAV flight

    Authors: Suquan Zhang, Yuanfan Xu, Shu'ang Yu, Qingmin Liao, Jincheng Yu, Yu Wang

    Abstract: The maximum safe flight speed of an Unmanned Aerial Vehicle (UAV) is an important indicator for measuring its efficiency in completing various tasks. This indicator is influenced by numerous parameters such as UAV localization error, perception range, and system latency. However, in terms of localization errors, although there have been many studies dedicated to improving the localization capabilit…

    Submitted 7 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: 8 pages, 8 figures

  20. arXiv:2402.17187  [pdf, other]

    eess.IV cs.CV

    PE-MVCNet: Multi-view and Cross-modal Fusion Network for Pulmonary Embolism Prediction

    Authors: Zhaoxin Guo, Zhipeng Wang, Ruiquan Ge, Jianxun Yu, Feiwei Qin, Yuan Tian, Yuqing Peng, Yonghong Li, Changmiao Wang

    Abstract: The early detection of a pulmonary embolism (PE) is critical for enhancing patient survival rates. Both image-based and non-image-based features are of utmost importance in medical classification tasks. In a clinical setting, physicians tend to rely on the contextual information provided by Electronic Medical Records (EMR) to interpret medical imaging. However, very few models effectively integrat…

    Submitted 17 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  21. arXiv:2402.09729  [pdf, other]

    cs.AI eess.SY

    Federated Prompt-based Decision Transformer for Customized VR Services in Mobile Edge Computing System

    Authors: Tailin Zhou, Jiadong Yu, Jun Zhang, Danny H. K. Tsang

    Abstract: This paper investigates resource allocation to provide heterogeneous users with customized virtual reality (VR) services in a mobile edge computing (MEC) system. We first introduce a quality of experience (QoE) metric to measure user experience, which considers the MEC system's latency, user attention levels, and preferred resolutions. Then, a QoE maximization problem is formulated for resource al…

    Submitted 15 February, 2024; originally announced February 2024.

  22. arXiv:2402.05254  [pdf, other]

    cs.RO eess.SY

    Online and Certifiably Correct Visual Odometry and Mapping

    Authors: Devansh R Agrawal, Rajiv Govindjee, Jiangbo Yu, Anurekha Ravikumar, Dimitra Panagou

    Abstract: This paper proposes two new algorithms for certified perception in safety-critical robotic applications. The first is a Certified Visual Odometry algorithm, which uses an RGBD camera with bounded sensor noise to construct a visual odometry estimate with provable error bounds. The second is a Certified Mapping algorithm which, using the same RGBD images, constructs a Signed Distance Field of the obs…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 10 pages, 6 figures

  23. arXiv:2401.16099  [pdf, other]

    stat.ME eess.IV

    A Ridgelet Approach to Poisson Denoising

    Authors: Ali Dadras, Klara Leffler, Jun Yu

    Abstract: This paper introduces a novel ridgelet transform-based method for Poisson image denoising. Our work focuses on harnessing the Poisson noise's unique non-additive and signal-dependent properties, distinguishing it from Gaussian noise. The core of our approach is a new thresholding scheme informed by theoretical insights into the ridgelet coefficients of Poisson-distributed images and adaptive thres…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 11 pages, 8 figures

  24. arXiv:2401.15993  [pdf, other]

    cs.SD eess.AS

    Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings

    Authors: He Zhao, Hangting Chen, Jianwei Yu, Yuehai Wang

    Abstract: Target speaker extraction (TSE) aims to extract the target speaker's voice from the input mixture. Previous studies have concentrated on high-overlapping scenarios. However, real-world applications usually meet more complex scenarios like variable speaker overlapping and target speaker absence. In this paper, we introduce a framework to perform continuous TSE (C-TSE), comprising a target speaker…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 8 pages, 6 figures

  25. Gland Segmentation Via Dual Encoders and Boundary-Enhanced Attention

    Authors: Huadeng Wang, Jiejiang Yu, Bingbing Li, Xipeng Pan, Zhenbing Liu, Rushi Lan, Xiaonan Luo

    Abstract: Accurate and automated gland segmentation on pathological images can assist pathologists in diagnosing the malignancy of colorectal adenocarcinoma. However, due to various gland shapes, severe deformation of malignant glands, and overlapping adhesions between glands, gland segmentation has always been very challenging. To address these problems, we propose a DEA model. This model consists of two b…

    Submitted 9 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Published in: ICASSP 2024

    Journal ref: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, pp. 2345-2349

  26. arXiv:2401.08049  [pdf, other]

    cs.CV cs.SD eess.AS

    EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model

    Authors: Bingyuan Zhang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao, Jianzong Wang

    Abstract: In recent years, the field of talking faces generation has attracted considerable attention, with certain methods adept at generating virtual faces that convincingly imitate human expressions. However, existing methods face challenges related to limited generalization, particularly when dealing with challenging identities. Furthermore, methods for editing expressions are often confined to a singul…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)

  27. arXiv:2312.15463  [pdf, other]

    eess.AS cs.SD

    Consistent and Relevant: Rethink the Query Embedding in General Sound Separation

    Authors: Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, Zhiyong Wu, Helen Meng

    Abstract: The query-based audio separation usually employs specific queries to extract target sources from a mixture of audio signals. Currently, most query-based separation models need additional networks to obtain query embedding. In this way, separation model is optimized to be adapted to the distribution of query embedding. However, query embedding may exhibit mismatches with separation models due to in…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  28. arXiv:2312.10381  [pdf, other]

    cs.SD eess.AS

    SECap: Speech Emotion Captioning with Large Language Model

    Authors: Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, Shixiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu

    Abstract: Speech emotions are crucial in human communication and are extensively used in fields like speech synthesis and natural language understanding. Most prior studies, such as speech emotion recognition, have categorized speech emotions into a fixed set of classes. Yet, emotions expressed in human speech are often complex, and categorizing them into predefined groups can be insufficient to adequately…

    Submitted 23 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  29. arXiv:2312.04382  [pdf, other]

    eess.IV cs.AI

    Adversarial Denoising Diffusion Model for Unsupervised Anomaly Detection

    Authors: Jongmin Yu, Hyeontaek Oh, Jinhong Yang

    Abstract: In this paper, we propose the Adversarial Denoising Diffusion Model (ADDM). The ADDM is based on the Denoising Diffusion Probabilistic Model (DDPM) but complementarily trained by adversarial learning. The proposed adversarial learning is achieved by classifying model-based denoised samples and samples to which random Gaussian noise is added to a specific sampling step. With the addition of explici…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted for the poster session of DGM4H workshop on NeurIPS 2023

  30. arXiv:2311.14316  [pdf, other]

    eess.SP cs.AI

    Windformer:Bi-Directional Long-Distance Spatio-Temporal Network For Wind Speed Prediction

    Authors: Xuewei Li, Zewen Shang, Zhiqiang Liu, Jian Yu, Wei Xiong, Mei Yu

    Abstract: Wind speed prediction is critical to the management of wind power generation. Due to the large range of wind speed fluctuations and wake effect, there may also be strong correlations between long-distance wind turbines. This difficult-to-extract feature has become a bottleneck for improving accuracy. History and future time information includes the trend of airflow changes, whether this dynamic in…

    Submitted 24 November, 2023; originally announced November 2023.

  31. arXiv:2311.06003  [pdf, ps, other]

    eess.SP

    Passive Integrated Sensing and Communication Scheme based on RF Fingerprint Information Extraction for Cell-Free RAN

    Authors: Jingxuan Yu, Fan Zeng, Jiamin Li, Feiyang Liu, Pengcheng Zhu, Dongming Wang, Xiaohu You

    Abstract: This paper investigates how to achieve integrated sensing and communication (ISAC) based on a cell-free radio access network (CF-RAN) architecture with a minimum footprint of communication resources. We propose a new passive sensing scheme. The scheme is based on the radio frequency (RF) fingerprint learning of the RF radio unit (RRU) to build an RF fingerprint library of RRUs. The source RRU is i…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 11 pages, 6 figures, submitted on 28-Feb-2023, China Communication, Accepted on 14-Sep-2023

  32. arXiv:2311.05101  [pdf, other]

    cs.IT eess.SP

    Integrated Sensing and Communication for Network-Assisted Full-Duplex Cell-Free Distributed Massive MIMO Systems

    Authors: Fan Zeng, Jingxuan Yu, Jiamin Li, Feiyang Liu, Dongming Wang, Xiaohu You

    Abstract: In this paper, we combine the network-assisted full-duplex (NAFD) technology and distributed radar sensing to implement integrated sensing and communication (ISAC). The ISAC system features both uplink and downlink remote radio units (RRUs) equipped with communication and sensing capabilities. We evaluate the communication and sensing performance of the system using the sum communication rates and…

    Submitted 13 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 14 pages, 7 figures, submit to China Communication February 28, 2023, date of major revision July 09, 2023

  33. arXiv:2310.07284  [pdf, other]

    eess.AS cs.CL

    Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction

    Authors: Xiang Hao, Jibin Wu, Jianwei Yu, Chenglin Xu, Kay Chen Tan

    Abstract: Humans possess an extraordinary ability to selectively focus on the sound source of interest amidst complex acoustic environments, commonly referred to as cocktail party scenarios. In an attempt to replicate this remarkable auditory attention capability in machines, target speaker extraction (TSE) models have been developed. These models leverage the pre-registered cues of the target speaker to ex…

    Submitted 14 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Under review, https://github.com/haoxiangsnr/llm-tse

  34. arXiv:2310.06339  [pdf, other]

    eess.IV cs.LG

    Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination

    Authors: Siyuan Jiang, Yan Ding, Yuling Wang, Lei Xu, Wenli Dai, Wanru Chang, Jianfeng Zhang, Jie Yu, Jianqiao Zhou, Chunquan Zhang, ** Liang, Dexing Kong

    Abstract: Ultrasound is a vital diagnostic technique in health screening, with the advantages of non-invasive, cost-effective, and radiation free, and therefore is widely applied in the diagnosis of nodules. However, it relies heavily on the expertise and clinical experience of the sonographer. In ultrasound images, a single nodule might present heterogeneous appearances in different cross-sectional views w…

    Submitted 10 October, 2023; originally announced October 2023.

  35. arXiv:2309.17136  [pdf, other]

    eess.SY

    Latent Dynamic Networked System Identification with High-Dimensional Networked Data

    Authors: Jiaxin Yu, Yanfang Mo, S. Joe Qin

    Abstract: Networked dynamic systems are ubiquitous in various domains, such as industrial processes, social networks, and biological systems. These systems produce high-dimensional data that reflect the complex interactions among the network nodes with rich sensor measurements. In this paper, we propose a novel algorithm for latent dynamic networked system identification that leverages the network structure…

    Submitted 29 September, 2023; originally announced September 2023.

  36. arXiv:2309.13905  [pdf, other]

    eess.AS cs.SD

    AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

    Authors: Jianwei Yu, Hangting Chen, Yanyao Bian, Xiang Li, Yi Luo, Jinchuan Tian, Mengyang Liu, Jiayi Jiang, Shuai Wang

    Abstract: Recently, the utilization of extensive open-sourced text data has significantly advanced the performance of text-based large language models (LLMs). However, the use of in-the-wild large-scale speech data in the speech technology community remains constrained. One reason for this limitation is that a considerable amount of the publicly available speech data is compromised by background noise, spee…

    Submitted 25 September, 2023; originally announced September 2023.

  37. arXiv:2309.11730  [pdf, other]

    eess.AS cs.SD

    Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

    Authors: Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li

    Abstract: Current speaker recognition systems primarily rely on supervised approaches, constrained by the scale of labeled datasets. To boost the system performance, researchers leverage large pretrained models such as WavLM to transfer learned high-level features to the downstream speaker recognition task. However, this approach introduces extra parameters as the pretrained model remains in the inference s…

    Submitted 26 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: submitted to ICASSP 2024

  38. arXiv:2309.10738  [pdf, other]

    cs.SD cs.AI cs.CL cs.IR cs.MM eess.AS

    MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation

    Authors: Xinda Wu, Zhijie Huang, Kejun Zhang, Jiaxing Yu, Xu Tan, Tieyao Zhang, Zihao Wang, Lingyun Sun

    Abstract: Pre-trained language models have achieved impressive results in various music understanding and generation tasks. However, existing pre-training methods for symbolic melody generation struggle to capture multi-scale, multi-dimensional structural information in note sequences, due to the domain knowledge discrepancy between text and music. Moreover, the lack of available large-scale symbolic melody…

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

  39. arXiv:2309.07757  [pdf, other]

    eess.AS cs.SD

    Complexity Scaling for Speech Denoising

    Authors: Hangting Chen, Jianwei Yu, Chao Weng

    Abstract: Computational complexity is critical when deploying deep learning-based speech denoising models for on-device applications. Most prior research focused on optimizing model architectures to meet specific computational cost constraints, often creating distinct neural network architectures for different complexity limitations. This study conducts complexity scaling for speech denoising tasks, aiming…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP2024

  40. arXiv:2309.04655  [pdf]

    cs.RO cs.LG eess.SP eess.SY

    Intelligent upper-limb exoskeleton integrated with soft wearable bioelectronics and deep-learning for human intention-driven strength augmentation based on sensory feedback

    Authors: Jinwoo Lee, Kangkyu Kwon, Ira Soltis, Jared Matthews, Yoonjae Lee, Hojoong Kim, Lissette Romero, Nathan Zavanelli, Youngjin Kwon, Shinjae Kwon, Jimin Lee, Yewon Na, Sung Hoon Lee, Ki Jun Yu, Minoru Shinohara, Frank L. Hammond, Woon-Hong Yeo

    Abstract: The age and stroke-associated decline in musculoskeletal strength degrades the ability to perform daily human tasks using the upper extremities. Although there are a few examples of exoskeletons, they need manual operations due to the absence of sensor feedback and no intention prediction of movements. Here, we introduce an intelligent upper-limb exoskeleton system that uses cloud-based deep learn…

    Submitted 26 January, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: 15 pages, 6 figures, 1 table, published in npj Flexible Electronics

    MSC Class: 68T40 (Primary) 92C55; 68T99 (Secondary)

  41. arXiv:2309.01161  [pdf, other]

    math.OC eess.SY stat.ME

    Probabilistic Reduced-Dimensional Vector Autoregressive Modeling for Dynamics Prediction and Reconstruction with Oblique Projections

    Authors: Yanfang Mo, Jiaxin Yu, S. Joe Qin

    Abstract: In this paper, we propose a probabilistic reduced-dimensional vector autoregressive (PredVAR) model with oblique projections. This model partitions the measurement space into a dynamic subspace and a static subspace that do not need to be orthogonal. The partition allows us to apply an oblique projection to extract dynamic latent variables (DLVs) from high-dimensional data with maximized predictab…

    Submitted 3 September, 2023; originally announced September 2023.

  42. Empirical Modeling of Variance in Medium Frequency R-Mode Time-of-Arrival Measurements

    Authors: Jaewon Yu, Pyo-Woong Son

    Abstract: The R-Mode system, an advanced terrestrial integrated navigation system, is designed to address the vulnerabilities of global navigation satellite systems (GNSS) and explore the potential of a complementary navigation system. This study aims to enhance the accuracy of performance simulation for the medium frequency (MF) R-Mode system by modeling the variance of time-of-arrival (TOA) measurements b…

    Submitted 31 August, 2023; originally announced September 2023.

    Comments: 4 pages, 2 figures

  43. arXiv:2308.13790  [pdf, other]

    eess.IV cs.CV

    FFPN: Fourier Feature Pyramid Network for Ultrasound Image Segmentation

    Authors: Chaoyu Chen, Xin Yang, Rusi Chen, Junxuan Yu, Liwei Du, Jian Wang, Xindi Hu, Yan Cao, Yingying Liu, Dong Ni

    Abstract: Ultrasound (US) image segmentation is an active research area that requires real-time and highly accurate analysis in many scenarios. The detect-to-segment (DTS) frameworks have been recently proposed to balance accuracy and efficiency. However, existing approaches may suffer from inadequate contour encoding or fail to effectively leverage the encoded results. In this paper, we introduce a novel F…

    Submitted 26 August, 2023; originally announced August 2023.

    Comments: 10 pages, 5 figures, Accepted by MLMI 2023

  44. arXiv:2308.13736  [pdf, other]

    cs.SD cs.AI cs.HC eess.AS

    A Comprehensive Survey for Evaluation Methodologies of AI-Generated Music

    Authors: Zeyu Xiong, Weitao Wang, Jing Yu, Yue Lin, Ziyan Wang

    Abstract: In recent years, AI-generated music has made significant progress, with several models performing well in multimodal and complex musical genres and scenes. While objective metrics can be used to evaluate generative music, they often lack interpretability for musical evaluation. Therefore, researchers often resort to subjective user studies to assess the quality of the generated works, which can be…

    Submitted 25 August, 2023; originally announced August 2023.

  45. arXiv:2308.12985  [pdf]

    cs.AI eess.SY

    Perimeter Control with Heterogeneous Metering Rates for Cordon Signals: A Physics-Regularized Multi-Agent Reinforcement Learning Approach

    Authors: Jiajie Yu, Pierre-Antoine Laharotte, Yu Han, Wei Ma, Ludovic Leclercq

    Abstract: Perimeter Control (PC) strategies have been proposed to address urban road network control in oversaturated situations by regulating the transfer flow of the Protected Network (PN) based on the Macroscopic Fundamental Diagram (MFD). The uniform metering rate for cordon signals in most existing studies overlooks the variance of local traffic states at the intersection level, which may cause severe…

    Submitted 31 May, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 21 pages, 24 figures

  46. Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression

    Authors: Hangting Chen, Jianwei Yu, Yi Luo, Rongzhi Gu, Weihua Li, Zhuocheng Lu, Chao Weng

    Abstract: Echo cancellation and noise reduction are essential for full-duplex communication, yet most existing neural networks have high computational costs and are inflexible in tuning model complexity. In this paper, we introduce time-frequency dual-path compression to achieve a wide range of compression ratios on computational cost. Specifically, for frequency compression, trainable filters are used to r…

    Submitted 10 October, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Proceedings of INTERSPEECH

  47. arXiv:2308.08125  [pdf, other]

    cs.SD cs.CL cs.HC eess.AS

    Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals

    Authors: Running Zhao, Jiangtao Yu, Hang Zhao, Edith C. H. Ngai

    Abstract: Millimeter wave (mmWave) based speech recognition provides more possibility for audio-related applications, such as conference speech transcription and eavesdropping. However, considering the practicality in real scenarios, latency and recognizable vocabulary size are two critical factors that cannot be overlooked. In this paper, we propose Radio2Text, the first mmWave-based system for streaming a…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted by Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (ACM IMWUT/UbiComp 2023)

  48. arXiv:2308.06981  [pdf, other]

    eess.AS cs.SD

    The Sound Demixing Challenge 2023 – Cinematic Demixing Track

    Authors: Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji

    Abstract: This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We provide a comprehensive summary of the challenge setup, detailing the structure of the competition and the datasets used. Especially, we detail CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most succes…

    Submitted 18 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted for Transactions of the International Society for Music Information Retrieval

  49. arXiv:2308.06979  [pdf, other]

    eess.AS cs.SD

    The Sound Demixing Challenge 2023 – Music Demixing Track

    Authors: Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, et al. (2 additional authors not shown)

    Abstract: This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce t…

    Submitted 19 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Published in Transactions of the International Society for Music Information Retrieval (https://transactions.ismir.net/articles/10.5334/tismir.171)

    Journal ref: Transactions of the International Society for Music Information Retrieval, 7(1), pp.63-84, 2024

  50. arXiv:2308.03769  [pdf, other]

    eess.SY cs.AI math.OC

    Towards Integrated Traffic Control with Operating Decentralized Autonomous Organization

    Authors: Shengyue Yao, Jingru Yu, Yi Yu, Jia Xu, Xingyuan Dai, Honghai Li, Fei-Yue Wang, Yilun Lin

    Abstract: With a growing complexity of the intelligent traffic system (ITS), an integrated control of ITS that is capable of considering plentiful heterogeneous intelligent agents is desired. However, existing control methods based on the centralized or the decentralized scheme have not presented their competencies in considering the optimality and the scalability simultaneously. To address this issue, we p…

    Submitted 25 July, 2023; originally announced August 2023.

    Comments: 6 pages, 6 figures. To be published in 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC)