Showing 1–50 of 154 results for author: Luo, Y

Searching in archive eess.
  1. arXiv:2406.14878

    cs.CV cs.LG eess.IV

    MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection

    Authors: Zhuoxiao Chen, Junjie Meng, Mahsa Baktashmotlagh, Zi Huang, Yadan Luo

    Abstract: LiDAR-based 3D object detection is pivotal across many applications, yet the performance of such detection systems often degrades after deployment, especially when faced with unseen test point clouds originating from diverse locations or subjected to corruption. In this work, we introduce a new online adaptation framework for detectors named Model Synergy (MOS). Specifically, MOS dynamically assem…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2406.13788

    eess.SP

    Groupwise Deformable Registration of Diffusion Tensor Cardiovascular Magnetic Resonance: Disentangling Diffusion Contrast, Respiratory and Cardiac Motions

    Authors: Fanwen Wang, Yihao Luo, Ke Wen, Jiahao Huang, Pedro F. Ferreira, Yaqing Luo, Yinzhe Wu, Camila Munoz, Dudley J. Pennell, Andrew D. Scott, Sonia Nielles-Vallespin, Guang Yang

    Abstract: Diffusion tensor based cardiovascular magnetic resonance (DT-CMR) offers a non-invasive method to visualize the myocardial microstructure. With the assumption that the heart is stationary, frames are acquired with multiple repetitions for different diffusion encoding directions. However, motion from poor breath-holding and imprecise cardiac triggering complicates DT-CMR analysis, further challenge…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  3. arXiv:2406.13708

    eess.IV physics.med-ph

    Low-rank based motion correction followed by automatic frame selection in DT-CMR

    Authors: Fanwen Wang, Pedro F. Ferreira, Camila Munoz, Ke Wen, Yaqing Luo, Jiahao Huang, Yinzhe Wu, Dudley J. Pennell, Andrew D. Scott, Sonia Nielles-Vallespin, Guang Yang

    Abstract: Motivation: Post-processing of in-vivo diffusion tensor CMR (DT-CMR) is challenging due to the low SNR and variation in contrast between frames which makes image registration difficult, and the need to manually reject frames corrupted by motion. Goals: To develop a semi-automatic post-processing pipeline for robust DT-CMR registration and automatic frame selection. Approach: We used low intrinsic…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted as ISMRM 2024 Digital poster 2141

    Journal ref: ISMRM 2024 Digital poster 2141

  4. arXiv:2406.09317

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2406.04791

    cs.SD eess.AS

    Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

    Authors: Shaojun Li, Daimeng Wei, Jiaxin Guo, ZongYao Li, Zhanglin Wu, Zhiqiang Rao, Yuanchang Luo, Xianghui He, Hao Yang

    Abstract: Despite recent improvements in End-to-End Automatic Speech Recognition (E2E ASR) systems, the performance can degrade due to vocal characteristic mismatches between training and testing data, particularly with limited target speaker adaptation data. We propose a novel speaker adaptation approach Speaker-Smoothed kNN that leverages k-Nearest Neighbors (kNN) retrieval techniques to improve model out…

    Submitted 11 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  6. arXiv:2406.00497

    cs.SD cs.AI cs.CL eess.AS

    Recent Advances in End-to-End Simultaneous Speech Translation

    Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, YingFeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

    Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles.…

    Submitted 1 June, 2024; originally announced June 2024.

  7. arXiv:2406.00492

    eess.IV cs.CV cs.LG

    SAM-VMNet: Deep Neural Networks For Coronary Angiography Vessel Segmentation

    Authors: Xueying Zeng, Baixiang Huang, Yu Luo, Guangyu Wei, Songyan He, Yushuang Shao

    Abstract: Coronary artery disease (CAD) is one of the most prevalent diseases in the cardiovascular field and one of the major contributors to death worldwide. Computed Tomography Angiography (CTA) images are regarded as the authoritative standard for the diagnosis of coronary artery disease, and by performing vessel segmentation and stenosis detection on CTA images, physicians are able to diagnose coronary…

    Submitted 1 June, 2024; originally announced June 2024.

  8. arXiv:2405.17241

    cs.CV eess.IV

    NeurTV: Total Variation on the Neural Domain

    Authors: Yisi Luo, Xile Zhao, Kai Ye, Deyu Meng

    Abstract: Recently, we have witnessed the success of total variation (TV) for many imaging applications. However, traditional TV is defined on the original pixel domain, which limits its potential. In this work, we suggest a new TV regularization defined on the neural domain. Concretely, the discrete data is continuously and implicitly represented by a deep neural network (DNN), and we use the derivatives o…

    Submitted 27 May, 2024; originally announced May 2024.

    MSC Class: 94A08; 68U10; 68T45

  9. arXiv:2405.05564

    eess.IV cs.CV cs.LG

    Joint Edge Optimization Deep Unfolding Network for Accelerated MRI Reconstruction

    Authors: Yue Cai, Yu Luo, Jie Ling, Shun Yao

    Abstract: Magnetic Resonance Imaging (MRI) is a widely used imaging technique; however, it has the limitation of long scanning time. Though previous model-based and learning-based MRI reconstruction methods have shown promising performance, most of them have not fully utilized the edge prior of MR images, and there is still much room for improvement. In this paper, we build a joint edge optimization model th…

    Submitted 9 May, 2024; originally announced May 2024.

  10. arXiv:2405.04865

    cs.LG eess.SP

    Regime Learning for Differentiable Particle Filters

    Authors: John-Joseph Brady, Yuhui Luo, Wenwu Wang, Victor Elvira, Yunpeng Li

    Abstract: Differentiable particle filters are an emerging class of models that combine sequential Monte Carlo techniques with the flexibility of neural networks to perform state space inference. This paper concerns the case where the system may switch between a finite set of state-space models, i.e. regimes. No prior approaches effectively learn both the individual regimes and the switching process simultan…

    Submitted 12 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    MSC Class: 68T37

    ACM Class: I.2.6

  11. arXiv:2404.15163

    cs.CV eess.IV

    Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

    Authors: Tianwei Zhou, Songbai Tan, Wei Zhou, Yu Luo, Yuan-Gen Wang, Guanghui Yue

    Abstract: With the increasing maturity of the text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertisement, entertainment, education, social media, etc. Although remarkable advancements have been achieved in generative models, little effort has been devoted to designing relevant quality assessment models. In this paper, we propose a nov…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: IEEE Transactions on Broadcasting (TBC)

  12. arXiv:2404.10343

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  13. arXiv:2404.04947

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Gull: A Generative Multifunctional Audio Codec

    Authors: Yi Luo, Jianwei Yu, Hangting Chen, Rongzhi Gu, Chao Weng

    Abstract: We introduce Gull, a generative multifunctional audio codec. Gull is a general purpose neural audio compression and decompression model which can be applied to a wide range of tasks and applications such as real-time communication, audio super-resolution, and codec language models. The key components of Gull include (1) universal-sample-rate modeling via subband modeling schemes motivated by recen…

    Submitted 7 June, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Demo page: https://yluo42.github.io/Gull/

  14. arXiv:2404.01563

    eess.IV cs.CV

    Two-Phase Multi-Dose-Level PET Image Reconstruction with Dose Level Awareness

    Authors: Yuchen Fei, Yanmei Luo, Yan Wang, Jiaqi Cui, Yuanyuan Xu, Jiliu Zhou, Dinggang Shen

    Abstract: To obtain high-quality positron emission tomography (PET) while minimizing radiation exposure, a range of methods have been designed to reconstruct standard-dose PET (SPET) from corresponding low-dose PET (LPET) images. However, most current methods merely learn the mapping between single-dose-level LPET and SPET images, but omit the dose disparity of LPET images in clinical scenarios. In this pap…

    Submitted 10 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted by ISBI2024

  15. arXiv:2403.11092

    cs.CL cs.AI cs.CV cs.CY eess.IV

    Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

    Authors: Michael Saxon, Yiran Luo, Sharon Levy, Chitta Baral, Yezhou Yang, William Yang Wang

    Abstract: Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set. One such benchmark, "Conceptual Coverage Across Languages" (CoCo-CroLa), assesses the tangible noun inventory of T2I models by prompting them to generate pictures from a concept list translated to seven languages and co…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: NAACL 2024 Main Conference

  16. Intelligent Reflecting Surfaces vs. Full-Duplex Relays: A Comparison in the Air

    Authors: Qian Ding, Jie Yang, Yang Luo, Chunbo Luo

    Abstract: This letter aims to provide a fundamental analytical comparison for the two major types of relaying methods: intelligent reflecting surfaces and full-duplex relays, particularly focusing on unmanned aerial vehicle communication scenarios. Both amplify-and-forward and decode-and-forward relaying schemes are included in the comparison. In addition, optimal 3D UAV deployment and minimum transmit powe…

    Submitted 14 March, 2024; originally announced March 2024.

    Journal ref: IEEE Communications Letters, vol. 28, no. 2, pp. 397-401, Feb. 2024

  17. arXiv:2403.09223

    cs.LG eess.SP

    MCformer: Multivariate Time Series Forecasting with Mixed-Channels Transformer

    Authors: Wenyong Han, Tao Zhu, Liming Chen, Huansheng Ning, Yang Luo, Yaping Wan

    Abstract: The massive generation of time-series data by large-scale Internet of Things (IoT) devices necessitates the exploration of more effective models for multivariate time-series forecasting. In previous models, there was a predominant use of the Channel Dependence (CD) strategy (where each channel represents a univariate sequence). Current state-of-the-art (SOTA) models primarily rely on the Channel In…

    Submitted 14 March, 2024; originally announced March 2024.

  18. arXiv:2403.08680

    eess.SY

    Towards the THz Networks in the 6G Era

    Authors: Qian Ding, Jie Yang, Yang Luo, Chunbo Luo

    Abstract: This commentary aims to envision the role THz will play in the coming human-centric 6G era. Three distinct THz network types, namely outdoor, indoor, and body area networks, are discussed, with an emphasis on their capabilities in human body detection. Synthesizing these networks will unlock a host of applications across industrial, biomedical and entertainment fields, s…

    Submitted 13 March, 2024; originally announced March 2024.

  19. arXiv:2403.04626

    eess.IV cs.CL cs.CV cs.LG

    MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder

    Authors: Lei Li, Tianfang Zhang, Xinglin Zhang, Jiaqi Liu, Bingqi Ma, Yan Luo, Tao Chen

    Abstract: Within the domain of medical analysis, extensive research has explored the potential of mutual learning between Masked Autoencoders (MAEs) and multimodal data. However, the impact of MAEs on intermodality remains a key challenge. We introduce MedFLIP, a Fast Language-Image Pre-training method for Medical analysis. We explore MAEs for zero-shot learning with crossed domains, which enhances the model…

    Submitted 30 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  20. arXiv:2403.03809

    eess.SP

    Variational Bayesian Learning based Joint Localization and Channel Estimation with Distance-dependent Noise

    Authors: Yunfei Li, Yiting Luo, Weiqiang Tan, Chunguo Li, Shaodan Ma, Guanghua Yang

    Abstract: In the Industrial Internet of Things (IIoTs) and Ocean of Things (OoTs), the advent of massive intelligent services has imposed stringent requirements on both communication and localization, particularly emphasizing precise localization and channel information. This paper focuses on the challenge of jointly optimizing localization and communication in IoT networks. Departing from the conventional…

    Submitted 6 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  21. arXiv:2403.01265

    cs.RO eess.SY

    Smooth Computation without Input Delay: Robust Tube-Based Model Predictive Control for Robot Manipulator Planning

    Authors: Yu Luo, Qie Sima, Tianying Ji, Fuchun Sun, Huaping Liu, Jianwei Zhang

    Abstract: Model Predictive Control (MPC) has exhibited remarkable capabilities in optimizing objectives and meeting constraints. However, the substantial computational burden associated with solving the Optimal Control Problem (OCP) at each triggering instant introduces significant delays between state sampling and control application. These delays limit the practicality of MPC in resource-constrained syste…

    Submitted 7 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2103.09693

  22. arXiv:2403.01093

    eess.SP

    Variational Bayesian Learning Based Localization and Channel Reconstruction in RIS-aided Systems

    Authors: Yunfei Li, Yiting Luo, Xianda Wu, Zheng Shi, Shaodan Ma, Guanghua Yang

    Abstract: The emerging immersive and autonomous services have posed stringent requirements on both communications and localization. By considering the great potential of reconfigurable intelligent surface (RIS), this paper focuses on the joint channel estimation and localization for RIS-aided wireless systems. As opposed to existing works that treat channel estimation and localization independently, this pa…

    Submitted 1 March, 2024; originally announced March 2024.

  23. arXiv:2402.15897

    eess.SP

    MMW-Carry: Enhancing Carry Object Detection through Millimeter-Wave Radar-Camera Fusion

    Authors: Xiangyu Gao, Youchen Luo, Ali Alansari, Yaping Sun

    Abstract: This paper introduces MMW-Carry, a system designed to predict the probability of individuals carrying various objects using millimeter-wave radar signals, complemented by camera input. The primary goal of MMW-Carry is to provide a rapid and cost-effective preliminary screening solution, specifically tailored for non-super-sensitive scenarios. Overall, MMW-Carry achieves significant advancements in…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 10 pages

  24. arXiv:2401.00708

    cs.CV eess.IV

    Revisiting Nonlocal Self-Similarity from Continuous Representation

    Authors: Yisi Luo, Xile Zhao, Deyu Meng

    Abstract: Nonlocal self-similarity (NSS) is an important prior that has been successfully applied in multi-dimensional data processing tasks, e.g., image and video recovery. However, existing NSS-based methods are solely suitable for meshgrid data such as images and videos, but are not suitable for emerging off-meshgrid data, e.g., point cloud and climate data. In this work, we revisit the NSS from the cont…

    Submitted 1 January, 2024; originally announced January 2024.

  25. arXiv:2312.15659

    eess.IV

    Perceptual Quality Assessment for Video Frame Interpolation

    Authors: Jinliang Han, Xiongkuo Min, Yixuan Gao, Jun Jia, Lei Sun, Zuowei Cao, Yonglin Luo, Guangtao Zhai

    Abstract: The quality of frames is significant for both research and application of video frame interpolation (VFI). In recent VFI studies, the methods of full-reference image quality assessment have generally been used to evaluate the quality of VFI frames. However, high frame rate reference videos, necessities for the full-reference methods, are difficult to obtain in most applications of VFI. To evaluate…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 5 pages, 4 figures

    ACM Class: I.4.0

  26. arXiv:2312.13744

    eess.SY

    Modelling of Networked Measuring Systems -- From White-Box Models to Data Based Approaches

    Authors: Klaus-Dieter Sommer, Peter Harris, Sascha Eichstädt, Roland Füssl, Tanja Dorst, Andreas Schütze, Michael Heizmann, Nadine Schiering, Andreas Maier, Yuhui Luo, Christos Tachtatzis, Ivan Andonovic, Gordon Gourlay

    Abstract: Mathematical modelling is at the core of metrology as it transforms raw measured data into useful measurement results. A model captures the relationship between the measurand and all relevant quantities on which the measurand depends, and is used to design measuring systems, analyse measured data, make inferences and predictions, and is the basis for evaluating measurement uncertainties. Tradition…

    Submitted 21 December, 2023; originally announced December 2023.

  27. arXiv:2312.10381

    cs.SD eess.AS

    SECap: Speech Emotion Captioning with Large Language Model

    Authors: Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, Shixiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu

    Abstract: Speech emotions are crucial in human communication and are extensively used in fields like speech synthesis and natural language understanding. Most prior studies, such as speech emotion recognition, have categorized speech emotions into a fixed set of classes. Yet, emotions expressed in human speech are often complex, and categorizing them into predefined groups can be insufficient to adequately…

    Submitted 23 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  28. arXiv:2312.05279

    eess.IV cs.CV

    Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network

    Authors: Anbo Cao, Pin-Yu Le, Zhonghui Qie, Haseeb Hassan, Yingwei Guo, Asim Zaman, Jiaxi Lu, Xueqiang Zeng, Huihui Yang, Xiaoqiang Miao, Taiyu Han, Guangtao Huang, Yan Kang, Yu Luo, Jia Guo

    Abstract: Dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) is widely used to evaluate acute ischemic stroke to distinguish salvageable tissue and infarct core. For this purpose, traditional methods employ deconvolution techniques, like singular value decomposition, which are known to be vulnerable to noise, potentially distorting the derived perfusion parameters. However, deep learning t…

    Submitted 8 December, 2023; originally announced December 2023.

  29. arXiv:2312.03464

    cs.LG cs.SD eess.AS

    Subnetwork-to-go: Elastic Neural Network with Dynamic Training and Customizable Inference

    Authors: Kai Li, Yi Luo

    Abstract: Deploying neural networks to different devices or platforms is in general challenging, especially when the model size is large or model complexity is high. Although there exist ways for model pruning or distillation, it is typically required to perform a full round of model training or finetuning procedure in order to obtain a smaller model that satisfies the model size or complexity constraints.…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures

  30. arXiv:2311.01781

    eess.SY

    Passive Handwriting Tracking via Weak mmWave Communication Signals

    Authors: Chao Yu, Yan Luo, Renqi Chen, Rui Wang

    Abstract: In this letter, a cooperative sensing framework based on millimeter wave (mmWave) communication systems is proposed to detect tiny motions with a millimeter-level resolution. Particularly, the cooperative sensing framework is facilitated with one transmitter and two receivers. There are two radio frequency (RF) chains at each receiver. Hence, the Doppler effect due to the tiny motions can be detec…

    Submitted 3 November, 2023; originally announced November 2023.

  31. arXiv:2310.05813

    cs.SD eess.AS

    Audio compression-assisted feature extraction for voice replay attack detection

    Authors: Xiangyu Shi, Yuhao Luo, Li Wang, Haorui He, Hao Li, Lei Wang, Zhizheng Wu

    Abstract: Replay attack is one of the most effective and simplest voice spoofing attacks. Detecting replay attacks is challenging, according to the Automatic Speaker Verification Spoofing and Countermeasures Challenge 2021 (ASVspoof 2021), because they involve a loudspeaker, a microphone, and acoustic conditions (e.g., background noise). One obstacle to detecting replay attacks is finding robust feature rep…

    Submitted 10 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

  32. arXiv:2310.05369

    cs.SD eess.AS

    AdvSV: An Over-the-Air Adversarial Attack Dataset for Speaker Verification

    Authors: Li Wang, Jiaqi Li, Yuhao Luo, Jiahao Zheng, Lei Wang, Hao Li, Ke Xu, Chengfang Fang, Jie Shi, Zhizheng Wu

    Abstract: It is known that deep neural networks are vulnerable to adversarial attacks. Although Automatic Speaker Verification (ASV) built on top of deep neural networks exhibits robust performance in controlled scenarios, many studies confirm that ASV is vulnerable to adversarial attacks. The lack of a standard dataset is a bottleneck for further research, especially reproducible research. In this study, w…

    Submitted 16 January, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted by ICASSP2024

  33. arXiv:2309.13905

    eess.AS cs.SD

    AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

    Authors: Jianwei Yu, Hangting Chen, Yanyao Bian, Xiang Li, Yi Luo, Jinchuan Tian, Mengyang Liu, Jiayi Jiang, Shuai Wang

    Abstract: Recently, the utilization of extensive open-sourced text data has significantly advanced the performance of text-based large language models (LLMs). However, the use of in-the-wild large-scale speech data in the speech technology community remains constrained. One reason for this limitation is that a considerable amount of the publicly available speech data is compromised by background noise, spee…

    Submitted 25 September, 2023; originally announced September 2023.

  34. arXiv:2309.06598

    eess.IV

    Efficient Post-processing of Diffusion Tensor Cardiac Magnetic Imaging Using Texture-conserving Deformable Registration

    Authors: Fanwen Wang, Pedro F. Ferreira, Yinzhe Wu, Camila Munoz, Ke Wen, Yaqing Luo, Jiahao Huang, Dudley J. Pennell, Andrew D. Scott, Sonia Nielles-Vallespin, Guang Yang

    Abstract: Diffusion tensor cardiac magnetic resonance (DT-CMR) is a method capable of providing non-invasive measurements of myocardial microstructure. Image registration is essential to correct image shifts due to intra and inter breath-hold motion and imperfect cardiac triggering. Registration is challenging in DT-CMR due to the low signal-to-noise and various contrasts induced by the diffusion encoding i…

    Submitted 16 May, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 7 pages, 4 figures, conference

  35. arXiv:2308.16892

    eess.AS cs.AI cs.SD

    ReZero: Region-customizable Sound Extraction

    Authors: Rongzhi Gu, Yi Luo

    Abstract: We introduce region-customizable sound extraction (ReZero), a general and flexible framework for the multi-channel region-wise sound extraction (R-SE) task. R-SE task aims at extracting all active target sounds (e.g., human speech) within a specific, user-defined spatial region, which is different from conventional and existing tasks where a blind separation or a fixed, predefined spatial region a…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: 13 pages, 11 figures

  36. Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression

    Authors: Hangting Chen, Jianwei Yu, Yi Luo, Rongzhi Gu, Weihua Li, Zhuocheng Lu, Chao Weng

    Abstract: Echo cancellation and noise reduction are essential for full-duplex communication, yet most existing neural networks have high computational costs and are inflexible in tuning model complexity. In this paper, we introduce time-frequency dual-path compression to achieve a wide range of compression ratios on computational cost. Specifically, for frequency compression, trainable filters are used to r…

    Submitted 10 October, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Proceedings of INTERSPEECH

  37. arXiv:2308.10343

    eess.SY cs.NI

    Enhancing In-Situ Structural Health Monitoring through RF Energy-Powered Sensor Nodes and Mobile Platform

    Authors: Yu Luo, Lina Pu, Jun Wang, Isaac Howard

    Abstract: This research contributes to long-term structural health monitoring (SHM) by exploring radio frequency energy-powered sensor nodes (RF-SNs) embedded in concrete. Unlike traditional in-situ monitoring systems relying on batteries or wire-connected power sources, the RF-SN captures radio energy from a mobile radio transmitter for sensing and communication. This offers a cost-effective solution for c…

    Submitted 20 August, 2023; originally announced August 2023.

  38. arXiv:2308.09360

    cs.LG eess.SP

    Multi-feature concatenation and multi-classifier stacking: an interpretable and generalizable machine learning method for MDD discrimination with rsfMRI

    Authors: Yunsong Luo, Wenyu Chen, Ling Zhan, Jiang Qiu, Tao Jia

    Abstract: Major depressive disorder is a serious and heterogeneous psychiatric disorder that needs accurate diagnosis. Resting-state functional MRI (rsfMRI), which captures multiple perspectives on brain structure, function, and connectivity, is increasingly applied in the diagnosis and pathological research of mental diseases. Different machine learning algorithms are then developed to exploit the rich inf…

    Submitted 18 August, 2023; originally announced August 2023.

  39. arXiv:2308.06981

    eess.AS cs.SD

    The Sound Demixing Challenge 2023 – Cinematic Demixing Track

    Authors: Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji

    Abstract: This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We provide a comprehensive summary of the challenge setup, detailing the structure of the competition and the datasets used. In particular, we detail CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most succes…

    Submitted 18 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted for Transactions of the International Society for Music Information Retrieval

  40. arXiv:2308.06979

    eess.AS cs.SD

    The Sound Demixing Challenge 2023 – Music Demixing Track

    Authors: Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, et al. (2 additional authors not shown)

    Abstract: This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce t…

    Submitted 19 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Published in Transactions of the International Society for Music Information Retrieval (https://transactions.ismir.net/articles/10.5334/tismir.171)

    Journal ref: Transactions of the International Society for Music Information Retrieval, 7(1), pp.63-84, 2024

  41. arXiv:2308.02765  [pdf]

    eess.SY cs.AI

    Surrogate Empowered Sim2Real Transfer of Deep Reinforcement Learning for ORC Superheat Control

    Authors: Runze Lin, Yangyang Luo, Xialai Wu, Junghui Chen, Biao Huang, Lei Xie, Hongye Su

    Abstract: The Organic Rankine Cycle (ORC) is widely used in industrial waste heat recovery due to its simple structure and easy maintenance. However, in the context of smart manufacturing in the process industry, traditional model-based optimization control methods are unable to adapt to the varying operating conditions of the ORC system or sudden changes in operating modes. Deep reinforcement learning (DRL…

    Submitted 4 August, 2023; originally announced August 2023.

  42. arXiv:2307.16440  [pdf, other]

    cs.CV cs.LG eess.IV

    Towards Head Computed Tomography Image Reconstruction Standardization with Deep Learning Assisted Automatic Detection

    Authors: Bowen Zheng, Chenxi Huang, Yuemei Luo

    Abstract: Three-dimensional (3D) reconstruction of head Computed Tomography (CT) images elucidates the intricate spatial relationships of tissue structures, thereby assisting in accurate diagnosis. Nonetheless, securing an optimal head CT scan without deviation is challenging in clinical settings, owing to poor positioning by technicians, patient's physical constraints, or CT scanner tilt angle restrictions…

    Submitted 15 September, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

  43. arXiv:2306.16556  [pdf, other]

    eess.IV cs.CV

    Inter-Rater Uncertainty Quantification in Medical Image Segmentation via Rater-Specific Bayesian Neural Networks

    Authors: Qingqiao Hu, Hao Wang, **g Luo, Yunhao Luo, Zhiheng Zhangg, Jan S. Kirschke, Benedikt Wiestler, Bjoern Menze, Jianguo Zhang, Hongwei Bran Li

    Abstract: Automated medical image segmentation inherently involves a certain degree of uncertainty. One key factor contributing to this uncertainty is the ambiguity that can arise in determining the boundaries of a target region of interest, primarily due to variations in image appearance. On top of this, even among experts in the field, different opinions can emerge regarding the precise definition of spec…

    Submitted 25 August, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: submitted to a journal for review

  44. arXiv:2306.14471  [pdf]

    physics.med-ph eess.IV physics.ins-det physics.optics

    Single-shot 3D photoacoustic computed tomography with a densely packed array for transcranial functional imaging

    Authors: Rui Cao, Yilin Luo, **hua Xu, Xiaofei Luo, Ku Geng, Yousuf Aborahama, Manxiu Cui, Samuel Davis, Shuai Na, Xin Tong, Cindy Liu, Karteek Sastry, Konstantin Maslov, Peng Hu, Yide Zhang, Li Lin, Yang Zhang, Lihong V. Wang

    Abstract: Photoacoustic computed tomography (PACT) is emerging as a new technique for functional brain imaging, primarily due to its capabilities in label-free hemodynamic imaging. Despite its potential, the transcranial application of PACT has encountered hurdles, such as acoustic attenuations and distortions by the skull and limited light penetration through the skull. To overcome these challenges, we hav…

    Submitted 26 June, 2023; originally announced June 2023.

  45. arXiv:2306.08998  [pdf, other]

    cs.SD cs.CV eess.AS

    Team AcieLee: Technical Report for EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023

    Authors: Yuqi Li, Yizhi Luo, Xiaoshuai Hao, Chuanguang Yang, Zhulin An, Dantong Song, Wei Yi

    Abstract: In this report, we describe the technical details of our submission to the EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023, by Team "AcieLee" (username: Yuqi_Li). The task is to classify the audio caused by interactions between objects, or from events of the camera wearer. We conducted exhaustive experiments and found learning rate step decay, backbone freezing, label smoothing and f…

    Submitted 15 June, 2023; originally announced June 2023.

  46. arXiv:2306.04236  [pdf, other]

    cs.CV eess.IV

    Flare7K++: Mixing Synthetic and Real Datasets for Nighttime Flare Removal and Beyond

    Authors: Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yihang Luo, Chen Change Loy

    Abstract: Artificial lights commonly leave strong lens flare artifacts on the images captured at night, degrading both the visual quality and performance of vision algorithms. Existing flare removal approaches mainly focus on removing daytime flares and fail in nighttime cases. Nighttime flare removal is challenging due to the unique luminance and spectrum of artificial lights, as well as the diverse patter…

    Submitted 7 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Extension of arXiv:2210.06570; Project page at https://ykdai.github.io/projects/Flare7K

  47. arXiv:2305.16445  [pdf, other]

    cs.SD cs.DC eess.AS

    SoundSieve: Seconds-Long Audio Event Recognition on Intermittently-Powered Systems

    Authors: Mahathir Monjur, Yubo Luo, Zhenyu Wang, Shahriar Nirjon

    Abstract: A fundamental problem of every intermittently-powered sensing system is that signals acquired by these systems over a longer period in time are also intermittent. As a consequence, these systems fail to capture parts of a longer-duration event that spans over multiple charge-discharge cycles of the capacitor that stores the harvested energy. From an application's perspective, this is viewed as spo…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: The 21st ACM International Conference on Mobile Systems, Applications, and Services (MobiSys 2023)

  48. arXiv:2304.12322  [pdf, other]

    eess.IV cs.CV cs.LG

    Medical Image Deidentification, Cleaning and Compression Using Pylogik

    Authors: Adrienne Kline, Vinesh Appadurai, Yuan Luo, Sanjiv Shah

    Abstract: Leveraging medical record information in the era of big data and machine learning comes with the caveat that data must be cleaned and de-identified. Facilitating data sharing and harmonization for multi-center collaborations is particularly difficult when protected health information (PHI) is contained or embedded in image meta-data. We propose a novel library in the Python framework, called PyLo…

    Submitted 10 May, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: updates needed to manuscript

  49. arXiv:2304.08052  [pdf, other]

    cs.SD eess.AS

    Fast Random Approximation of Multi-channel Room Impulse Response

    Authors: Yi Luo, Rongzhi Gu

    Abstract: Modern neural-network-based speech processing systems are typically required to be robust against reverberation, and the training of such systems thus needs a large amount of reverberant data. During the training of the systems, an on-the-fly simulation pipeline is nowadays preferred as it allows the model to train on an infinite number of data samples without pre-generating and saving them on hard disk.…

    Submitted 17 April, 2023; originally announced April 2023.

  50. arXiv:2304.06496  [pdf, other]

    eess.SP cs.HC cs.LG

    EEGMatch: Learning with Incomplete Labels for Semi-Supervised EEG-based Cross-Subject Emotion Recognition

    Authors: Rushuang Zhou, Weishan Ye, Zhiguo Zhang, Yanyang Luo, Li Zhang, Linling Li, Gan Huang, Yining Dong, Yuan-Ting Zhang, Zhen Liang

    Abstract: Electroencephalography (EEG) is an objective tool for emotion recognition and shows promising performance. However, the label scarcity problem is a main challenge in this field, which limits the wide application of EEG-based emotion recognition. In this paper, we propose a novel semi-supervised learning framework (EEGMatch) to leverage both labeled and unlabeled EEG data. First, an EEG-Mixup based…

    Submitted 27 March, 2023; originally announced April 2023.