Showing 1–30 of 30 results for author: Xia, R

Searching in archive eess.
  1. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  2. arXiv:2404.15353  [pdf, other]

    eess.SP cs.AI cs.LG

    SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals

    Authors: Runze Yan, Cheng Ding, Ran Xiao, Aleksandr Fedorov, Randall J Lee, Fadi Nahab, Xiao Hu

    Abstract: Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambu…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 15 pages; 9 figures; 2024 Conference on Health, Inference, and Learning (CHIL)

  3. arXiv:2404.11889  [pdf, other]

    eess.IV cs.CV

    Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans

    Authors: Lixing Tan, Shuang Song, Kangneng Zhou, Chengbo Duan, Lanying Wang, Huayang Ren, Linlin Liu, Wei Zhang, Ruoxiu Xiao

    Abstract: X-ray images play a vital role in the intraoperative processes due to their high resolution and fast imaging speed and greatly promote the subsequent segmentation, registration and reconstruction. However, over-dosed X-rays superimpose potential risks to human health to some extent. Data-driven algorithms from volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume d…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 13 pages, 10 figures

  4. arXiv:2312.02300  [pdf]

    cs.LG eess.SP

    Reconsideration on evaluation of machine learning models in continuous monitoring using wearables

    Authors: Cheng Ding, Zhicheng Guo, Cynthia Rudin, Ran Xiao, Fadi B Nahab, Xiao Hu

    Abstract: This paper explores the challenges in evaluating machine learning (ML) models for continuous health monitoring using wearable devices beyond conventional metrics. We state the complexities posed by real-world variability, disease dynamics, user-specific characteristics, and the prevalence of false notifications, necessitating novel evaluation strategies. Drawing insights from large-scale heart stu…

    Submitted 4 December, 2023; originally announced December 2023.

  5. arXiv:2311.18399  [pdf, other]

    eess.AS cs.SD

    Audio Prompt Tuning for Universal Sound Separation

    Authors: Yuzhuo Liu, Xubo Liu, Yan Zhao, Yuanyuan Wang, Rui Xia, Pingchuan Tain, Yuxuan Wang

    Abstract: Universal sound separation (USS) is a task to separate arbitrary sounds from an audio mixture. Existing USS systems are capable of separating arbitrary sources, given a few examples of the target sources as queries. However, separating arbitrary sounds with a single system is challenging, and the robustness is not always guaranteed. In this work, we propose audio prompt tuning (APT), a simple yet…

    Submitted 30 November, 2023; originally announced November 2023.

  6. arXiv:2310.14155  [pdf]

    eess.SP

    Photoplethysmography based atrial fibrillation detection: an updated review from July 2019

    Authors: Cheng Ding, Ran Xiao, Weijia Wang, Elizabeth Holdsworth, Xiao Hu

    Abstract: Atrial fibrillation (AF) is a prevalent cardiac arrhythmia associated with significant health ramifications, including an elevated susceptibility to ischemic stroke, heart disease, and heightened mortality. Photoplethysmography (PPG) has emerged as a promising technology for continuous AF monitoring for its cost-effectiveness and widespread integration into wearable devices. Our team previously co…

    Submitted 21 October, 2023; originally announced October 2023.

  7. arXiv:2308.08345  [pdf, other]

    eess.IV cs.CV

    GAEI-UNet: Global Attention and Elastic Interaction U-Net for Vessel Image Segmentation

    Authors: Ruiqiang Xiao, Zhuoyue Wan

    Abstract: Vessel image segmentation plays a pivotal role in medical diagnostics, aiding in the early detection and treatment of vascular diseases. While segmentation based on deep learning has shown promising results, effectively segmenting small structures and maintaining connectivity between them remains challenging. To address these limitations, we propose GAEI-UNet, a novel model that combines global at…

    Submitted 22 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2004.03696 by other authors

  8. arXiv:2308.05037  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    Separate Anything You Describe

    Authors: Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

    Abstract: Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and scalable interface for digital audio applications. Recent works on LASS, despite attaining promising separation performance on specific sources (e.g., musical instr…

    Submitted 27 October, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Code, benchmark and pre-trained models: https://github.com/Audio-AGI/AudioSep

  9. arXiv:2305.15719  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Efficient Neural Music Generation

    Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

    Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for a real…

    Submitted 25 May, 2023; originally announced May 2023.

  10. arXiv:2305.11576  [pdf, other]

    eess.AS cs.CL cs.SD

    Language-universal phonetic encoder for low-resource speech recognition

    Authors: Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: Multilingual training is effective in improving low-resource ASR, which may partially be explained by phonetic representation sharing between languages. In end-to-end (E2E) ASR systems, graphemes are often used as basic modeling units, however graphemes may not be ideal for multilingual phonetic sharing. In this paper, we leverage International Phonetic Alphabet (IPA) based language-universal phon…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in INTERSPEECH 2023

  11. arXiv:2305.11569  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition

    Authors: Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: We improve low-resource ASR by integrating the ideas of multilingual training and self-supervised learning. Concretely, we leverage an International Phonetic Alphabet (IPA) multilingual model to create frame-level pseudo labels for unlabeled speech, and use these pseudo labels to guide hidden-unit BERT (HuBERT) based speech pretraining in a phonetically-informed manner. The experiments on the Mult…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in INTERSPEECH 2023

  12. arXiv:2301.00066  [pdf, other]

    cs.CL eess.AS

    Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

    Authors: Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: Recent studies have shown that using an external Language Model (LM) benefits the end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear less frequently in the training set is still quite challenging. The long-tail prediction problems have been widely studied in many applications, but only been addressed by a few studies for ASR and LMs. In this paper, we propose a n…

    Submitted 30 December, 2022; originally announced January 2023.

    Comments: Submitted to ICASSP 2023

  13. arXiv:2211.08146  [pdf, other]

    eess.IV cs.CV cs.LG

    Encoding feature supervised UNet++: Redesigning Supervision for liver and tumor segmentation

    Authors: Jiahao Cui, Ruoxin Xiao, Shiyuan Fang, Minnan Pei, Yixuan Yu

    Abstract: Liver tumor segmentation in CT images is a critical step in the diagnosis, surgical planning and postoperative evaluation of liver disease. An automatic liver and tumor segmentation method can greatly relieve physicians of the heavy workload of examining CT images and better improve the accuracy of diagnosis. In the last few decades, many modifications based on U-Net model have been proposed in th…

    Submitted 15 November, 2022; originally announced November 2022.

  14. arXiv:2211.03333  [pdf]

    eess.SP

    Learning From Alarms: A Robust Learning Approach for Accurate Photoplethysmography-Based Atrial Fibrillation Detection using Eight Million Samples Labeled with Imprecise Arrhythmia Alarms

    Authors: Cheng Ding, Zhicheng Guo, Cynthia Rudin, Ran Xiao, Amit Shah, Duc H. Do, Randall J Lee, Gari Clifford, Fadi B Nahab, Xiao Hu

    Abstract: Atrial fibrillation (AF) is a common cardiac arrhythmia with serious health consequences if not detected and treated early. Detecting AF using wearable devices with photoplethysmography (PPG) sensors and deep neural networks has demonstrated some success using proprietary algorithms in commercial solutions. However, further advancement of this paradigm of continuous AF detection in ambulatory sett…

    Submitted 12 November, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

  15. Degradation-invariant Enhancement of Fundus Images via Pyramid Constraint Network

    Authors: Haofeng Liu, Heng Li, Huazhu Fu, Ruoxiu Xiao, Yunshu Gao, Yan Hu, Jiang Liu

    Abstract: As an economical and efficient fundus imaging modality, retinal fundus images have been widely adopted in clinical fundus examination. Unfortunately, fundus images often suffer from quality degradation caused by imaging interferences, leading to misdiagnosis. Despite impressive enhancement performances that state-of-the-art methods have achieved, challenges remain in clinical scenarios. For boosti…

    Submitted 18 October, 2022; originally announced October 2022.

    Journal ref: International Conference on Medical Image Computing and Computer-Assisted Intervention. (2022) 507-516

  16. Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion

    Authors: Yuxiang Zhang, Jingze Lu, Xingming Wang, Zhuo Li, Runqiu Xiao, Wenchao Wang, Ming Li, Pengyuan Zhang

    Abstract: This paper describes the deepfake audio detection system submitted to the Audio Deep Synthesis Detection (ADD) Challenge Track 3.2 and gives an analysis of score fusion. The proposed system is a score-level fusion of several light convolutional neural network (LCNN) based models. Various front-ends are used as input features, including low-frequency short-time Fourier transform and Constant Q tran…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted by ACM Multimedia 2022 Workshop: First International Workshop on Deepfake Detection for Audio Multimedia

  17. arXiv:2207.04676  [pdf, other]

    cs.SD eess.AS

    The HCCL System for the NIST SRE21

    Authors: Zhuo Li, Runqiu Xiao, Hangting Chen, Zhenduo Zhao, Zihan Zhang, Wenchao Wang

    Abstract: This paper describes the systems developed by the HCCL team for the NIST 2021 speaker recognition evaluation (NIST SRE21). We first explore various state-of-the-art speaker embedding extractors combined with a novel circle loss to obtain discriminative deep speaker embeddings. Considering that cross-channel and cross-linguistic speaker recognition are the key challenges of SRE21, we introduce sever…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: Accepted by Interspeech 2022

  18. arXiv:2204.11403  [pdf, other]

    cs.SD eess.AS

    Back-ends Selection for Deep Speaker Embeddings

    Authors: Zhuo Li, Runqiu Xiao, Zihan Zhang, Zhenduo Zhao, Wenchao Wang, Pengyuan Zhang

    Abstract: Probabilistic Linear Discriminant Analysis (PLDA) was the dominant and necessary back-end for early speaker recognition approaches, like i-vector and x-vector. However, with the development of neural networks and margin-based loss functions, we can obtain deep speaker embeddings (DSEs), which have advantages of increased inter-class separation and smaller intra-class distances. In this case, PLDA…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  19. arXiv:2203.09722  [pdf, other]

    cs.SD eess.AS

    DGC-vector: A new speaker embedding for zero-shot voice conversion

    Authors: Ruitong Xiao, Haitong Zhang, Yue Lin

    Abstract: Recently, more and more zero-shot voice conversion algorithms have been proposed. As a fundamental part of zero-shot voice conversion, speaker embeddings are the key to improving the converted speech's speaker similarity. In this paper, we study the impact of speaker embeddings on zero-shot voice conversion performance. To better represent the characteristics of the target speaker and improve the…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing

  20. arXiv:2110.03347  [pdf, ps, other]

    eess.AS cs.HC cs.SD

    Cloning one's voice using very limited data in the wild

    Authors: Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yuping Wang, Yuxuan Wang

    Abstract: With the increasing popularity of speech synthesis products, the industry has put forward more requirements for personalized speech synthesis: (1) How to use low-resource, easily accessible data to clone a person's voice. (2) How to clone a person's voice while controlling the style and prosody. To solve the above two problems, we proposed the Hieratron model framework in which the prosody and tim…

    Submitted 8 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

  21. arXiv:2109.02047  [pdf, other]

    cs.SD eess.AS

    The ByteDance Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021

    Authors: Keke Wang, Xudong Mao, Hao Wu, Chen Ding, Chuxiang Shang, Rui Xia, Yuxuan Wang

    Abstract: This paper describes the ByteDance speaker diarization system for the fourth track of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). The VoxSRC-21 provides both the dev set and test set of VoxConverse for use in validation and a standalone test set for evaluation. We first collect the duration and signal-to-noise ratio (SNR) of all audio and find that the distribution of the VoxConve…

    Submitted 5 September, 2021; originally announced September 2021.

  22. arXiv:2108.05272  [pdf, other]

    eess.SP

    Log-Spectral Matching GAN: PPG-based Atrial Fibrillation Detection can be Enhanced by GAN-based Data Augmentation with Integration of Spectral Loss

    Authors: Cheng Ding, Ran Xiao, Duc Do, David Scott Lee, Shadi Kalantarian, Randall J Lee, Xiao Hu

    Abstract: Photoplethysmography (PPG) is a ubiquitous physiological measurement that detects beat-to-beat pulsatile blood volume changes and hence has a potential for monitoring cardiovascular conditions, particularly in ambulatory settings. A PPG dataset that is created for a particular use case is often imbalanced, due to a low prevalence of the pathological condition it targets to predict and the paroxysm…

    Submitted 31 January, 2022; v1 submitted 11 August, 2021; originally announced August 2021.

  23. arXiv:2107.01329  [pdf, other]

    cs.SD eess.AS

    The HCCL Speaker Verification System for Far-Field Speaker Verification Challenge

    Authors: Zhuo Li, Ce Fang, Runqiu Xiao, Zhigao Chen, Wenchao Wang, Yonghong Yan

    Abstract: This paper describes the systems submitted by team HCCL to the Far-Field Speaker Verification Challenge. Our previous work in the AIshell Speaker Verification Challenge 2019 shows that the powerful modeling abilities of Neural Network architectures can provide exceptional performance for this kind of task. Therefore, in this challenge, we focus on constructing deep Neural Network architectures bas…

    Submitted 2 July, 2021; originally announced July 2021.

  24. arXiv:2106.08004  [pdf, other]

    cs.SD cs.CL eess.AS

    Adaptive Margin Circle Loss for Speaker Verification

    Authors: Runqiu Xiao

    Abstract: Deep-Neural-Network (DNN) based speaker verification systems use the angular softmax loss with margin penalties to enhance the intra-class compactness of speaker embeddings, which achieved remarkable performance. In this paper, we propose a novel angular loss function called adaptive margin circle loss for speaker verification. The stage-based margin and chunk-based margin are applied to improve t…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted by Interspeech 2021

  25. arXiv:2102.09971  [pdf, other]

    cs.SD eess.AS

    Speech enhancement with weakly labelled data from AudioSet

    Authors: Qiuqiang Kong, Haohe Liu, Xingjian Du, Li Chen, Rui Xia, Yuxuan Wang

    Abstract: Speech enhancement is a task to improve the intelligibility and perceptual quality of degraded speech signal. Recently, neural networks based methods have been applied to speech enhancement. However, many neural network based methods require noisy and clean speech pairs for training. We propose a speech enhancement framework that can be trained with large-scale weakly labelled AudioSet dataset. We…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: 5 pages

  26. arXiv:2005.12531  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement

    Authors: Dongyang Dai, Li Chen, Yuping Wang, Mu Wang, Rui Xia, Xuchen Song, Zhiyong Wu, Yuxuan Wang

    Abstract: With the popularity of deep neural network, speech synthesis task has achieved significant improvements based on the end-to-end encoder-decoder framework in the recent days. More and more applications relying on speech synthesis technology have been widely used in our daily life. Robust speech synthesis model depends on high quality and customized data which needs lots of collecting efforts. It is…

    Submitted 22 October, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

  27. arXiv:2003.07004  [pdf, ps, other]

    eess.SP cs.LG

    A Generative Learning Approach for Spatio-temporal Modeling in Connected Vehicular Network

    Authors: Rong Xia, Yong Xiao, Yingyu Li, Marwan Krunz, Dusit Niyato

    Abstract: Spatio-temporal modeling of wireless access latency is of great importance for connected-vehicular systems. The quality of the modeled results relies heavily on the number and quality of samples which can vary significantly due to the sensor deployment density as well as traffic volume and density. This paper proposes LaMI (Latency Model Inpainting), a novel framework to generate a comprehensive spat…

    Submitted 15 March, 2020; originally announced March 2020.

    Comments: 6 pages, 8 figures. Accepted at IEEE International Conference on Communications (ICC), Dublin, Ireland, June 2020

  28. arXiv:1911.02521  [pdf]

    eess.IV cs.CV cs.LG

    Machine Learning Techniques for Biomedical Image Segmentation: An Overview of Technical Aspects and Introduction to State-of-Art Applications

    Authors: Hyunseok Seo, Masoud Badiei Khuzani, Varun Vasudevan, Charles Huang, Hongyi Ren, Ruoxiu Xiao, Xiao Jia, Lei Xing

    Abstract: In recent years, significant progress has been made in developing more accurate and efficient machine learning algorithms for segmentation of medical and natural images. In this review article, we highlight the imperative role of machine learning algorithms in enabling efficient and accurate segmentation in the field of medical imaging. We specifically focus on several key studies pertaining to th…

    Submitted 6 November, 2019; originally announced November 2019.

    Comments: Accepted for publication in Medical Physics

  29. arXiv:1911.00140  [pdf]

    eess.IV cs.CV cs.LG

    Modified U-Net (mU-Net) with Incorporation of Object-Dependent High Level Features for Improved Liver and Liver-Tumor Segmentation in CT Images

    Authors: Hyunseok Seo, Charles Huang, Maxime Bassenne, Ruoxiu Xiao, Lei Xing

    Abstract: Segmentation of livers and liver tumors is one of the most important steps in radiation therapy of hepatocellular carcinoma. The segmentation task is often done manually, making it tedious, labor intensive, and subject to intra-/inter- operator variations. While various algorithms for delineating organ-at-risks (OARs) and tumor targets have been proposed, automatic segmentation of livers and liver…

    Submitted 31 October, 2019; originally announced November 2019.

    Comments: Accepted for publication in IEEE Transactions on Medical Imaging

  30. arXiv:1903.06500  [pdf, other]

    cs.LG eess.SP stat.ML

    A Ranking Model Motivated by Nonnegative Matrix Factorization with Applications to Tennis Tournaments

    Authors: Rui Xia, Vincent Y. F. Tan, Louis Filstroff, Cédric Févotte

    Abstract: We propose a novel ranking model that combines the Bradley-Terry-Luce probability model with a nonnegative matrix factorization framework to model and uncover the presence of latent variables that influence the performance of top tennis players. We derive an efficient, provably convergent, and numerically stable majorization-minimization-based algorithm to maximize the likelihood of datasets under…

    Submitted 12 June, 2019; v1 submitted 15 March, 2019; originally announced March 2019.

    Comments: 16 pages, 2 figures, 9 tables. Accepted and to be presented at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD) 2019. Supplementary material, code and datasets can be found in this URL https://github.com/XiaRui1996/btl-nmf