Showing 1–19 of 19 results for author: Qu, L

Searching in archive eess.

  1. arXiv:2406.18156  [pdf, other]

    cs.LG cs.DC cs.NI eess.SP

    FedAQ: Communication-Efficient Federated Edge Learning via Joint Uplink and Downlink Adaptive Quantization

    Authors: Linping Qu, Shenghui Song, Chi-Ying Tsui

    Abstract: Federated learning (FL) is a powerful machine learning paradigm which leverages the data as well as the computational resources of clients, while protecting clients' data privacy. However, the substantial model size and frequent aggregation between the server and clients result in significant communication overhead, making it challenging to deploy FL in resource-limited wireless networks. In this…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  2. arXiv:2406.17430  [pdf, other]

    cs.CL cs.SD eess.AS

    Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights

    Authors: Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari

    Abstract: Large Multimodal Models (LMMs) have achieved great success recently, demonstrating a strong capability to understand multimodal information and to interact with human users. Despite the progress made, the challenge of detecting high-risk interactions in multimodal settings, and in particular in speech modality, remains largely unexplored. Conventional research on risk for speech modality primarily…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2405.10084  [pdf, other]

    eess.AS cs.AI cs.SD

    Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation

    Authors: Manh Luong, Khai Nguyen, Nhat Ho, Reza Haf, Dinh Phung, Lizhen Qu

    Abstract: The Learning-to-match (LTM) framework proves to be an effective inverse optimal transport approach for learning the underlying ground metric between two sources of data, facilitating subsequent matching. However, the conventional LTM framework faces scalability challenges, necessitating the use of the entire dataset each time the parameters of the ground metric are updated. In adapting LTM to the…

    Submitted 16 May, 2024; originally announced May 2024.

  4. arXiv:2404.15585  [pdf, other]

    cs.LG eess.IV

    Brain Storm Optimization Based Swarm Learning for Diabetic Retinopathy Image Classification

    Authors: Liang Qu, Cunze Wang, Yuhui Shi

    Abstract: The application of deep learning techniques to medical problems has garnered widespread research interest in recent years, such as applying convolutional neural networks to medical image classification tasks. However, data in the medical field is often highly private, preventing different hospitals from sharing data to train an accurate model. Federated learning, as a privacy-preserving machine le…

    Submitted 23 April, 2024; originally announced April 2024.

  5. arXiv:2401.15613  [pdf, other]

    eess.IV cs.CV

    Towards Arbitrary-Scale Histopathology Image Super-resolution: An Efficient Dual-branch Framework via Implicit Self-texture Enhancement

    Authors: Minghong Duan, Linhao Qu, Zhiwei Yang, Manning Wang, Chenxi Zhang, Zhijian Song

    Abstract: High-quality whole-slide scanners are expensive, complex, and time-consuming, thus limiting the acquisition and utilization of high-resolution pathology whole-slide images in daily clinical work. Deep learning-based single-image super-resolution techniques are an effective way to solve this problem by synthesizing high-resolution images from low-resolution ones. However, the existing super-resolut…

    Submitted 26 June, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  6. arXiv:2401.05153  [pdf, other]

    cs.CV eess.IV

    CrossDiff: Exploring Self-Supervised Representation of Pansharpening via Cross-Predictive Diffusion Model

    Authors: Yinghui Xing, Litao Qu, Shizhou Zhang, Kai Zhang, Yanning Zhang

    Abstract: Fusion of a panchromatic (PAN) image and corresponding multispectral (MS) image is also known as pansharpening, which aims to combine abundant spatial details of PAN and spectral information of MS. Due to the absence of high-resolution MS images, available deep-learning-based methods usually follow the paradigm of training at reduced resolution and testing at both reduced and full resolution. When…

    Submitted 13 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

  7. arXiv:2307.13953  [pdf, other]

    cs.CV cs.SD eess.AS

    The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

    Authors: Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj

    Abstract: This work unveils the enigmatic link between phonemes and facial features. Traditional studies on voice-face correlations typically involve using a long period of voice input, including generating face images from voices and reconstructing 3D face meshes from voices. However, in situations like voice-based crimes, the available voice evidence may be short and limited. Additionally, from a physiolo…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Interspeech 2023

  8. arXiv:2304.04238  [pdf, other]

    eess.IV cs.CV

    Towards Arbitrary-scale Histopathology Image Super-resolution: An Efficient Dual-branch Framework based on Implicit Self-texture Enhancement

    Authors: Linhao Qu, Minghong Duan, Zhiwei Yang, Manning Wang, Zhijian Song

    Abstract: Existing super-resolution models for pathology images can only work in fixed integer magnifications and have limited performance. Though implicit neural network-based methods have shown promising results in arbitrary-scale super-resolution of natural images, it is not effective to directly apply them in pathology images, because pathology images have special fine-grained image textures different f…

    Submitted 9 April, 2023; originally announced April 2023.

  9. arXiv:2303.00232  [pdf, other]

    eess.IV cs.CV

    Towards more precise automatic analysis: a comprehensive survey of deep learning-based multi-organ segmentation

    Authors: Xiaoyu Liu, Linhao Qu, Ziyue Xie, Jiayue Zhao, Yonghong Shi, Zhijian Song

    Abstract: Accurate segmentation of multiple organs of the head, neck, chest, and abdomen from medical images is an essential step in computer-aided diagnosis, surgical navigation, and radiation therapy. In the past few years, with a data-driven feature extraction approach and end-to-end training, automatic deep learning-based multi-organ segmentation method has far outperformed traditional methods and becom…

    Submitted 2 March, 2023; v1 submitted 28 February, 2023; originally announced March 2023.

    Comments: 25 pages, 9 figures, 16 tables

  10. Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition

    Authors: Leyuan Qu, Cornelius Weber, Stefan Wermter

    Abstract: Due to the dynamic nature of human language, automatic speech recognition (ASR) systems need to continuously acquire new vocabulary. Out-Of-Vocabulary (OOV) words, such as trending words and new named entities, pose problems to modern ASR systems that require long training times to adapt their large numbers of parameters. Different from most previous research focusing on language model post-proces…

    Submitted 21 February, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: Neural Networks, Volume 161, April 2023, Pages 494-504

  11. arXiv:2212.06972  [pdf, other]

    cs.SD cs.CL eess.AS

    Disentangling Prosody Representations with Unsupervised Speech Reconstruction

    Authors: Leyuan Qu, Taihao Li, Cornelius Weber, Theresa Pekarek-Rosin, Fuji Ren, Stefan Wermter

    Abstract: Human speech can be characterized by different components, including semantic content, speaker identity and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity in Automatic Speech Recognition (ASR) and speaker verification tasks respectively. However, it is still an open challenging research question to extract prosodi…

    Submitted 25 September, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  12. arXiv:2211.11176  [pdf, other]

    cs.LG cs.AI eess.SP

    Modeling Multivariate Biosignals With Graph Neural Networks and Structured State Space Models

    Authors: Siyi Tang, Jared A. Dunnmon, Liangqiong Qu, Khaled K. Saab, Tina Baykaner, Christopher Lee-Messer, Daniel L. Rubin

    Abstract: Multivariate biosignals are prevalent in many medical domains, such as electroencephalography, polysomnography, and electrocardiography. Modeling spatiotemporal dependencies in multivariate biosignals is challenging due to (1) long-range temporal dependencies and (2) complex spatial correlations between the electrodes. To address these challenges, we propose representing multivariate biosignals as…

    Submitted 29 April, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: Published as a conference paper at CHIL 2023

  13. arXiv:2211.08843  [pdf, other]

    cs.SD cs.AI eess.AS

    Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer

    Authors: Leyuan Qu, Wei Wang, Cornelius Weber, Pengcheng Yue, Taihao Li, Stefan Wermter

    Abstract: Humans can effortlessly modify various prosodic attributes, such as the placement of stress and the intensity of sentiment, to convey a specific emotion while maintaining consistent linguistic content. Motivated by this capability, we propose EmoAug, a novel style transfer model designed to enhance emotional expression and tackle the data scarcity issue in speech emotion recognition tasks. EmoAug…

    Submitted 28 December, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP2024

  14. arXiv:2205.04044   

    eess.IV cs.CV cs.LG

    Masked Co-attentional Transformer reconstructs 100x ultra-fast/low-dose whole-body PET from longitudinal images and anatomically guided MRI

    Authors: Yan-Ran, Wang, Liangqiong Qu, Natasha Diba Sheybani, Xiaolong Luo, Jiangshan Wang, Kristina Elizabeth Hawk, Ashok Joseph Theruvath, Sergios Gatidis, Xuerong Xiao, Allison Pribnow, Daniel Rubin, Heike E. Daldrup-Link

    Abstract: Despite its tremendous value for the diagnosis, treatment monitoring and surveillance of children with cancer, whole body staging with positron emission tomography (PET) is time consuming and associated with considerable radiation exposure. 100x (1% of the standard clinical dosage) ultra-low-dose/ultra-fast whole-body PET reconstruction has the potential for cancer imaging with unprecedented speed…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: This submission has been removed by arXiv administrators because the submitter did not have the right to assign the license at the time of submission

  15. arXiv:2112.04748  [pdf, other]

    cs.SD cs.AI eess.AS

    LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading

    Authors: Leyuan Qu, Cornelius Weber, Stefan Wermter

    Abstract: The aim of this work is to investigate the impact of crossmodal self-supervised pre-training for speech reconstruction (video-to-audio) by leveraging the natural co-occurrence of audio and visual streams in videos. We propose LipSound2 which consists of an encoder-decoder architecture and location-aware attention mechanism to map face image sequences to mel-scale spectrograms directly without requ…

    Submitted 12 September, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: ACCEPTED IN IEEE Transactions on Neural Networks and Learning Systems

  16. arXiv:2110.04604  [pdf, other]

    eess.IV cs.CV cs.LG

    Learning MRI Artifact Removal With Unpaired Data

    Authors: Siyuan Liu, Kim-Han Thung, Liangqiong Qu, Weili Lin, Dinggang Shen, Pew-Thian Yap

    Abstract: Retrospective artifact correction (RAC) improves image quality post acquisition and enhances image usability. Recent machine learning driven techniques for RAC are predominantly based on supervised learning and therefore practical utility can be limited as data with paired artifact-free and artifact-corrupted images are typically insufficient or even non-existent. Here we show that unwanted image…

    Submitted 9 October, 2021; originally announced October 2021.

  17. arXiv:2107.02375  [pdf, other]

    cs.LG eess.IV

    SplitAVG: A heterogeneity-aware federated deep learning method for medical imaging

    Authors: Miao Zhang, Liangqiong Qu, Praveer Singh, Jayashree Kalpathy-Cramer, Daniel L. Rubin

    Abstract: Federated learning is an emerging research paradigm for enabling collaboratively training deep learning models without sharing patient data. However, the data from different institutions are usually heterogeneous across institutions, which may reduce the performance of models trained using federated learning. In this study, we propose a novel heterogeneity-aware federated learning method, SplitAVG…

    Submitted 10 April, 2022; v1 submitted 5 July, 2021; originally announced July 2021.

  18. Federated Learning for Breast Density Classification: A Real-World Implementation

    Authors: Holger R. Roth, Ken Chang, Praveer Singh, Nir Neumark, Wenqi Li, Vikash Gupta, Sharut Gupta, Liangqiong Qu, Alvin Ihsani, Bernardo C. Bizzo, Yuhong Wen, Varun Buch, Meesam Shah, Felipe Kitamura, Matheus Mendonça, Vitor Lavor, Ahmed Harouni, Colin Compas, Jesse Tetreault, Prerna Dogra, Yan Cheng, Selnur Erdal, Richard White, Behrooz Hashemian, Thomas Schultz, et al. (18 additional authors not shown)

    Abstract: Building robust deep learning-based models requires large quantities of diverse training data. In this study, we investigate the use of federated learning (FL) to build medical imaging classification models in a real-world collaborative setting. Seven clinical institutions from across the world joined this FL effort to train a model for breast density classification based on Breast Imaging, Report…

    Submitted 20 October, 2020; v1 submitted 3 September, 2020; originally announced September 2020.

    Comments: Accepted at the 1st MICCAI Workshop on "Distributed And Collaborative Learning"; add citation to Fig. 1 & 2 and update Fig. 5; fix typo in affiliations

    Journal ref: In: Albarqouni S. et al. (eds) Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. DART 2020, DCL 2020. Lecture Notes in Computer Science, vol 12444. Springer, Cham

  19. arXiv:2005.08335  [pdf, other]

    eess.AS cs.SD

    Multimodal Target Speech Separation with Voice and Face References

    Authors: Leyuan Qu, Cornelius Weber, Stefan Wermter

    Abstract: Target speech separation refers to isolating target speech from a multi-speaker mixture signal by conditioning on auxiliary information about the target speaker. Different from the mainstream audio-visual approaches which usually require simultaneous visual streams as additional input, e.g. the corresponding lip movement sequences, in our approach we propose the novel use of a single face profile…

    Submitted 17 May, 2020; originally announced May 2020.