
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

Authors: Yuwei Zhang, Tong Xia, Jing Han, Yu Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, Cecilia Mascolo

Abstract: Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing advantages and possibly unlock this impasse. However, given the safety-critical nature of healthcare applications, it is pivotal to also ensure openness and replicability for any proposed foundation model solution. To this end, we introduce OPERA, an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system, as the first approach answering this need. We curate large-scale respiratory audio datasets (~136K samples, 440 hours), pretrain three pioneering foundation models, and build a benchmark consisting of 19 downstream respiratory health tasks for evaluation. Our pretrained models demonstrate superior performance (against existing acoustic models pretrained with general audio on 16 out of 19 tasks) and generalizability (to unseen datasets and new respiratory audio modalities). This highlights the great promise of respiratory acoustic foundation models and encourages more studies using OPERA as an open resource to accelerate research on respiratory audio for health. The system is accessible from https://github.com/evelyn0414/OPERA.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.15846 [pdf, other]

Revisiting Interpolation Augmentation for Speech-to-Text Generation

Authors: Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

Abstract: Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings.

Submitted 22 June, 2024; originally announced June 2024.

Comments: ACL 2024 Findings
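
The abstract above describes building virtual training samples by interpolating inputs and labels. A minimal sketch of that general idea follows, assuming the widely used mixup formulation (a single coefficient drawn from a Beta distribution and applied to both inputs and labels). The function name `interpolate_pair`, the `alpha` default, and the equal-length feature vectors are illustrative assumptions, not the strategy studied in the paper.

```python
import random


def interpolate_pair(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Mix two (input, label) pairs with one Beta(alpha, alpha) coefficient.

    Hypothetical helper: inputs are equal-length feature vectors and labels
    are equal-length probability vectors (for example one-hot targets).
    """
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("both samples must have matching shapes")
    lam = rng.betavariate(alpha, alpha)  # interpolation weight in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    x, y, lam = interpolate_pair([1.0, 0.0], [1.0, 0.0],
                                 [0.0, 1.0], [0.0, 1.0], rng=rng)
    print(lam, x, y)
```

Because the same weight is applied to inputs and labels, the mixed label remains a valid probability vector. Real speech inputs vary in length, so a practical S2T version would need padding or alignment that this sketch leaves out.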

arXiv:2406.00497 [pdf, ps, other]

Recent Advances in End-to-End Simultaneous Speech Translation

Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, YingFeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles. Secondly, satisfying real-time requirements presents inherent difficulties due to the need for immediate translation output. Thirdly, striking a balance between translation quality and latency constraints remains a critical challenge. Finally, the scarcity of annotated data adds another layer of complexity to the task. Through our exploration of these challenges and the proposed solutions, we aim to provide valuable insights into the current landscape of SimulST research and suggest promising directions for future exploration.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.12609 [pdf, other]

Mamba in Speech: Towards an Alternative to Self-Attention

Authors: Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

Abstract: Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and computer vision tasks, but its superiority has rarely been investigated in speech signal processing. This paper explores solutions for applying Mamba to speech processing using two typical speech processing tasks: speech recognition, which requires semantic and sequential information, and speech enhancement, which focuses primarily on sequential patterns. The experimental results show that bidirectional Mamba (BiMamba) is superior to vanilla Mamba for speech processing. Moreover, experiments demonstrate the effectiveness of BiMamba as an alternative to the self-attention module in Transformer and its derivatives, particularly for the semantic-aware task. The crucial technologies for transferring Mamba to speech are then summarized in ablation studies and the discussion section to offer insights for future research.

Submitted 30 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.08800 [pdf]

Estimation of Participation Factors for Power System Oscillation from Measurements

Authors: Tianwei Xia, Zhe Yu, Kai Sun, Di Shi, Kaiyang Huang

Abstract: In a power system, when the participation factors of generators are computed to rank their participation in an oscillatory mode, a model-based approach is conventionally used on the linearized system model by means of the corresponding right and left eigenvectors. This paper proposes a new approach for estimating participation factors directly from measurement data on generator responses under selected disturbances. The approach computes extended participation factors that coincide with accurate model-based participation factors when the measured responses satisfy an ideally symmetric condition. This paper relaxes this symmetric condition with the original measurement space by identifying and utilizing a coordinate transformation to a new space optimally recovering the symmetry. Thus, the optimal estimates of participation factors solely from measurements are achieved, and the accuracy and influencing factors are discussed. The proposed approach is first demonstrated in detail on a two-area system and then tested on an NPCC 48-machine power system. The penetration of inverter-based resources is also considered.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2403.07156 [pdf]

On Uniqueness of Participation Factors

Authors: Tianwei Xia, Kai Sun

Abstract: In modal analysis and control of a nonlinear dynamical system, participation factors of state variables with respect to a mode of interest serve as pivotal tools for stability studies. Linear participation factors are uniquely determined by the mode's shape and composition, which are defined by the right and left eigenvectors of the linearized model. For nonlinear participation factors as well as five other variants of participation factors, this paper finds the sufficient conditions for them to be unique against scaling factors on the shape and composition of a mode. Besides, the similarity between the scaling factor and perturbation amplitude is also discussed.

Submitted 11 March, 2024; originally announced March 2024.
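
The abstract above notes that linear participation factors are determined by the right and left eigenvectors of the linearized model. A minimal sketch of the textbook definition follows, assuming p_ki = v_ki * w_ik with the left eigenvectors normalized so that w_i . v_i = 1. The helper names and the 2x2 example system are illustrative assumptions, not taken from the paper.

```python
def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("eigenvector matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def participation_factors(right_vecs):
    """Return p[k][i], the participation of state k in mode i.

    right_vecs[k][i] is entry k of the right eigenvector of mode i.
    Row i of the inverse is the left eigenvector of mode i, already
    normalized so that (left_i . right_i) = 1.
    """
    left_rows = inverse_2x2(right_vecs)
    n = len(right_vecs)
    return [[right_vecs[k][i] * left_rows[i][k] for i in range(n)]
            for k in range(n)]


if __name__ == "__main__":
    # Illustrative system A = [[0, 1], [-2, -3]] with modes -1 and -2;
    # the columns below are its right eigenvectors [1, -1] and [1, -2].
    V = [[1.0, 1.0], [-1.0, -2.0]]
    print(participation_factors(V))
    # Rescaling a right eigenvector leaves the factors unchanged.
    print(participation_factors([[3.0, 1.0], [-3.0, -2.0]]))
```

The second call illustrates the uniqueness property for the linear case: scaling a right eigenvector scales the matching left eigenvector by the reciprocal, so the products are unaffected.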

arXiv:2401.09674 [pdf, ps, other]

QoS-Aware 3D Coverage Deployment of UAVs for Internet of Vehicles in Intelligent Transportation

Authors: Pengfei Du, Tingyue Xiao, Haotong Cao, Daosen Zhai

Abstract: It is a challenging problem to characterize the air-to-ground (A2G) channel and identify the best deployment location for 3D UAVs with QoS awareness. To address this problem, we propose a QoS-aware UAV 3D coverage deployment algorithm, which simulates the three-dimensional urban road scenario, considers the UAV communication resource capacity and vehicle communication QoS requirements comprehensively, and then obtains the optimal UAV deployment position by improving the genetic algorithm. Specifically, the K-means clustering algorithm is used to cluster the vehicles, and the center locations of these clusters serve as the initial UAV positions to generate the initial population. Subsequently, we employ the K-means initialized grey wolf optimization (KIGWO) algorithm to achieve the UAV location with an optimal fitness value by performing an optimal search within the grey wolf population. To enhance the algorithm's diversity and global search capability, we randomly substitute this optimal location with one of the individual locations from the initial population. The fitness value is determined by the total number of vehicles covered by UAVs in the system, while the allocation scheme's feasibility is evaluated based on the corresponding QoS requirements. Competitive selection operations are conducted to retain individuals with higher fitness values, while crossover and mutation operations are employed to maintain the diversity of solutions. Finally, the individual with the highest fitness, which represents the UAV deployment position that covers the maximum number of vehicles in the entire system, is selected as the optimal solution. Extensive experimental results demonstrate that the proposed algorithm can effectively enhance the reliability and vehicle communication QoS.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.07681 [pdf, other]

Effect of target signals and delays on spatially selective active noise control for open-fitting hearables

Authors: Tong Xiao, Simon Doclo

Abstract: Spatially selective active noise control (ANC) hearables are designed to reduce unwanted noise from certain directions while preserving desired sounds from other directions. In previous studies, the target signal has been defined either as the delayed desired component in one of the reference microphone signals or as the desired component in the error microphone signal without any delay. In this paper, we systematically investigate the influence of delays in different target signals on the ANC performance and provide an intuitive explanation for how the system obtains the desired signal. Simulations were conducted on a pair of open-fitting hearables for localized speech and noise sources in an anechoic environment. The performance was assessed in terms of noise reduction, signal quality and control effort. Results indicate that optimal performance is achieved without delays when the target signal is defined at the error microphone, whereas causality necessitates delays when the target signal is defined at the reference microphone. The optimal delay is found to be the acoustic delay between this reference microphone and the error microphone from the desired source.

Submitted 15 January, 2024; originally announced January 2024.

Comments: ICASSP 2024 (c) 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2312.10952 [pdf, other]

Soft Alignment of Modality Space for End-to-end Speech Translation

Authors: Yuhao Zhang, Kaiqi Kou, Bei Li, Chen Xu, Chunliang Zhang, Tong Xiao, Jingbo Zhu

Abstract: End-to-end Speech Translation (ST) aims to convert speech into target text within a unified model. The inherent differences between speech and text modalities often impede effective cross-modal and cross-lingual transfer. Existing methods typically employ hard alignment (H-Align) of individual speech and text segments, which can degrade textual representations. To address this, we introduce Soft Alignment (S-Align), using adversarial training to align the representation spaces of both modalities. S-Align creates a modality-invariant space while preserving individual modality quality. Experiments on three languages from the MuST-C dataset show S-Align outperforms H-Align across multiple tasks and offers translation capabilities on par with specialized translation models.

Submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted to ICASSP2024

arXiv:2312.06050 [pdf, other]

Federated Multilinear Principal Component Analysis with Applications in Prognostics

Authors: Chengyu Zhou, Yuqi Su, Tangbin Xia, Xiaolei Fang

Abstract: Multilinear Principal Component Analysis (MPCA) is a widely utilized method for the dimension reduction of tensor data. However, the integration of MPCA into federated learning remains unexplored in existing research. To tackle this gap, this article proposes a Federated Multilinear Principal Component Analysis (FMPCA) method, which enables multiple users to collaboratively reduce the dimension of their tensor data while keeping each user's data local and confidential. The proposed FMPCA method is guaranteed to have the same performance as traditional MPCA. An application of the proposed FMPCA in industrial prognostics is also demonstrated. Simulated data and a real-world data set are used to validate the performance of the proposed method.

Submitted 28 April, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.00082 [pdf, other]

A Compact Implicit Neural Representation for Efficient Storage of Massive 4D Functional Magnetic Resonance Imaging

Authors: Ruoran Li, Runzhao Yang, Wenxin Xiang, Yuxiao Cheng, Tingxiong Xiao, Jinli Suo

Abstract: Functional Magnetic Resonance Imaging (fMRI) data is a widely used kind of four-dimensional biomedical data, which requires effective compression. However, fMRI compression poses unique challenges due to its intricate temporal dynamics, low signal-to-noise ratio, and complicated underlying redundancies. This paper reports a novel compression paradigm specifically tailored for fMRI data based on Implicit Neural Representation (INR). The proposed approach focuses on removing the various redundancies among the time series by employing several methods, including (i) conducting spatial correlation modeling for intra-region dynamics, (ii) decomposing reusable neuronal activation patterns, and (iii) using proper initialization together with nonlinear fusion to describe the inter-region similarity. This scheme appropriately incorporates the unique features of fMRI data, and experimental results on publicly available datasets demonstrate the effectiveness of the proposed method, surpassing state-of-the-art algorithms in both conventional image quality evaluation metrics and fMRI downstream tasks. This work paves the way for sharing massive fMRI data at low bandwidth and high fidelity.

Submitted 29 February, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.03810 [pdf, other]

Rethinking and Improving Multi-task Learning for End-to-end Speech Translation

Authors: Yuhao Zhang, Chen Xu, Bei Li, Hao Chen, Tong Xiao, Chunliang Zhang, Jingbo Zhu

Abstract: Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning. However, the extent to which auxiliary tasks are highly consistent with the ST task, and how much this approach truly helps, have not been thoroughly studied. In this paper, we investigate the consistency between different tasks, considering different times and modules. We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations. Furthermore, we propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the difference in length and representation. We conduct experiments on the MuST-C dataset. The results demonstrate that our method attains state-of-the-art results. Moreover, when additional data is used, we achieve the new SOTA result on MuST-C English to Spanish task with 20.8% of the training time required by the current SOTA method.

Submitted 7 November, 2023; originally announced November 2023.

Comments: Accepted to EMNLP2023 main conference

arXiv:2309.12234 [pdf, ps, other]

Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition

Authors: Chen Xu, Xiaoqian Liu, Erfeng He, Yuhao Zhang, Qianqian Dong, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

Abstract: In this study, we present synchronous bilingual Connectionist Temporal Classification (CTC), an innovative framework that leverages dual CTC to bridge the gaps of both modality and language in the speech translation (ST) task. Utilizing transcript and translation as concurrent objectives for CTC, our model bridges the gap between audio and text as well as between source and target languages. Building upon the recent advances in CTC application, we develop an enhanced variant, BiL-CTC+, that establishes new state-of-the-art performances on the MuST-C ST benchmarks under resource-constrained scenarios. Intriguingly, our method also yields significant improvements in speech recognition performance, revealing the effect of cross-lingual learning on transcription and demonstrating its broad applicability. The source code is available at https://github.com/xuchennlp/S2T.

Submitted 21 September, 2023; originally announced September 2023.

Comments: Submitted to ICASSP 2024

arXiv:2306.11646 [pdf, other]

Recent Advances in Direct Speech-to-text Translation

Authors: Chen Xu, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu

Abstract: Recently, speech-to-text translation has attracted more and more attention and many studies have emerged rapidly. In this paper, we present a comprehensive survey on direct speech translation aiming to summarize the current state-of-the-art techniques. First, we categorize the existing research work into three directions based on the main challenges -- modeling burden, data scarcity, and application issues. To tackle the problem of modeling burden, two main structures have been proposed, encoder-decoder framework (Transformer and the variants) and multitask frameworks. For the challenge of data scarcity, recent work resorts to many sophisticated techniques, such as data augmentation, pre-training, knowledge distillation, and multilingual modeling. We analyze and summarize the application issues, which include real-time, segmentation, named entity, gender bias, and code-switching. Finally, we discuss some promising directions for future work.

Submitted 20 June, 2023; originally announced June 2023.

Comments: An expanded version of the paper accepted by IJCAI2023 survey track

arXiv:2306.07650 [pdf, other]

Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation

Authors: Yuchen Han, Chen Xu, Tong Xiao, Jingbo Zhu

Abstract: Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonplace "modality gap" between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap occurs in the early stages of fine-tuning, but does not have a major impact on the final performance. On the other hand, we find that there is another gap, which we call the "capacity gap": high resource tasks (such as ASR and MT) always require a large model to fit, and when the model is reused for a low resource task (E2E ST), it yields sub-optimal performance due to over-fitting. In a case study, we find that the regularization plays a more important role than the well-designed modality adaption method, which achieves 29.0 for en-de and 40.3 for en-fr on the MuST-C dataset. Code and models are available at https://github.com/hannlp/TAB.

Submitted 13 June, 2023; originally announced June 2023.

Comments: ACL 2023 Main Conference

arXiv:2305.20024 [pdf]

Cooperative IoT Data Sharing with Heterogeneity of Participants Based on Electricity Retail

Authors: Bohong Wang, Qinglai Guo, Tian Xia, Qiang Li, Di Liu, Feng Zhao

Abstract: With the development of Internet of Things (IoT) and big data technology, the data value is increasingly explored in multiple practical scenarios, including electricity transactions. However, the isolation of IoT data among several entities makes it difficult to achieve optimal allocation of data resources and convert data resources into real economic value, thus it is necessary to introduce the IoT data sharing mode to drive data circulation. To enhance the accuracy and fairness of IoT data sharing, the heterogeneity of participants is sufficiently considered, and data valuation and profit allocation in IoT data sharing are improved based on the background of electricity retail. Data valuation is supposed to be relevant to attributes of IoT data buyers, thus risk preferences of electricity retailers are applied as characteristic attributes and data premium rates are proposed to modify data value rates. Profit allocation should measure the marginal contribution shares of electricity retailers and data brokers fairly, thus asymmetric Nash bargaining model is used to guarantee that they could receive reasonable profits based on their specific contribution to the coalition of IoT data sharing. Considering the heterogeneity of participants comprehensively, the proposed IoT data sharing fits for a large coalition of IoT data sharing with multiple electricity retailers and data brokers. Finally, to demonstrate the applications of IoT data sharing in smart grids, case studies are utilized to validate the results of data value for electricity retailers with different risk preferences and the efficiency of profit allocation using asymmetric Nash bargaining model.

Submitted 31 May, 2023; originally announced May 2023.

Comments: 18 pages, 14 figures

arXiv:2305.05344 [pdf, other]

Trustworthy Multi-phase Liver Tumor Segmentation via Evidence-based Uncertainty

Authors: Chuanfei Hu, Tianyi Xia, Ying Cui, Quchen Zou, Yuancheng Wang, Wenbo Xiao, Shenghong Ju, Xinde Li

Abstract: Multi-phase liver contrast-enhanced computed tomography (CECT) images convey the complementary multi-phase information for liver tumor segmentation (LiTS), which are crucial to assist the diagnosis of liver cancer clinically. However, the performances of existing multi-phase liver tumor segmentation (MPLiTS)-based methods suffer from redundancy and weak interpretability, resulting in the implicit unreliability of clinical applications. In this paper, we propose a novel trustworthy multi-phase liver tumor segmentation (TMPLiTS), which is a unified framework jointly conducting segmentation and uncertainty estimation. The trustworthy results could assist the clinicians to make a reliable diagnosis. Specifically, Dempster-Shafer Evidence Theory (DST) is introduced to parameterize the segmentation and uncertainty as evidence following Dirichlet distribution. The reliability of segmentation results among multi-phase CECT images is quantified explicitly. Meanwhile, a multi-expert mixture scheme (MEMS) is proposed to fuse the multi-phase evidences, which can guarantee the effect of fusion procedure based on theoretical analysis. Experimental results demonstrate the superiority of TMPLiTS compared with the state-of-the-art methods. Meanwhile, the robustness of TMPLiTS is verified, where the reliable performance can be guaranteed against the perturbations.

Submitted 20 June, 2023; v1 submitted 9 May, 2023; originally announced May 2023.
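
The abstract above parameterizes segmentation and uncertainty as evidence following a Dirichlet distribution. A minimal sketch of how evidence maps to belief and uncertainty follows, assuming the common subjective-logic formulation from evidential deep learning (alpha_k = e_k + 1, belief b_k = e_k / S, uncertainty u = K / S with S = sum of alpha). The function name `dirichlet_opinion` is hypothetical, and the paper's exact parameterization and fusion scheme may differ.

```python
def dirichlet_opinion(evidence):
    """Map non-negative per-class evidence to (belief, uncertainty, mean).

    Assumed formulation: alpha_k = e_k + 1, S = sum(alpha),
    b_k = e_k / S, u = K / S, so that sum(b) + u == 1.
    """
    if not evidence or any(e < 0 for e in evidence):
        raise ValueError("evidence must be a non-empty list of values >= 0")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]       # Dirichlet parameters
    strength = sum(alpha)                     # Dirichlet strength S
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    expected_prob = [a / strength for a in alpha]  # mean of the Dirichlet
    return belief, uncertainty, expected_prob


if __name__ == "__main__":
    # Strong evidence for class 0 at one pixel: low uncertainty.
    print(dirichlet_opinion([8.0, 0.0]))
    # No evidence at all: uncertainty is maximal (1.0).
    print(dirichlet_opinion([0.0, 0.0]))
```

In a segmentation setting this mapping would be applied per pixel to the non-negative outputs of the network, so that pixels with little total evidence are flagged as uncertain.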

arXiv:2304.14612 [pdf, other]

Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening

Authors: Mingsong Li, Yikun Liu, Tao Xiao, Yuwen Huang, Gongping Yang

Abstract: Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image. Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency. For one thing, the universally adopted black box principle limits the model interpretability. For another thing, existing DL-based methods fail to efficiently capture local and global dependencies at the same time, inevitably limiting the overall performance. To address these mentioned issues, we first formulate the degradation process of the high-resolution multispectral (HrMS) image as a unified variational optimization problem, and alternately solve its data and prior subproblems by the designed iterative proximal gradient descent (PGD) algorithm. Moreover, we customize a Local-Global Transformer (LGT) to simultaneously model local and global dependencies, and further formulate an LGT-based prior module for image denoising. Besides the prior module, we also design a lightweight data module. Finally, by serially integrating the data and prior modules in each iterative stage, we unfold the iterative algorithm into a stage-wise unfolding network, Local-Global Transformer Enhanced Unfolding Network (LGTEUN), for the interpretable MS pan-sharpening. Comprehensive experimental results on three satellite data sets demonstrate the effectiveness and efficiency of LGTEUN compared with state-of-the-art (SOTA) methods. The source code is available at https://github.com/lms-07/LGTEUN.

Submitted 27 April, 2023; originally announced April 2023.

Comments: Accepted by IJCAI2023

arXiv:2304.08506 [pdf, other]

When SAM Meets Medical Images: An Investigation of Segment Anything Model (SAM) on Multi-phase Liver Tumor Segmentation

Authors: Chuanfei Hu, Tianyi Xia, Shenghong Ju, Xinde Li

Abstract: Learning to segmentation without large-scale samples is an inherent capability of human. Recently, Segment Anything Model (SAM) performs the significant zero-shot image segmentation, attracting considerable attention from the computer vision community. Here, we investigate the capability of SAM for medical image analysis, especially for multi-phase liver tumor segmentation (MPLiTS), in terms of prompts, data resolution, phases. Experimental results demonstrate that there might be a large gap between SAM and expected performance. Fortunately, the qualitative results show that SAM is a powerful annotation tool for the community of interactive medical image segmentation.

Submitted 21 December, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: Preliminary investigation

arXiv:2304.08320 [pdf]

On Fast-Converged Deep Reinforcement Learning for Optimal Dispatch of Large-Scale Power Systems under Transient Security Constraints

Authors: Tannan Xiao, Ying Chen, Han Diao, Shaowei Huang, Chen Shen

Abstract: Power system optimal dispatch with transient security constraints is commonly represented as Transient Security-Constrained Optimal Power Flow (TSC-OPF). Deep Reinforcement Learning (DRL)-based TSC-OPF trains efficient decision-making agents that are adaptable to various scenarios and provide solution results quickly. However, due to the high dimensionality of the state space and action spaces, as well as the non-smoothness of dynamic constraints, existing DRL-based TSC-OPF solution methods face a significant challenge of the sparse reward problem. To address this issue, a fast-converged DRL method for TSC-OPF is proposed in this paper. The Markov Decision Process (MDP) modeling of TSC-OPF is improved by reducing the observation space and smoothing the reward design, thus facilitating agent training. An improved Deep Deterministic Policy Gradient algorithm with Curriculum learning, Parallel exploration, and Ensemble decision-making (DDPG-CPEn) is introduced to drastically enhance the efficiency of agent training and the accuracy of decision-making. The effectiveness, efficiency, and accuracy of the proposed method are demonstrated through experiments in the IEEE 39-bus system and a practical 710-bus regional power grid. The source code of the proposed method is made public on GitHub.

Submitted 29 January, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: 10 pages, 11 figures

arXiv:2303.07067 [pdf, other]

Cross-device Federated Learning for Mobile Health Diagnostics: A First Study on COVID-19 Detection

Authors: Tong Xia, Jing Han, Abhirup Ghosh, Cecilia Mascolo

Abstract: Federated learning (FL) aided health diagnostic models can incorporate data from a large number of personal edge devices (e.g., mobile phones) while keeping the data local to the originating devices, largely ensuring privacy. However, such a cross-device FL approach for health diagnostics still imposes many challenges due to both local data imbalance (as extreme as local data consists of a single disease class) and global data imbalance (the disease prevalence is generally low in a population). Since the federated server has no access to data distribution information, it is not trivial to solve the imbalance issue towards an unbiased model. In this paper, we propose FedLoss, a novel cross-device FL framework for health diagnostics. Here the federated server averages the models trained on edge devices according to the predictive loss on the local data, rather than using only the number of samples as weights. As the predictive loss better quantifies the data distribution at a device, FedLoss alleviates the impact of data imbalance. Through a real-world dataset on respiratory sound and symptom-based COVID-19 detection task, we validate the superiority of FedLoss. It achieves competitive COVID-19 detection performance compared to a centralised model with an AUC-ROC of 79%. It also outperforms the state-of-the-art FL baselines in sensitivity and convergence speed. Our work not only demonstrates the promise of federated COVID-19 detection but also paves the way to a plethora of mobile health model development in a privacy-preserving fashion.

Submitted 13 March, 2023; originally announced March 2023.

Comments: This paper has been accepted by IEEE ICASSP 2023

arXiv:2212.01778 [pdf, ps, other]

Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data

Authors: Yuhao Zhang, Chen Xu, Bojie Hu, Chunliang Zhang, Tong Xiao, Jingbo Zhu

Abstract: We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the ability of adapting one modality (i.e., source-language speech) to another (i.e., source-language text). Thus, the speech translation model can learn from both unlabeled and labeled data, especially when the source-language text data is abundant. Beyond this, we present a denoising method to build a robust text encoder that can deal with both normal and noisy text data. Our system sets new state-of-the-arts on the MuST-C En-De, En-Fr, and LibriSpeech En-Fr tasks.

Submitted 4 December, 2022; originally announced December 2022.

Comments: Accepted to AAAI 2023

arXiv:2209.15180 [pdf, other]

SCI: A Spectrum Concentrated Implicit Neural Compression for Biomedical Data

Authors: Runzhao Yang, Tingxiong Xiao, Yuxiao Cheng, Qianni Cao, Jinyuan Qu, Jinli Suo, Qionghai Dai

Abstract: Massive collection and explosive growth of biomedical data, demands effective compression for efficient storage, transmission and sharing. Readily available visual data compression techniques have been studied extensively but tailored for natural images/videos, and thus show limited performance on biomedical data which are of different features and larger diversity. Emerging implicit neural representation (INR) is gaining momentum and demonstrates high promise for fitting diverse visual data in target-data-specific manner, but a general compression scheme covering diverse biomedical data is so far absent. To address this issue, we firstly derive a mathematical explanation for INR's spectrum concentration property and an analytical insight on the design of INR based compressor. Further, we propose a Spectrum Concentrated Implicit neural compression (SCI) which adaptively partitions the complex biomedical data into blocks matching INR's concentrated spectrum envelop, and design a funnel shaped neural network capable of representing each block with a small number of parameters. Based on this design, we conduct compression via optimization under given budget and allocate the available parameters with high representation accuracy. The experiments show SCI's superior performance to state-of-the-art methods including commercial compressors, data-driven ones, and INR based counterparts on diverse biomedical data. The source code can be found at https://github.com/RichealYoung/ImplicitNeuralCompression.git.

Submitted 23 November, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: accepted to AAAI2023

ACM Class: I.4.2; I.2.10

arXiv:2208.09997 [pdf, other]

doi 10.1121/10.0019336

Spatially Selective Active Noise Control Systems

Authors: Tong Xiao, Buye Xu, Chuming Zhao

Abstract: Active noise control (ANC) systems are commonly designed to achieve maximal sound reduction regardless of the incident direction of the sound. When desired sound is present, the state-of-the-art methods add a separate system to reconstruct it. This can result in distortion and latency. In this work, we propose a multi-channel ANC system that only reduces sound from undesired directions, and the system truly preserves the desired sound instead of reproducing it. The proposed algorithm imposes a spatial constraint on the hybrid ANC cost function to achieve spatial selectivity. Based on a six-channel microphone array on a pair of augmented eyeglasses, results show that the system minimized only noise coming from undesired directions. The control performance could be maintained even when the array was heavily perturbed. The proposed algorithm was also compared with the existing methods in the literature. Not only did the proposed system provide better noise reduction, but it also required much less effort. The binaural localization cues did not need to be reconstructed since the system preserved the physical sound wave from the desired source.

Submitted 12 May, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

Comments: The following article has been submitted to the Journal of the Acoustical Society of America (JASA). It has been accepted and published in https://doi.org/10.1121/10.0019336

Journal ref: J. Acoust. Soc. Am., Vol. 153, No. 5, pp. 2733-2744, 2023

arXiv:2203.07815 [pdf, other]

Adversarial Counterfactual Augmentation: Application in Alzheimer's Disease Classification

Authors: Tian Xia, Pedro Sanchez, Chen Qin, Sotirios A. Tsaftaris

Abstract: Due to the limited availability of medical data, deep learning approaches for medical image analysis tend to generalise poorly to unseen data. Augmenting data during training with random transformations has been shown to help and became a ubiquitous technique for training neural networks. Here, we propose a novel adversarial counterfactual augmentation scheme that aims at finding the most effective synthesised images to improve downstream tasks, given a pre-trained generative model. Specifically, we construct an adversarial game where we update the input conditional factor of the generator and the downstream classifier with gradient backpropagation alternatively and iteratively. This can be viewed as finding the 'weakness' of the classifier and purposely forcing it to overcome its weakness via the generative model. To demonstrate the effectiveness of the proposed approach, we validate the method with the classification of Alzheimer's Disease (AD) as a downstream task. The pre-trained generative model synthesises brain images using age as conditional factor. Extensive experiments and ablation studies have been performed to show that the proposed approach improves classification performance and has potential to alleviate spurious correlations and catastrophic forgetting. Code will be released upon acceptance.

Submitted 1 October, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

arXiv:2203.04815 [pdf]

Machine Learning based Optimal Feedback Control for Microgrid Stabilization

Authors: Tianwei Xia, Kai Sun, Wei Kang

Abstract: Microgrids have more operational flexibilities as well as uncertainties than conventional power grids, especially when renewable energy resources are utilized. An energy storage based feedback controller can compensate undesired dynamics of a microgrid to improve its stability. However, the optimal feedback control of a microgrid subject to a large disturbance needs to solve a Hamilton-Jacobi-Bellman problem. This paper proposes a machine learning-based optimal feedback control scheme. Its training dataset is generated from a linear-quadratic regulator and a brute-force method respectively addressing small and large disturbances. Then, a three-layer neural network is constructed from the data for the purpose of optimal feedback control. A case study is carried out for a microgrid model based on a modified Kundur two-area system to test the real-time performance of the proposed control scheme.

Submitted 9 March, 2022; originally announced March 2022.

Comments: Accepted by 2022 IEEE PES General Meeting in Denver, CO

arXiv:2203.04808 [pdf]

Time-variant Nonlinear Participation Factors Considering Resonances in Power Systems

Authors: Tianwei Xia, Kai Sun

Abstract: The participation factor (PF), as an important modal property for small-signal stability, evaluates the linkage between a state variable and a mode. Applying the normal form theory, a nonlinear PF can be defined to evaluate the participation of a state variable into modal dynamics following a large disturbance, that gives considerations to resonances and nonlinearities up to a desired order. However, existing nonlinear PFs are inconsistent with the conventional linear PF when nonlinear dynamics following a large disturbance attenuate and linear modal dynamics become dominating. This paper proposes a time-variant nonlinear PF by introducing a time decaying factor and the definition of a nonlinear mode. The new PFs consider modes of resonances and their values naturally transition to a linear PF when the system state becomes close to its equilibrium. The case study on a two-area four-generator system shows that the new PF can correctly rank generators by their participations in natural and resonance modes of nonlinear oscillation subject to a large disturbance.

Submitted 9 March, 2022; originally announced March 2022.

Comments: Accepted by 2022 IEEE PES General Meeting in Denver CO

arXiv:2202.08981 [pdf, other]

A Summary of the ComParE COVID-19 Challenges

Authors: Harry Coppock, Alican Akman, Christian Bergler, Maurice Gerczuk, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Jing Han, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Panagiotis Tzirakis, Anton Batliner, Cecilia Mascolo, Björn W. Schuller

Abstract: The COVID-19 pandemic has caused massive humanitarian and economic damage. Teams of scientists from a broad range of disciplines have searched for methods to help governments and communities combat the disease. One avenue from the machine learning field which has been explored is the prospect of a digital mass test which can detect COVID-19 from infected individuals' respiratory sounds. We present a summary of the results from the INTERSPEECH 2021 Computational Paralinguistics Challenges: COVID-19 Cough, (CCS) and COVID-19 Speech, (CSS).

Submitted 17 February, 2022; originally announced February 2022.

Comments: 18 pages, 13 figures

arXiv:2201.01232 [pdf]

doi 10.2196/37004

Exploring Longitudinal Cough, Breath, and Voice Data for COVID-19 Progression Prediction via Sequential Deep Learning: Model Development and Validation

Authors: Ting Dang, Jing Han, Tong Xia, Dimitris Spathis, Erika Bondareva, Chloë Siegele-Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Andres Floto, Pietro Cicuta, Cecilia Mascolo

Abstract: Recent work has shown the potential of using audio data (eg, cough, breathing, and voice) in the screening for COVID-19. However, these approaches only focus on one-off detection and detect the infection given the current audio sample, but do not monitor disease progression in COVID-19. Limited exploration has been put forward to continuously monitor COVID-19 progression, especially recovery, through longitudinal audio data. Tracking disease progression characteristics could lead to more timely treatment. The primary objective of this study is to explore the potential of longitudinal audio samples over time for COVID-19 progression prediction and, especially, recovery trend prediction using sequential deep learning techniques. Crowdsourced respiratory audio data, including breathing, cough, and voice samples, from 212 individuals over 5-385 days were analyzed. We developed a deep learning-enabled tracking tool using gated recurrent units (GRUs) to detect COVID-19 progression by exploring the audio dynamics of the individuals' historical audio biomarkers. The investigation comprised 2 parts: (1) COVID-19 detection in terms of positive and negative (healthy) tests, and (2) longitudinal disease progression prediction over time in terms of probability of positive tests. The strong performance for COVID-19 detection, yielding an AUROC of 0.79, a sensitivity of 0.75, and a specificity of 0.71 supported the effectiveness of the approach compared to methods that do not leverage longitudinal dynamics. We further examined the predicted disease progression trajectory, displaying high consistency with test results with a correlation of 0.75 in the test cohort and 0.86 in a subset of the test cohort who reported recovery. Our findings suggest that monitoring COVID-19 evolution via longitudinal audio data has potential in the tracking of individuals' disease progression and recovery.

Submitted 22 June, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

Comments: Updated title. Revised format according to journal requirements

arXiv:2110.12981 [pdf]

doi 10.1109/TPWRS.2022.3194570

Feasibility Study of Neural ODE and DAE Modules for Power System Dynamic Component Modeling

Authors: Tannan Xiao, Ying Chen, Shaowei Huang, Tirui He, Huizhe Guan

Abstract: In the context of high penetration of renewables, the need to build dynamic models of power system components based on accessible measurement data has become urgent. To address this challenge, firstly, a neural ordinary differential equations (ODE) module and a neural differential-algebraic equations (DAE) module are proposed to form a data-driven modeling framework that accurately captures components' dynamic characteristics and flexibly adapts to various interface settings. Secondly, analytical models and data-driven models learned by the neural ODE and DAE modules are integrated together and simulated simultaneously using unified transient stability simulation methods. Finally, the neural ODE and DAE modules are implemented with Python and made public on GitHub. Using the portal measurements, three simple but representative cases of excitation controller modeling, photovoltaic power plant modeling, and equivalent load modeling of a regional power network are carried out in the IEEE-39 system and 2383wp system. Neural dynamic model-integrated simulations are compared with the original model-based ones to verify the feasibility and potentiality of the proposed neural ODE and DAE modules.

Submitted 4 July, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: 14 pages, 8 figures, 3 table. Under review by IEEE Transactions on Power Systems

arXiv:2110.00931 [pdf]

doi 10.35833/MPCE.2022.000099

Exploration of Artificial Intelligence-oriented Power System Dynamic Simulators

Authors: Tannan Xiao, Ying Chen, Jianquan Wang, Shaowei Huang, Weilin Tong, Tirui He

Abstract: With the rapid development of artificial intelligence (AI), it is foreseeable that the accuracy and efficiency of dynamic analysis for future power system will be greatly improved by the integration of dynamic simulators and AI. To explore the interaction mechanism of power system dynamic simulations and AI, a general design of an AI-oriented power system dynamic simulator is proposed, which consists of a high-performance simulator with neural network supportability and flexible external and internal application programming interfaces (APIs). With the support of APIs, simulation-assisted AI and AI-assisted simulation form a comprehensive interaction mechanism between power system dynamic simulations and AI. A prototype of this design is implemented and made public based on a highly efficient electromechanical simulator. Tests of this prototype are carried out under four scenarios including sample generation, AI-based stability prediction, data-driven dynamic component modeling, and AI-aided stability control, which prove the validity, flexibility, and efficiency of the design and implementation of the AI-oriented power system dynamic simulator.

Submitted 6 July, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

Comments: 10 pages, 8 figures, 1 table. Accepted by Journal of Modern Power System and Clean Energy

arXiv:2109.14956 [pdf]

Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark

Authors: Martin Wagner, Beat-Peter Müller-Stich, Anna Kisilenko, Duc Tran, Patrick Heger, Lars Mündermann, David M Lubotsky, Benjamin Müller, Tornike Davitashvili, Manuela Capek, Annika Reinke, Tong Yu, Armine Vardazaryan, Chinedu Innocent Nwoye, Nicolas Padoy, Xinyang Liu, Eung-Joo Lee, Constantin Disch, Hans Meine, Tong Xia, Fucang Jia, Satoshi Kondo, Wolfgang Reiter, Yueming Jin, Yonghao Long, et al. (16 additional authors not shown)

Abstract: PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported for phase recognition on an open data single-center dataset. In this work we investigated the generalizability of phase recognition algorithms in a multi-center setting including more difficult recognition tasks such as surgical action and surgical skill. METHODS: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 hours was created. Labels included annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 teams submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. RESULTS: F1-scores were achieved for phase recognition between 23.9% and 67.7% (n=9 teams), for instrument presence detection between 38.5% and 63.8% (n=8 teams), but for action recognition only between 21.8% and 23.3% (n=5 teams). The average absolute error for skill assessment was 0.78 (n=1 team). CONCLUSION: Surgical workflow and skill analysis are promising technologies to support the surgical team, but are not solved yet, as shown by our comparison of algorithms. This novel benchmark can be used for comparable evaluation and validation of future work.

Submitted 30 September, 2021; originally announced September 2021.

arXiv:2106.15523 [pdf, other]

Sounds of COVID-19: exploring realistic performance of audio-based digital testing

Authors: Jing Han, Tong Xia, Dimitris Spathis, Erika Bondareva, Chloë Brown, Jagmohan Chauhan, Ting Dang, Andreas Grammenos, Apinan Hasthanasombat, Andres Floto, Pietro Cicuta, Cecilia Mascolo

Abstract: Researchers have been battling with the question of how we can identify Coronavirus disease (COVID-19) cases efficiently, affordably and at scale. Recent work has shown how audio based approaches, which collect respiratory audio data (cough, breathing and voice) can be used for testing, however there is a lack of exploration of how biases and methodological decisions impact these tools' performance in practice. In this paper, we explore the realistic performance of audio-based digital testing of COVID-19. To investigate this, we collected a large crowdsourced respiratory audio dataset through a mobile app, alongside recent COVID-19 test result and symptoms intended as a ground truth. Within the collected dataset, we selected 5,240 samples from 2,478 participants and split them into different participant-independent sets for model development and validation. Among these, we controlled for potential confounding factors (such as demographics and language). The unbiased model takes features extracted from breathing, coughs, and voice signals as predictors and yields an AUC-ROC of 0.71 (95% CI: 0.65-0.77). We further explore different unbalanced distributions to show how biases and participant splits affect performance. Finally, we discuss how the realistic model presented could be integrated in clinical practice to realize continuous, ubiquitous, sustainable and affordable testing at population scale.

Submitted 29 June, 2021; originally announced June 2021.

arXiv:2105.05752 [pdf, other]

Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders

Authors: Chen Xu, Bojie Hu, Yanyang Li, Yuhao Zhang, shen huang, Qi Ju, Tong Xiao, Jingbo Zhu

Abstract: Encoder pre-training is promising in end-to-end Speech Translation (ST), given the fact that speech-to-translation data is scarce. But ST encoders are not simple instances of Automatic Speech Recognition (ASR) or Machine Translation (MT) encoders. For example, we find that ASR encoders lack the global context representation, which is necessary for translation, whereas MT encoders are not designed to deal with long but locally attentive acoustic sequences. In this work, we propose a Stacked Acoustic-and-Textual Encoding (SATE) method for speech translation. Our encoder begins with processing the acoustic sequence as usual, but later behaves more like an MT encoder for a global representation of the input sequence. In this way, it is straightforward to incorporate the pre-trained models into the system. Also, we develop an adaptor module to alleviate the representation inconsistency between the pre-trained ASR encoder and MT encoder, and develop a multi-teacher knowledge distillation method to preserve the pre-training knowledge. Experimental results on the LibriSpeech En-Fr and MuST-C En-De ST tasks show that our method achieves state-of-the-art BLEU scores of 18.3 and 25.2. To our knowledge, we are the first to develop an end-to-end ST system that achieves comparable or even better BLEU performance than the cascaded ST counterpart when large-scale ASR and MT data is available.

Submitted 15 June, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

Comments: ACL 2021

arXiv:2104.02005 [pdf, other]

Uncertainty-Aware COVID-19 Detection from Imbalanced Sound Data

Authors: Tong Xia, Jing Han, Lorena Qendro, Ting Dang, Cecilia Mascolo

Abstract: Recently, sound-based COVID-19 detection studies have shown great promise to achieve scalable and prompt digital pre-screening. However, there are still two unsolved issues hindering the practice. First, collected datasets for model training are often imbalanced, with a considerably smaller proportion of users tested positive, making it harder to learn representative and robust features. Second, deep learning models are generally overconfident in their predictions. Clinically, false predictions aggravate healthcare costs. Estimation of the uncertainty of screening would aid this. To handle these issues, we propose an ensemble framework where multiple deep learning models for sound-based COVID-19 detection are developed from different but balanced subsets from original data. As such, data are utilized more effectively compared to traditional up-sampling and down-sampling approaches: an AUC of 0.74 with a sensitivity of 0.68 and a specificity of 0.69 is achieved. Simultaneously, we estimate uncertainty from the disagreement across multiple models. It is shown that false predictions often yield higher uncertainty, enabling us to suggest the users with uncertainty higher than a threshold to repeat the audio test on their phones or to take clinical tests if digital diagnosis still fails. This study paves the way for a more robust sound-based COVID-19 automated screening system.

Submitted 18 June, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

Comments: Accepted by INTERSPEECH 2021

arXiv:2102.13468 [pdf, other]

The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates

Authors: Björn W. Schuller, Anton Batliner, Christian Bergler, Cecilia Mascolo, Jing Han, Iulia Lefter, Heysem Kaya, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Maurice Gerczuk, Panagiotis Tzirakis, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Leon J. M. Rothkrantz, Joeri Zwerts, Jelle Treep, Casper Kaandorp

Abstract: The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech; in the Escalation Sub-Challenge, a three-way assessment of the level of escalation in a dialogue is featured; and in the Primates Sub-Challenge, four species vs background need to be classified. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the 'usual' COMPARE and BoAW features as well as deep unsupervised representation learning using the AuDeep toolkit, and deep feature extraction from pre-trained CNNs using the Deep Spectrum toolkit; in addition, we add deep end-to-end sequential modelling, and partially linguistic analysis.

Submitted 24 February, 2021; originally announced February 2021.

Comments: 5 pages

MSC Class: 68

ACM Class: I.2.7; I.5.0; J.3

arXiv:2102.05225 [pdf, other]

doi 10.1109/ICASSP39728.2021.9414576

Exploring Automatic COVID-19 Diagnosis via voice and symptoms from Crowdsourced Data

Authors: Jing Han, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Cecilia Mascolo

Abstract: The development of fast and accurate screening tools, which could facilitate testing and prevent more costly clinical tests, is key to the current pandemic of COVID-19. In this context, some initial work shows promise in detecting diagnostic signals of COVID-19 from audio sounds. In this paper, we propose a voice-based framework to automatically detect individuals who have tested positive for COVID-19. We evaluate the performance of the proposed framework on a subset of data crowdsourced from our app, containing 828 samples from 343 participants. By combining voice signals and reported symptoms, an AUC of 0.79 has been attained, with a sensitivity of 0.68 and a specificity of 0.82. We hope that this study opens the door to rapid, low-cost, and convenient pre-screening tools to automatically detect the disease.

Submitted 9 February, 2021; originally announced February 2021.

Comments: 5 pages, 3 figures, 2 tables, Accepted for publication at ICASSP 2021

arXiv:2012.08931 [pdf]

Deep learning for fast MR imaging: a review for learning reconstruction from incomplete k-space data

Authors: Shanshan Wang, Taohui Xiao, Qiegen Liu, Hairong Zheng

Abstract: Magnetic resonance imaging is a powerful imaging modality that can provide versatile information, but it has a bottleneck problem: slow imaging speed. Reducing the scanned measurements can accelerate MR imaging with the aid of powerful reconstruction methods, which have evolved from linear analytic models to nonlinear iterative ones. The emerging trend in this area is replacing human-defined signal models with those learned from data. Specifically, from 2016, deep learning has been incorporated into the fast MR imaging task, which draws valuable prior knowledge from big datasets to facilitate accurate MR image reconstruction from limited measurements. This survey aims to review deep learning based MR image reconstruction works from 2016 to June 2020 and will discuss merits, limitations and challenges associated with such methods. Last but not least, this paper will provide a starting point for researchers interested in contributing to this field by pointing out good tutorial resources, state-of-the-art open-source codes and meaningful data sources.

Submitted 15 December, 2020; originally announced December 2020.

Comments: Invited review submitted to Biomedical signal processing and control in Jan 2020

arXiv:2006.05919 [pdf, other]

doi 10.1145/3394486.3412865

Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data

Authors: Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Jing Han, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Cecilia Mascolo

Abstract: Audio signals generated by the human body (e.g., sighs, breathing, heart, digestion, vibration sounds) have routinely been used by clinicians as indicators to diagnose disease or assess disease progression. Until recently, such signals were usually collected through manual auscultation at scheduled visits. Research has now started to use digital technology to gather bodily sounds (e.g., from digital stethoscopes) for cardiovascular or respiratory examination, which could then be used for automatic analysis. Some initial work shows promise in detecting diagnostic signals of COVID-19 from voice and coughs. In this paper we describe our data analysis over a large-scale crowdsourced dataset of respiratory sounds collected to aid diagnosis of COVID-19. We use coughs and breathing to understand how discernible COVID-19 sounds are from those in asthma or healthy controls. Our results show that even a simple binary machine learning classifier is able to classify correctly healthy and COVID-19 sounds. We also show how we distinguish a user who tested positive for COVID-19 and has a cough from a healthy user with a cough, and users who tested positive for COVID-19 and have a cough from users with asthma and a cough. Our models achieve an AUC of above 80% across all tasks. These results are preliminary and only scratch the surface of the potential of this type of data and audio-based machine learning. This work opens the door to further investigation of how automatically analysed respiratory patterns could be used as pre-screening signals to aid COVID-19 diagnosis.

Submitted 18 January, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: 9 pages, 6 figures, 2 tables, Accepted for publication at KDD'20 (Health Day)

arXiv:2005.01607 [pdf, other]

doi 10.1016/j.media.2020.101719

Pseudo-healthy synthesis with pathology disentanglement and adversarial learning

Authors: Tian Xia, Agisilaos Chartsias, Sotirios A. Tsaftaris

Abstract: Pseudo-healthy synthesis is the task of creating a subject-specific 'healthy' image from a pathological one. Such images can be helpful in tasks such as anomaly detection and understanding changes induced by pathology and disease. In this paper, we present a model that is encouraged to disentangle the information of pathology from what seems to be healthy. We disentangle what appears to be healthy and where disease is as a segmentation map, which are then recombined by a network to reconstruct the input disease image. We train our models adversarially using either paired or unpaired settings, where we pair disease images and maps when available. We quantitatively and subjectively, with a human study, evaluate the quality of pseudo-healthy images using several criteria. We show in a series of experiments, performed on ISLES, BraTS and Cam-CAN datasets, that our method is better than several baselines and methods from the literature. We also show that due to better training processes we could recover deformations, on surrounding tissue, caused by disease. Our implementation is publicly available at https://github.com/xiat0616/pseudo-healthy-synthesis. This paper has been accepted by Medical Image Analysis: https://doi.org/10.1016/j.media.2020.101719.

Submitted 18 June, 2021; v1 submitted 20 April, 2020; originally announced May 2020.

Comments: This paper has been accepted by Medical Image Analysis

arXiv:2003.09651 [pdf]

Extended Prony Analysis on Power System Oscillation Under a Near-Resonance Condition

Authors: Tianwei Xia, Zhe Yu, Kai Sun, Di Shi, Zhiwei Wang

Abstract: Power system oscillations under a large disturbance often exhibit distorted waveforms as captured by increasingly deployed phasor measurement units. One cause is the occurrence of a near-resonance condition among several dominant modes that are influenced by nonlinear transient dynamics of generators. This paper proposes an Extended Prony Analysis method for measurement-based modal analysis. Based on the normal form theory, it compares analyses on transient and post-transient waveforms to distinguish a resonance mode caused by a near-resonance condition from natural modes so that the method can give more accurate modal properties than a traditional Prony Analysis method, especially for large disturbances. The new method is first demonstrated in detail on Kundur's two-area system and then tested on the IEEE 39-bus system to show its performance under a near-resonance condition.

Submitted 21 March, 2020; originally announced March 2020.

Comments: To be presented at the IEEE PES General Meeting in 2020

arXiv:2003.01950 [pdf, other]

AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment

Authors: Zhen Zeng, Jianzong Wang, Ning Cheng, Tian Xia, Jing Xiao

Abstract: Targeting both high efficiency and performance, we propose AlignTTS to predict the mel-spectrum in parallel. AlignTTS is based on a Feed-Forward Transformer which generates mel-spectrum from a sequence of characters, and the duration of each character is determined by a duration predictor. Instead of adopting the attention mechanism in Transformer TTS to align text to mel-spectrum, the alignment loss is presented to consider all possible alignments in training by use of dynamic programming. Experiments on the LJSpeech dataset show that our model achieves not only state-of-the-art performance which outperforms Transformer TTS by 0.03 in mean opinion score (MOS), but also a high efficiency which is more than 50 times faster than real-time.

Submitted 4 March, 2020; originally announced March 2020.

Comments: will be presented in ICASSP 2020

arXiv:1912.02620 [pdf, other]

doi 10.1016/j.media.2021.102169

Learning to synthesise the ageing brain without longitudinal data

Authors: Tian Xia, Agisilaos Chartsias, Chengjia Wang, Sotirios A. Tsaftaris

Abstract: How will my face look when I get older? Or, for a more challenging question: How will my brain look when I get older? To answer this question one must devise (and learn from data) a multivariate auto-regressive function which given an image and a desired target age generates an output image. While collecting data for faces may be easier, collecting longitudinal brain data is not trivial. We propose a deep learning-based method that learns to simulate subject-specific brain ageing trajectories without relying on longitudinal data. Our method synthesises images conditioned on two factors: age (a continuous variable), and status of Alzheimer's Disease (AD, an ordinal variable). With an adversarial formulation we learn the joint distribution of brain appearance, age and AD status, and define reconstruction losses to address the challenging problem of preserving subject identity. We compare with several benchmarks using two widely used datasets. We evaluate the quality and realism of synthesised images using ground-truth longitudinal data and a pre-trained age predictor. We show that, despite the use of cross-sectional data, our model learns patterns of gray matter atrophy in the middle temporal gyrus in patients with AD. To demonstrate generalisation ability, we train on one dataset and evaluate predictions on the other. In conclusion, our model shows an ability to separate age, disease influence and anatomy using only 2D cross-sectional data that should be useful in large studies into neurodegenerative disease, that aim to combine several data sources. To facilitate such future studies by the community at large our code is made available at https://github.com/xiat0616/BrainAgeing.

Submitted 30 September, 2021; v1 submitted 4 December, 2019; originally announced December 2019.

Journal ref: Medical Image Analysis, 2021, 73: 102169

arXiv:1909.05393 [pdf]

doi 10.1088/1757-899X/646/1/012048

Automated Blood Cell Detection and Counting via Deep Learning for Microfluidic Point-of-Care Medical Devices

Authors: Tiancheng Xia, Richard Jiang, YongQing Fu, Nanlin Jin

Abstract: Automated in-vitro cell detection and counting have been a key theme for artificial and intelligent biological analysis such as biopsy, drug analysis and disease diagnosis. Along with the rapid development of microfluidics and lab-on-chip technologies, in-vitro live cell analysis has been one of the critical tasks for both research and industry communities. However, it is a great challenge to obtain and then predict the precise information of live cells from numerous microscopic videos and images. In this paper, we investigated in-vitro detection of white blood cells using deep neural networks, and discussed how state-of-the-art machine learning techniques could fulfil the needs of medical diagnosis. The approach we used in this study was based on Faster Region-based Convolutional Neural Networks (Faster RCNNs), and a transfer learning process was used to adapt this technique to the microscopic detection of blood cells. Our experimental results demonstrated that fast and efficient analysis of blood cells via automated microscopic imaging can achieve much better accuracy and faster speed than the conventionally applied methods, implying a promising future of this technology to be applied to the microfluidic point-of-care medical devices.

Submitted 11 September, 2019; originally announced September 2019.

Journal ref: Proceeding of 2019 3rd International Conference on Artificial Intelligence Applications and Technologies (AIAAT 2019)

arXiv:1909.03377 [pdf]

doi 10.1038/s41598-020-77614-w

Ultra-broadband local active noise control with remote acoustic sensing

Authors: Tong Xiao, Xiaojun Qiu, Benjamin Halkon

Abstract: One enduring challenge for controlling high frequency sound in local active noise control (ANC) systems is to obtain the acoustic signal at the specific location to be controlled. In some applications such as in ANC headrest systems, it is not practical to install error microphones in a person's ears to provide the user a quiet or optimally acoustically controlled environment. Many virtual error sensing approaches have been proposed to estimate the acoustic signal remotely with the current state-of-the-art method using an array of four microphones and a head tracking system to yield sound reduction up to 1 kHz for a single sound source. In the work reported in this paper, a novel approach of incorporating remote acoustic sensing using a laser Doppler vibrometer into an ANC headrest system is investigated. In this 'virtual ANC headphone' system, a lightweight retro-reflective membrane pick-up is mounted in each synthetic ear of a head and torso simulator to determine the sound in the ear in real-time with minimal invasiveness. The membrane design and the effects of its location on the system performance are explored, the noise spectra in the ears without and with ANC for a variety of relevant primary sound fields are reported, and the performance of the system during head movements is demonstrated. The test results show that at least 10 dB sound attenuation can be realised in the ears over an extended frequency range from 500 Hz to 6 kHz under a complex sound field and for several common types of synthesised environmental noise, even in the presence of head motion.

Submitted 27 November, 2020; v1 submitted 7 September, 2019; originally announced September 2019.

Report number: 20784

Journal ref: Sci. Rep. 10 (2020)

arXiv:1908.09140 [pdf]

LANTERN: learn analysis transform network for dynamic magnetic resonance imaging with small dataset

Authors: Shanshan Wang, Yanxia Chen, Taohui Xiao, Ziwen Ke, Qiegen Liu, Hairong Zheng

Abstract: This paper proposes to learn an analysis transform network for dynamic magnetic resonance imaging (LANTERN) with a small dataset. Integrating the strength of CS-MRI and deep learning, the proposed framework is highlighted in three components: (i) The spatial and temporal domains are sparsely constrained by using adaptively trained CNN. (ii) We introduce an end-to-end framework to learn the parameters in LANTERN to solve the difficulty of parameter selection in traditional methods. (iii) Compared to existing deep learning reconstruction methods, our reconstruction accuracy is better when the amount of data is limited. Our model is able to fully exploit the spatial and temporal redundancy of dynamic MR images. We performed quantitative and qualitative analysis of cardiac datasets at different acceleration factors (2x-11x) and different undersampling modes. In comparison with state-of-the-art methods, extensive experiments show that our method achieves consistently better MRI reconstruction performance in terms of three quantitative metrics (PSNR, SSIM and HFEN) under different undersampling patterns and acceleration factors.

Submitted 24 August, 2019; originally announced August 2019.

arXiv:1908.02054 [pdf, other]

Model-based Convolutional De-Aliasing Network Learning for Parallel MR Imaging

Authors: Yanxia Chen, Taohui Xiao, Cheng Li, Qiegen Liu, Shanshan Wang

Abstract: Parallel imaging has been an essential technique to accelerate MR imaging. Nevertheless, the acceleration rate is still limited due to the ill-conditioning and challenges associated with the undersampled reconstruction. In this paper, we propose a model-based convolutional de-aliasing network with adaptive parameter learning to achieve accurate reconstruction from multi-coil undersampled k-space data. Three main contributions have been made: a de-aliasing reconstruction model was proposed to accelerate parallel MR imaging with deep learning exploring both spatial redundancy and multi-coil correlations; a split Bregman iteration algorithm was developed to solve the model efficiently; and unlike most existing parallel imaging methods which rely on the accuracy of the estimated multi-coil sensitivity, the proposed method can perform parallel reconstruction from undersampled data without explicit sensitivity calculation. Evaluations were conducted on an in vivo brain dataset with a variety of undersampling patterns and different acceleration factors. Our results demonstrated that this method could achieve superior performance in both quantitative and qualitative analysis, compared to three state-of-the-art methods.

Submitted 6 August, 2019; originally announced August 2019.

arXiv:1906.04359 [pdf, other]

DeepcomplexMRI: Exploiting deep residual network for fast parallel MR imaging with complex convolution

Authors: Shanshan Wang, Huitao Cheng, Leslie Ying, Taohui Xiao, Ziwen Ke, Xin Liu, Hairong Zheng, Dong Liang

Abstract: This paper proposes a multi-channel image reconstruction method, named DeepcomplexMRI, to accelerate parallel MR imaging with a residual complex convolutional neural network. Different from most existing works which rely on the utilization of the coil sensitivities or prior information of predefined transforms, DeepcomplexMRI takes advantage of the availability of a large number of existing multi-channel ground-truth images and uses them as labeled data to train the deep residual convolutional neural network offline. In particular, a complex convolutional network is proposed to take into account the correlation between the real and imaginary parts of MR images. In addition, the k-space data consistency is further enforced repeatedly in between layers of the network. The evaluations on in vivo datasets show that the proposed method has the capability to recover the desired multi-channel images. Its comparison with state-of-the-art methods also demonstrates that the proposed method can reconstruct the desired MR images more accurately.

Submitted 29 July, 2019; v1 submitted 10 June, 2019; originally announced June 2019.

arXiv:1703.09260 [pdf, other]

Goal-Driven Dynamics Learning via Bayesian Optimization

Authors: Somil Bansal, Roberto Calandra, Ted Xiao, Sergey Levine, Claire J. Tomlin

Abstract: Real-world robots are becoming increasingly complex and commonly act in poorly understood environments where it is extremely challenging to model or learn their true dynamics. Therefore, it might be desirable to take a task-specific approach, wherein the focus is on explicitly learning the dynamics model which achieves the best control performance for the task at hand, rather than learning the true dynamics. In this work, we use Bayesian optimization in an active learning framework where a locally linear dynamics model is learned with the intent of maximizing the control performance, and used in conjunction with optimal control schemes to efficiently design a controller for a given task. This model is updated directly based on the performance observed in experiments on the physical system in an iterative manner until a desired performance is achieved. We demonstrate the efficacy of the proposed approach through simulations and real experiments on a quadrotor testbed.

Submitted 21 September, 2017; v1 submitted 27 March, 2017; originally announced March 2017.

Comments: This is the extended version of the CDC'17 paper titled "Goal-Driven Dynamics Learning via Bayesian Optimization."

Showing 1–49 of 49 results for author: Xia, T