-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation, which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions (2D vs. 3D). A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of ensemble models, as well as the need for further research to develop efficient methods for uncertainty quantification in 3D segmentation tasks.
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
-
Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI
Authors:
Di Xu,
Xin Miao,
Hengjie Liu,
Jessica E. Scholey,
Wensha Yang,
Mary Feng,
Michael Ohliger,
Hui Lin,
Yi Lao,
Yang Yang,
Ke Sheng
Abstract:
Purpose: 4D MRI with high spatiotemporal resolution is desired for image-guided liver radiotherapy. Acquiring densely sampled k-space data is time-consuming. Accelerated acquisition with sparse samples is desirable but often causes degraded image quality or long reconstruction time. We propose the Reconstruct Paired Conditional Generative Adversarial Network (Re-Con-GAN) to shorten the 4D MRI reconstruction time while maintaining the reconstruction quality.
Methods: Patients who underwent free-breathing liver 4D MRI were included in the study. Fully- and retrospectively under-sampled data at 3, 6 and 10 times (3x, 6x and 10x) were first reconstructed using the nuFFT algorithm. Re-Con-GAN then trained input and output in pairs. Three types of networks, ResNet9, UNet and reconstruction swin transformer, were explored as generators. PatchGAN was selected as the discriminator. Re-Con-GAN processed the data (3D+t) as temporal slices (2D+t). A total of 48 patients with 12332 temporal slices were split into training (37 patients with 10721 slices) and test (11 patients with 1611 slices).
Results: Re-Con-GAN consistently achieved comparable or better PSNR, SSIM, and RMSE scores compared to CS/UNet models. The inference times of Re-Con-GAN, UNet and CS are 0.15 s, 0.16 s, and 120 s, respectively. The GTV detection task showed that Re-Con-GAN and CS, compared to UNet, better improved the Dice score (3x Re-Con-GAN 80.98%; 3x CS 80.74%; 3x UNet 79.88%) of unprocessed under-sampled images (3x 69.61%).
Conclusion: A generative network with adversarial training is proposed with promising and efficient reconstruction results demonstrated on an in-house dataset. The rapid and qualitative reconstruction of 4D liver MR has the potential to facilitate online adaptive MR-guided radiotherapy for liver cancer.
Submitted 20 May, 2024;
originally announced May 2024.
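The Re-Con-GAN abstract above reports Dice scores for the GTV detection task. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient for two binary masks. The function name `dice_score` and the toy masks are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol is not stated in the abstract.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient 2*|A and B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treating this as perfect agreement is a convention,
        # chosen here only for illustration.
        return 1.0
    return 2.0 * intersection / denom

# Toy 2x3 masks (hypothetical data): 2 overlapping pixels, 3 pixels in each mask.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_score(pred, target)
print(f"Dice = {score:.4f}")
```

With 2 shared pixels and 3 foreground pixels per mask, the score is 2*2/(3+3), roughly 0.667.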
-
AIC-UNet: Anatomy-informed Cascaded UNet for Robust Multi-Organ Segmentation
Authors:
Young Seok Jeon,
Hongfei Yang,
Huazhu Fu,
Mengling Feng
Abstract:
Imposing key anatomical features, such as the number of organs, their shapes, sizes, and relative positions, is crucial for building a robust multi-organ segmentation model. Current attempts to incorporate anatomical features include broadening the effective receptive field (ERF) size with resource- and data-intensive modules such as self-attention or introducing organ-specific topology regularizers, which may not scale to multi-organ segmentation problems where inter-organ relation also plays a huge role. We introduce a new approach to impose anatomical constraints on any existing encoder-decoder segmentation model by conditioning model prediction with a learnable anatomy prior. More specifically, given an abdominal scan, a part of the encoder spatially warps a learnable prior to align with the given input scan using thin plate spline (TPS) grid interpolation. The warped prior is then integrated during the decoding phase to guide the model for more anatomy-informed predictions. Code is available at https://anonymous.4open.science/r/AIC-UNet-7048.
Submitted 27 March, 2024;
originally announced March 2024.
-
CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition
Authors:
Junfeng Hou,
Peiyao Wang,
Jincheng Zhang,
Meng Yang,
Minwei Feng,
Jingcheng Yin
Abstract:
Deploying end-to-end speech recognition models with limited computing resources remains challenging, despite their impressive performance. Given the gradual increase in model size and the wide range of model applications, selectively executing model components for different inputs to improve the inference efficiency is of great interest. In this paper, we propose a dynamic layer-skipping method that leverages the CTC blank output from intermediate layers to trigger the skipping of the last few encoder layers for frames with high blank probabilities. Furthermore, we factorize the CTC output distribution and perform knowledge distillation on intermediate layers to reduce computation and improve recognition accuracy. Experimental results show that by utilizing the CTC blank, the encoder layer depth can be adjusted dynamically, resulting in 29% acceleration of the CTC model inference with minor performance degradation.
Submitted 3 January, 2024;
originally announced January 2024.
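The idea in the abstract above, skipping the last encoder layers for frames whose intermediate CTC blank probability is high, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function `encode_with_blank_skipping`, the threshold of 0.9, and the frame-wise toy "layers" are hypothetical. Real encoder layers typically use self-attention across frames, so the paper's actual mechanism will differ from this simplification.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_with_blank_skipping(frames, lower_layers, upper_layers, ctc_head,
                               blank_id=0, threshold=0.9):
    """Run lower layers on all frames, then upper layers only on frames
    whose intermediate blank probability is below `threshold`."""
    h = frames
    for layer in lower_layers:
        h = layer(h)
    # Intermediate CTC posterior; column `blank_id` is the blank symbol.
    blank_prob = softmax(ctc_head(h))[:, blank_id]
    keep = blank_prob < threshold          # frames that still need the upper layers
    out = h.copy()
    active = h[keep]
    for layer in upper_layers:
        active = layer(active)             # toy layers act frame-wise
    out[keep] = active
    return out, keep

# Hypothetical toy setup: 4 frames, 2 "classes" (blank and one token).
frames = np.array([[5.0, 0.0], [0.0, 5.0], [4.0, 0.0], [0.0, 0.0]])
lower_layers = [lambda h: h]               # identity stand-in
upper_layers = [lambda h: h + 1.0]         # stand-in for the expensive layers
ctc_head = lambda h: h                     # features reused as logits

out, keep = encode_with_blank_skipping(frames, lower_layers, upper_layers, ctc_head)
print(keep)   # which frames went through the upper layers
```

In this toy run, frames 0 and 2 have blank probabilities above 0.9 and bypass the upper layer, while frames 1 and 3 are processed in full.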
-
The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge
Authors:
Meng Ge,
Yizhou Peng,
Yidi Jiang,
Jingru Lin,
Junyi Ao,
Mehmet Sinan Yildirim,
Shuai Wang,
Haizhou Li,
Mengling Feng
Abstract:
This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition. Our submitted systems for the ICMC-ASR Challenge include multi-channel front-end enhancement and diarization, training data augmentation, and speech recognition modeling with multi-channel branches. Tested on the official Eval1 and Eval2 sets, our best system achieves a relative 34.3% improvement in CER and 56.5% improvement in cpCER, compared to the official baseline system.
Submitted 26 December, 2023;
originally announced December 2023.
-
Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech
Authors:
Jingru Lin,
Meng Ge,
Wupeng Wang,
Haizhou Li,
Mengling Feng
Abstract:
Self-supervised pre-trained speech models were shown effective for various downstream speech processing tasks. Since they are mainly pre-trained to map input speech to pseudo-labels, the resulting representations are only effective for the type of pre-train data used, either clean or mixture speech. With the idea of selective auditory attention, we propose a novel pre-training solution called Selective-HuBERT, or SHuBERT, which learns the selective extraction of target speech representations from either clean or mixture speech. Specifically, SHuBERT is trained to predict pseudo labels of a target speaker, conditioned on an enrolled speech from the target speaker. By doing so, SHuBERT is expected to selectively attend to the target speaker in a complex acoustic environment, thus benefiting various downstream tasks. We further introduce a dual-path training strategy and use the cross-correlation constraint between the two branches to encourage the model to generate noise-invariant representation. Experiments on SUPERB benchmark and LibriMix dataset demonstrate the universality and noise-robustness of SHuBERT. Furthermore, we find that our high-quality representation can be easily integrated with conventional supervised learning methods to achieve significant performance, even under extremely low-resource labeled data.
Submitted 8 November, 2023;
originally announced November 2023.
-
A review of uncertainty quantification in medical image analysis: probabilistic and non-probabilistic methods
Authors:
Ling Huang,
Su Ruan,
Yucheng Xing,
Mengling Feng
Abstract:
The comprehensive integration of machine learning healthcare models within clinical practice remains suboptimal, notwithstanding the proliferation of high-performing solutions reported in the literature. A predominant factor hindering widespread adoption pertains to an insufficiency of evidence affirming the reliability of the aforementioned models. Recently, uncertainty quantification methods have been proposed as a potential solution to quantify the reliability of machine learning models and thus increase the interpretability and acceptability of the result. In this review, we offer a comprehensive overview of prevailing methods proposed to quantify uncertainty inherent in machine learning models developed for various medical image tasks. Contrary to earlier reviews that exclusively focused on probabilistic methods, this review also explores non-probabilistic approaches, thereby furnishing a more holistic survey of research pertaining to uncertainty quantification for machine learning models. Analysis of medical images with the summary and discussion on medical applications and the corresponding uncertainty evaluation protocols are presented, which focus on the specific challenges of uncertainty in medical image analysis. We also highlight some potential future research work at the end. Generally, this review aims to allow researchers from both clinical and technical backgrounds to gain a quick and yet in-depth understanding of the research in uncertainty quantification for medical image analysis machine learning models.
Submitted 9 October, 2023;
originally announced October 2023.
-
Weakly Supervised YOLO Network for Surgical Instrument Localization in Endoscopic Videos
Authors:
Rongfeng Wei,
Jinlin Wu,
Xuexue Bai,
Ming Feng,
Zhen Lei,
Hongbin Liu,
Zhen Chen
Abstract:
In minimally invasive surgery, surgical instrument localization is a crucial task for endoscopic videos, which enables various applications for improving surgical outcomes. However, annotating the instrument localization in endoscopic videos is tedious and labor-intensive. In contrast, obtaining the category information is easy and efficient in real-world applications. To fully utilize the category information and address the localization problem, we propose a weakly supervised localization framework named WS-YOLO for surgical instruments. By leveraging the instrument category information as the weak supervision, our WS-YOLO framework adopts an unsupervised multi-round training strategy for the localization capability training. We validate our WS-YOLO framework on the Endoscopic Vision Challenge 2023 dataset, which achieves remarkable performance in the weakly supervised surgical instrument localization. The source code is available at https://github.com/Breezewrf/WS-YOLO.
Submitted 20 June, 2024; v1 submitted 23 September, 2023;
originally announced September 2023.
-
Model-Free Large-Scale Cloth Spreading With Mobile Manipulation: Initial Feasibility Study
Authors:
Xiangyu Chu+,
Shengzhi Wang+,
Minjian Feng,
Jiaxi Zheng,
Yuxuan Zhao,
Jing Huang,
K. W. Samuel Au
Abstract:
Cloth manipulation is common in domestic and service tasks, and most studies use fixed-base manipulators to manipulate objects whose sizes are relatively small with respect to the manipulators' workspace, such as towels, shirts, and rags. In contrast, manipulation of large-scale cloth, such as bed making and tablecloth spreading, poses additional challenges of reachability and manipulation control. To address them, this paper presents a novel framework to spread large-scale cloth, with a single-arm mobile manipulator that can solve the reachability issue, for an initial feasibility study. On the manipulation control side, without modeling highly deformable cloth, a vision-based manipulation control scheme is applied, based on an online-update Jacobian matrix mapping from selected feature points to the end-effector motion. To coordinate the control of the manipulator and mobile platform, Behavior Trees (BTs) are used because of their modularity. Finally, experiments are conducted, including validation of the model-free manipulation control for cloth spreading in different conditions and the large-scale cloth spreading framework. The experimental results demonstrate the feasibility of the large-scale cloth spreading task with a single-arm mobile manipulator and the model-free deformation controller.
Submitted 20 August, 2023;
originally announced August 2023.
-
cs-net: structural approach to time-series forecasting for high-dimensional feature space data with limited observations
Authors:
Weiyu Zong,
Mingqian Feng,
Griffin Heyrich,
Peter Chin
Abstract:
In recent years, deep-learning-based approaches have been introduced to solving time-series forecasting-related problems. These novel methods have demonstrated impressive performance in univariate and low-dimensional multivariate time-series forecasting tasks. However, when these novel methods are used to handle high-dimensional multivariate forecasting problems, their performance is highly restricted by a practical training time and a reasonable GPU memory configuration. In this paper, inspired by a change of basis in the Hilbert space, we propose a flexible data feature extraction technique that excels in high-dimensional multivariate forecasting tasks. Our approach was originally developed for the National Science Foundation (NSF) Algorithms for Threat Detection (ATD) 2022 Challenge. Implemented using the attention mechanism and Convolutional Neural Networks (CNN) architecture, our method demonstrates great performance and compatibility. Our models trained on the GDELT Dataset finished 1st and 2nd places in the ATD sprint series and hold promise for other datasets for time series forecasting.
Submitted 5 December, 2022;
originally announced December 2022.
-
FCSN: Global Context Aware Segmentation by Learning the Fourier Coefficients of Objects in Medical Images
Authors:
Young Seok Jeon,
Hongfei Yang,
Mengling Feng
Abstract:
The encoder-decoder model is a commonly used Deep Neural Network (DNN) model for medical image segmentation. Conventional encoder-decoder models make pixel-wise predictions focusing heavily on local patterns around the pixel. This makes it challenging to give segmentation that preserves the object's shape and topology, which often requires an understanding of the global context of the object. In this work, we propose a Fourier Coefficient Segmentation Network (FCSN), a novel DNN-based model that segments an object by learning the complex Fourier coefficients of the object's masks. The Fourier coefficients are calculated by integrating over the whole contour. Therefore, for our model to make a precise estimation of the coefficients, the model is motivated to incorporate the global context of the object, leading to a more accurate segmentation of the object's shape. This global context awareness also makes our model robust to unseen local perturbations during inference, such as additive noise or motion blur that are prevalent in medical images. When FCSN is compared with other state-of-the-art models (UNet+, DeepLabV3+, UNETR) on 3 medical image segmentation tasks (ISIC_2018, RIM_CUP, RIM_DISC), FCSN attains significantly lower Hausdorff scores of 19.14 (6%), 17.42 (6%), and 9.16 (14%) on the 3 tasks, respectively. Moreover, FCSN is lightweight by discarding the decoder module, which incurs significant computational overhead. FCSN only requires 22.2M parameters, 82M and 10M fewer parameters than UNETR and DeepLabV3+. FCSN attains inference and training speeds of 1.6ms/img and 6.3ms/img, that is 8× and 3× faster than UNet and UNETR.
Submitted 29 July, 2022;
originally announced July 2022.
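The FCSN abstract above describes representing an object's shape by the complex Fourier coefficients of its contour. The sketch below shows a discrete analogue of that representation using NumPy's FFT on a closed contour sampled as complex points z = x + iy. It is not the authors' network or their exact coefficient definition; the function names and the circular test contour are assumptions chosen for illustration.

```python
import numpy as np

def contour_fourier_coefficients(points):
    """Complex Fourier coefficients of a closed contour.

    `points` is an (N, 2) array of (x, y) samples taken in order along the contour.
    Returns N coefficients c_m = (1/N) * sum_k z_k * exp(-2*pi*i*m*k/N).
    """
    points = np.asarray(points, dtype=float)
    z = points[:, 0] + 1j * points[:, 1]
    return np.fft.fft(z) / len(z)

def reconstruct_contour(coeffs, max_freq):
    """Rebuild the contour from coefficients with |frequency| <= max_freq."""
    n = len(coeffs)
    freqs = np.fft.fftfreq(n, d=1.0 / n)           # integer frequencies 0, 1, ..., -1
    kept = np.where(np.abs(freqs) <= max_freq, coeffs, 0.0)
    z = np.fft.ifft(kept * n)
    return np.stack([z.real, z.imag], axis=1)

# Hypothetical test contour: circle of radius 2 centred at (1, 1).
t = 2.0 * np.pi * np.arange(64) / 64
circle = np.stack([1.0 + 2.0 * np.cos(t), 1.0 + 2.0 * np.sin(t)], axis=1)

coeffs = contour_fourier_coefficients(circle)
approx = reconstruct_contour(coeffs, max_freq=1)
print(np.round(coeffs[:2], 6))   # coefficient 0 is the centre, coefficient 1 the radius
```

For a circle, only two coefficients are nonzero: the zero-frequency term encodes the centre and the first harmonic encodes the radius, so a handful of global numbers describes the whole shape.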
-
Wound Segmentation with Dynamic Illumination Correction and Dual-view Semantic Fusion
Authors:
Honghui Liu,
Changjian Wang,
Kele Xu,
Fangzhao Li,
Ming Feng,
Yuxing Peng,
Hongjun He
Abstract:
Wound image segmentation is a critical component for the clinical diagnosis and in-time treatment of wounds. Recently, deep learning has become the mainstream methodology for wound image segmentation. However, pre-processing of the wound image, such as illumination correction, is required before the training phase, as it can greatly improve performance. The correction procedure and the training of deep models are independent of each other, which leads to sub-optimal segmentation performance as the fixed illumination correction may not be suitable for all images. To address the aforementioned issues, an end-to-end dual-view segmentation approach is proposed in this paper, incorporating a learnable illumination correction module into the deep segmentation models. The parameters of the module can be learned and updated automatically during the training stage, while the dual-view fusion can fully employ the features from both the raw images and the enhanced ones. To demonstrate the effectiveness and robustness of the proposed framework, extensive experiments are conducted on benchmark datasets. The encouraging results suggest that our framework can significantly improve the segmentation performance, compared to the state-of-the-art methods.
Submitted 12 July, 2022;
originally announced July 2022.
-
Underwater Acoustic Communication Channel Modeling using Reservoir Computing
Authors:
Oluwaseyi Onasami,
Ming Feng,
Hao Xu,
Mulugeta Haile,
Lijun Qian
Abstract:
Underwater acoustic (UWA) communications have been widely used but greatly impaired due to the complicated nature of the underwater environment. In order to improve UWA communications, modeling and understanding the UWA channel is indispensable. However, there exist many challenges due to the high uncertainties of the underwater environment and the lack of real-world measurement data. In this work, the capability of reservoir computing and deep learning has been explored for modeling the UWA communication channel accurately using real underwater data collected from a water tank with disturbance and from Lake Tahoe. We leverage the capability of reservoir computing for modeling dynamical systems and provide a data-driven approach to modeling the UWA channel using an Echo State Network (ESN). In addition, the potential application of transfer learning to reservoir computing has been examined. Experimental results show that ESN is able to model chaotic UWA channels with better performance compared to popular deep learning models in terms of mean absolute percentage error (MAPE). Specifically, ESN has outperformed the deep neural network by 2% and as much as 40% in benign and chaotic UWA channels, respectively.
Submitted 30 May, 2022;
originally announced May 2022.
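The abstract above compares models by mean absolute percentage error (MAPE). A minimal sketch of the standard definition follows; the function name `mape` and the sample values are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent: 100 * mean(|y - y_hat| / |y|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        # MAPE is undefined when a true value is zero.
        raise ValueError("MAPE is undefined for zero-valued targets")
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical measurements and predictions: relative errors of 10%, 10%, and 0%.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
error = mape(y_true, y_pred)
print(f"MAPE = {error:.3f}%")
```

Because each error is scaled by the true value, MAPE is unit-free, which makes it convenient for comparing channel models across signals of different magnitude, but it breaks down when true values are at or near zero.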
-
Gated Multimodal Fusion with Contrastive Learning for Turn-taking Prediction in Human-robot Dialogue
Authors:
Jiudong Yang,
Peiying Wang,
Yi Zhu,
Mingchao Feng,
Meng Chen,
Xiaodong He
Abstract:
Turn-taking, aiming to decide when the next speaker can start talking, is an essential component in building human-robot spoken dialogue systems. Previous studies indicate that multimodal cues can facilitate this challenging task. However, due to the paucity of public multimodal datasets, current methods are mostly limited to either utilizing unimodal features or simplistic multimodal ensemble models. Besides, the inherent class imbalance in real scenarios (e.g., a sentence ending with a short pause will mostly be regarded as the end of a turn) also poses a great challenge to the turn-taking decision. In this paper, we first collect a large-scale annotated corpus for turn-taking with over 5,000 real human-robot dialogues in speech and text modalities. Then, a novel gated multimodal fusion mechanism is devised to utilize various information seamlessly for turn-taking prediction. More importantly, to tackle the data imbalance issue, we design a simple yet effective data augmentation method to construct negative instances without supervision and apply contrastive learning to obtain better feature representations. Extensive experiments are conducted and the results demonstrate the superiority and competitiveness of our model over several state-of-the-art baselines.
Submitted 18 April, 2022;
originally announced April 2022.
-
UFRC: A Unified Framework for Reliable COVID-19 Detection on Crowdsourced Cough Audio
Authors:
Jiangeng Chang,
Yucheng Ruan,
Cui Shaoze,
John Soong Tshon Yit,
Mengling Feng
Abstract:
We propose a unified system with core components of data augmentation, ImageNet-pretrained ResNet-50, cost-sensitive loss, deep ensemble learning, and uncertainty estimation to quickly and consistently detect COVID-19 using acoustic evidence. To increase the model's capacity to identify the minority class (infected samples), data augmentation and cost-sensitive loss are incorporated. In the COVID-19 detection challenge, ImageNet-pretrained ResNet-50 has been found to be effective. The unified framework also integrates deep ensemble learning and uncertainty estimation to combine predictions from various base classifiers for generalisation and reliability. We ran a series of tests using the DiCOVA2021 challenge dataset to assess the efficacy of our proposed method, and the results show that our method has an AUC-ROC of 85.43 percent, making it a promising method for COVID-19 detection. The unified framework also demonstrates that audio may be used to quickly diagnose different respiratory disorders.
Submitted 30 June, 2022; v1 submitted 16 April, 2022;
originally announced April 2022.
-
To Explore or Not to Explore: Regret-Based LTL Planning in Partially-Known Environments
Authors:
Jianing Zhao,
Keyi Zhu,
Mingyang Feng,
Xiang Yin
Abstract:
In this paper, we investigate the optimal robot path planning problem for high-level specifications described by co-safe linear temporal logic (LTL) formulae. We consider the scenario where the map geometry of the workspace is partially-known. Specifically, we assume that there are some unknown regions, for which the robot does not know their successor regions a priori unless it reaches these regions physically. In contrast to the standard game-based approach that optimizes the worst-case cost, in the paper, we propose to use regret as a new metric for planning in such a partially-known environment. The regret of a plan under a fixed but unknown environment is the difference between the actual cost incurred and the best-response cost the robot could have achieved if it realizes the actual environment with hindsight. We provide an effective algorithm for finding an optimal plan that satisfies the LTL specification while minimizing its regret. A case study on firefighting robots is provided to illustrate the proposed framework. We argue that the new metric is more suitable for the scenario of partially-known environment since it captures the trade-off between the actual cost spent and the potential benefit one may obtain for exploring an unknown region.
Submitted 17 January, 2024; v1 submitted 1 April, 2022;
originally announced April 2022.
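The abstract above defines the regret of a plan under a fixed but unknown environment as the actual cost incurred minus the best cost achievable with hindsight, and contrasts regret minimization with worst-case cost optimization. The toy sketch below shows that contrast on a finite cost table. It is a heavy simplification: the paper works with LTL specifications and partially-known maps, whereas the plan names, environment names, and costs here are hypothetical.

```python
# Hypothetical cost table: cost[plan][environment].
cost = {
    "explore_shortcut": {"shortcut_open": 4, "shortcut_blocked": 12},
    "take_known_route": {"shortcut_open": 10, "shortcut_blocked": 10},
}
environments = ["shortcut_open", "shortcut_blocked"]

def best_response_cost(env):
    """Cost the robot could have achieved had it known `env` in advance."""
    return min(cost[plan][env] for plan in cost)

def regret(plan, env):
    """Actual cost of `plan` in `env` minus the hindsight-optimal cost."""
    return cost[plan][env] - best_response_cost(env)

def max_regret(plan):
    return max(regret(plan, env) for env in environments)

def worst_case_cost(plan):
    return max(cost[plan][env] for env in environments)

# Regret-based choice versus the worst-case (game-based) choice.
min_regret_plan = min(cost, key=max_regret)
worst_case_plan = min(cost, key=worst_case_cost)

print("min-max-regret plan:", min_regret_plan, "with regret", max_regret(min_regret_plan))
print("worst-case plan:", worst_case_plan, "with cost", worst_case_cost(worst_case_plan))
```

In this table the worst-case criterion prefers the known route (cost 10 in every environment), while the regret criterion prefers exploring: exploring loses at most 2 relative to hindsight, whereas the known route can lose 6 when the shortcut turns out to be open.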
-
Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis
Authors:
Zexun Wang,
Yuquan Le,
Yi Zhu,
Yuming Zhao,
Mingchao Feng,
Meng Chen,
Xiaodong He
Abstract:
Building Spoken Language Understanding (SLU) robust to Automatic Speech Recognition (ASR) errors is an essential issue for various voice-enabled virtual assistants. Considering that most ASR errors are caused by phonetic confusion between similar-sounding expressions, intuitively, leveraging the phoneme sequence of speech can complement the ASR hypothesis and enhance the robustness of SLU. This paper proposes a novel model with Cross Attention for SLU (denoted as CASLU). The cross attention block is devised to capture the fine-grained interactions between phoneme and word embeddings, so that the joint representations capture the phonetic and semantic features of the input simultaneously and overcome ASR errors in downstream natural language understanding (NLU) tasks. Extensive experiments are conducted on three datasets, showing the effectiveness and competitiveness of our approach. Additionally, we validate the universality of CASLU and prove its complementarity when combined with other robust SLU techniques.
△ Less
Submitted 22 March, 2022;
originally announced March 2022.
-
Federated Self-Supervised Learning for Acoustic Event Classification
Authors:
Meng Feng,
Chieh-Chi Kao,
Qingming Tang,
Ming Sun,
Viktor Rozgic,
Spyros Matsoukas,
Chao Wang
Abstract:
Standard acoustic event classification (AEC) solutions require large-scale collection of data from client devices for model optimization. Federated learning (FL) is a compelling framework that decouples data collection and model training to enhance customer privacy. In this work, we investigate the feasibility of applying FL to improve AEC performance while no customer data can be directly uploaded to the server. We assume no pseudo labels can be inferred from on-device user inputs, aligning with the typical use cases of AEC. We adapt self-supervised learning to the FL framework for on-device continual learning of representations, which improves the performance of the downstream AEC classifiers without labeled or pseudo-labeled data. Compared to the baseline without FL, the proposed method improves precision by up to 20.3% relative while maintaining recall. Our work differs from prior work in FL in that our approach does not require user-generated learning targets, and the data we use is collected from our Beta program and is de-identified, to maximally simulate the production settings.
Submitted 22 March, 2022;
originally announced March 2022.
-
On the Exactness of an Energy-efficient Train Control model based on Convex Optimization
Authors:
Shaofeng Lu,
Minling Feng,
Kunpeng Wu
Abstract:
In this paper, we present the exactness proof for the energy-efficient train control (EETC) model based on convex optimization. The proof of exactness shows that the convex optimization model yields the same optimization results as the initial model on which the convex relaxations are conducted. We first show how the relaxation of the initial non-convex model is conducted and provide analysis to show that the relaxations are convex constraints and the relaxed model is thus a convex model. Subsequently, we prove that the relaxed convex model always achieves its optimal solution on the initial equality constraints, that the optimal solution achieved by convex optimization is the same as the one obtained by the initial non-convex model, and that the relaxations applied are therefore exact. A numerical verification has been conducted based on a typical urban rail system with a steep gradient. The results of this paper shed light on further applications of convex optimization to energy-efficient train control and related areas in the operation and control of low-carbon transportation systems.
Submitted 13 February, 2022;
originally announced February 2022.
-
A fast-solved model for energy-efficient train control based on convex optimization
Authors:
Minling Feng,
Kunpeng Wu,
Shaofeng Lu
Abstract:
In modern rail transportation, energy-efficient train control (EETC) is concerned with the optimal train speed trajectory or control strategies to achieve the minimum energy cost under various operation and traction constraints. This paper proposes an EETC model based on convex optimization so that the model can be rapidly solved by convex optimization algorithms. The high computational efficiency and robustness of the convex model can be verified by comparing the results achieved by the method proposed by this paper and other mainstream mathematical programming methods including mixed-integer linear programming (MILP) and Radau pseudospectral method (RPM). Based on the characteristics of convex optimization, the proposed method boasts more significant advantages over its counterparts in terms of computational efficiency in the promising online applications for automatic train control systems of various types of rail transportation.
Submitted 25 January, 2022;
originally announced January 2022.
-
HEROHE Challenge: assessing HER2 status in breast cancer without immunohistochemistry or in situ hybridization
Authors:
Eduardo Conde-Sousa,
João Vale,
Ming Feng,
Kele Xu,
Yin Wang,
Vincenzo Della Mea,
David La Barbera,
Ehsan Montahaei,
Mahdieh Soleymani Baghshah,
Andreas Turzynski,
Jacob Gildenblat,
Eldad Klaiman,
Yiyu Hong,
Guilherme Aresta,
Teresa Araújo,
Paulo Aguiar,
Catarina Eloy,
António Polónia
Abstract:
Breast cancer is the most common malignancy in women, being responsible for more than half a million deaths every year. As such, early and accurate diagnosis is of paramount importance. Human expertise is required to diagnose and correctly classify breast cancer and define appropriate therapy, which depends on the evaluation of the expression of different biomarkers such as the transmembrane protein receptor HER2. This evaluation requires several steps, including special techniques such as immunohistochemistry or in situ hybridization to assess HER2 status. With the goal of reducing the number of steps and human bias in diagnosis, the HEROHE Challenge was organized, as a parallel event of the 16th European Congress on Digital Pathology, aiming to automate the assessment of the HER2 status based only on hematoxylin and eosin stained tissue sample of invasive breast cancer. Methods to assess HER2 status were presented by 21 teams worldwide and the results achieved by some of the proposed methods open potential perspectives to advance the state-of-the-art.
Submitted 8 November, 2021;
originally announced November 2021.
-
Intra-Inter Subject Self-supervised Learning for Multivariate Cardiac Signals
Authors:
Xiang Lan,
Dianwen Ng,
Shenda Hong,
Mengling Feng
Abstract:
Learning information-rich and generalizable representations effectively from unlabeled multivariate cardiac signals to identify abnormal heart rhythms (cardiac arrhythmias) is valuable in real-world clinical settings but often challenging due to their complex temporal dynamics. Cardiac arrhythmias can vary significantly in temporal patterns even for the same patient (i.e., intra-subject difference). Meanwhile, the same type of cardiac arrhythmia can show different temporal patterns among different patients due to different cardiac structures (i.e., inter-subject difference). In this paper, we address these challenges by proposing an Intra-inter Subject self-supervised Learning (ISL) model that is customized for multivariate cardiac signals. Our proposed ISL model integrates medical knowledge into self-supervision to effectively learn from intra- and inter-subject differences. In intra-subject self-supervision, the ISL model first extracts heartbeat-level features from each subject using a channel-wise attentional CNN-RNN encoder. Then a stationarity test module is employed to capture the temporal dependencies between heartbeats. In inter-subject self-supervision, we design a set of data augmentations according to the clinical characteristics of cardiac signals and perform contrastive learning among subjects to learn distinctive representations for various types of patients. Extensive experiments on three real-world datasets were conducted. In a semi-supervised transfer learning scenario, our pre-trained ISL model leads to about a 10% improvement over supervised training when only 1% labeled data is available, suggesting strong generalizability and robustness of the model.
Submitted 18 September, 2021;
originally announced September 2021.
-
F3S: Free Flow Fever Screening
Authors:
Kunal Rao,
Giuseppe Coviello,
Min Feng,
Biplob Debnath,
Wang-Pin Hsiung,
Murugan Sankaradas,
Yi Yang,
Oliver Po,
Utsav Drolia,
Srimat Chakradhar
Abstract:
Identification of people with elevated body temperature can reduce or dramatically slow down the spread of infectious diseases like COVID-19. We present a novel fever-screening system, F3S, that uses edge machine learning techniques to accurately measure core body temperatures of multiple individuals in a free-flow setting. F3S performs real-time sensor fusion of visual camera and thermal camera data streams to detect elevated body temperature, and it has several unique features: (a) visual and thermal streams represent very different modalities, and we dynamically associate semantically equivalent regions across visual and thermal frames by using a new, dynamic alignment technique that analyzes content and context in real time, (b) we track people through occlusions, identify the eye (inner canthus), forehead, face and head regions where possible, and provide an accurate temperature reading by using a prioritized refinement algorithm, and (c) we robustly detect elevated body temperature even in the presence of personal protective equipment like masks, or sunglasses or hats, all of which can be affected by hot weather and lead to spurious temperature readings. F3S has been deployed at over a dozen large commercial establishments, providing contactless, free-flow, real-time fever screening for thousands of employees and customers in indoor and outdoor settings.
Submitted 3 September, 2021;
originally announced September 2021.
-
DiCOVA-Net: Diagnosing COVID-19 using Acoustics based on Deep Residual Network for the DiCOVA Challenge 2021
Authors:
Jiangeng Chang,
Shaoze Cui,
Mengling Feng
Abstract:
In this paper, we propose a deep residual network-based method, namely the DiCOVA-Net, to identify COVID-19 infected patients based on the acoustic recording of their coughs. Since there are far more healthy people than infected patients, this classification problem faces the challenge of imbalanced data. To improve the model's ability to recognize the minority class (the infected patients), we introduce data augmentation and cost-sensitive methods into our model. Besides, considering the particularity of this task, we deploy some fine-tuning techniques to adjust the pre-trained ResNet50. Furthermore, to improve the model's generalizability, we use ensemble learning to integrate prediction results from multiple base classifiers generated using different random seeds. To evaluate the proposed DiCOVA-Net's performance, we conducted experiments with the DiCOVA challenge dataset. The results show that our method achieved an AUC of 85.43%, among the top of all competing teams.
Submitted 4 May, 2022; v1 submitted 11 July, 2021;
originally announced July 2021.
-
A Systematic Collection of Medical Image Datasets for Deep Learning
Authors:
Johann Li,
Guangming Zhu,
Cong Hua,
Mingtao Feng,
Basheer Bennamoun,
Ping Li,
Xiaoyuan Lu,
Juan Song,
Peiyi Shen,
Xu Xu,
Lin Mei,
Liang Zhang,
Syed Afaq Ali Shah,
Mohammed Bennamoun
Abstract:
The astounding success made by artificial intelligence (AI) in healthcare and other fields proves that AI can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data-dependent and require large datasets for training. The lack of data in the medical imaging field creates a bottleneck for the application of deep learning to medical image analysis. Medical image acquisition, annotation, and analysis are costly, and their usage is constrained by ethical restrictions. They also require many resources, such as human expertise and funding. That makes it difficult for non-medical researchers to have access to useful and large medical data. Thus, this paper provides a collection, as comprehensive as possible, of medical image datasets with their associated challenges for deep learning research. We have collected information on around three hundred datasets and challenges mainly reported between 2013 and 2020 and categorized them into four categories: head & neck, chest & abdomen, pathology & blood, and "others". Our paper has three purposes: 1) to provide an up-to-date and complete list that can be used as a universal reference to easily find the datasets for clinical image analysis, 2) to guide researchers on the methodology to test and evaluate their methods' performance and robustness on relevant datasets, 3) to provide a "route" to relevant algorithms for the relevant medical topics, and challenge leaderboards.
Submitted 24 June, 2021;
originally announced June 2021.
-
Hyperspectral and LiDAR data classification based on linear self-attention
Authors:
Min Feng,
Feng Gao,
Jian Fang,
Junyu Dong
Abstract:
An efficient linear self-attention fusion model is proposed in this paper for the task of hyperspectral image (HSI) and LiDAR data joint classification. The proposed method is comprised of a feature extraction module, an attention module, and a fusion module. The attention module is a plug-and-play linear self-attention module that can be extensively used in any model. The proposed model has achieved the overall accuracy of 95.40% on the Houston dataset. The experimental results demonstrate the superiority of the proposed method over other state-of-the-art models.
Submitted 6 April, 2021;
originally announced April 2021.
-
Identification of 27 abnormalities from multi-lead ECG signals: An ensembled Se-ResNet framework with Sign Loss function
Authors:
Zhaowei Zhu,
Xiang Lan,
Tingting Zhao,
Yangming Guo,
Pipin Kojodjojo,
Zhuoyang Xu,
Zhuo Liu,
Siqi Liu,
Han Wang,
Xingzhi Sun,
Mengling Feng
Abstract:
Cardiovascular disease is a major threat to health and one of the primary causes of death globally. The 12-lead ECG is a cheap and commonly accessible tool to identify cardiac abnormalities. Early and accurate diagnosis will allow early treatment and intervention to prevent severe complications of cardiovascular disease. In the PhysioNet/Computing in Cardiology Challenge 2020, our objective is to develop an algorithm that automatically identifies 27 ECG abnormalities from 12-lead ECG recordings.
Submitted 11 January, 2021; v1 submitted 12 December, 2020;
originally announced January 2021.
-
Perception Improvement for Free: Exploring Imperceptible Black-box Adversarial Attacks on Image Classification
Authors:
Yongwei Wang,
Mingquan Feng,
Rabab Ward,
Z. Jane Wang,
Lanjun Wang
Abstract:
Deep neural networks are vulnerable to adversarial attacks. White-box adversarial attacks can fool neural networks with small adversarial perturbations, especially for large images. However, keeping successful adversarial perturbations imperceptible is especially challenging for transfer-based black-box adversarial attacks. Often such adversarial examples can be easily spotted due to their unpleasantly poor visual qualities, which compromises the threat of adversarial attacks in practice. In this study, to improve the image quality of black-box adversarial examples perceptually, we propose structure-aware adversarial attacks by generating adversarial images based on psychological perceptual models. Specifically, we allow higher perturbations on perceptually insignificant regions, while assigning lower or no perturbation on visually sensitive regions. In addition to the proposed spatial-constrained adversarial perturbations, we also propose a novel structure-aware frequency adversarial attack method in the discrete cosine transform (DCT) domain. Since the proposed attacks are independent of the gradient estimation, they can be directly incorporated with existing gradient-based attacks. Experimental results show that, with a comparable attack success rate (ASR), the proposed methods can produce adversarial examples with considerably improved visual quality for free. With comparable perceptual quality, the proposed approaches achieve higher attack success rates: particularly for the frequency structure-aware attacks, the average ASR improves by more than 10% over the baseline attacks.
Submitted 30 October, 2020;
originally announced November 2020.
-
SunDown: Model-driven Per-Panel Solar Anomaly Detection for Residential Arrays
Authors:
Menghong Feng,
Noman Bashir,
Prashant Shenoy,
David Irwin,
Beka Kosanovic
Abstract:
There has been significant growth in both utility-scale and residential-scale solar installations in recent years, driven by rapid technology improvements and falling prices. Unlike utility-scale solar farms that are professionally managed and maintained, smaller residential-scale installations often lack sensing and instrumentation for performance monitoring and fault detection. As a result, faults may go undetected for long periods of time, resulting in generation and revenue losses for the homeowner. In this paper, we present SunDown, a sensorless approach designed to detect per-panel faults in residential solar arrays. SunDown does not require any new sensors for its fault detection and instead uses a model-driven approach that leverages correlations between the power produced by adjacent panels to detect deviations from expected behavior. SunDown can handle concurrent faults in multiple panels and perform anomaly classification to determine probable causes. Using two years of solar generation data from a real home and a manually generated dataset of multiple solar faults, we show that our approach has a MAPE of 2.98% when predicting per-panel output. Our results also show that SunDown is able to detect and classify faults, including from snow cover, leaves and debris, and electrical failures with 99.13% accuracy, and can detect multiple concurrent faults with 97.2% accuracy.
Submitted 25 May, 2020;
originally announced May 2020.
-
Zoom in to where it matters: a hierarchical graph based model for mammogram analysis
Authors:
Hao Du,
Jiashi Feng,
Mengling Feng
Abstract:
In clinical practice, human radiologists review medical images with high-resolution monitors and zoom into regions of interest (ROIs) for a close-up examination. Inspired by this observation, we propose a hierarchical graph neural network to detect abnormal lesions from medical images by automatically zooming into ROIs. We focus on mammogram analysis for breast cancer diagnosis in this study. Our proposed network consists of two graph attention networks performing two tasks: (1) node classification to predict whether to zoom into the next level; (2) graph classification to classify whether a mammogram is normal/benign or malignant. The model is trained and evaluated on the INbreast dataset, and we obtain an AUC comparable with state-of-the-art methods.
Submitted 16 December, 2019;
originally announced December 2019.
-
Dealing with Limited Backhaul Capacity in Millimeter Wave Systems: A Deep Reinforcement Learning Approach
Authors:
Mingjie Feng,
Shiwen Mao
Abstract:
Millimeter Wave (mmWave) communication is one of the key technologies of the fifth generation (5G) wireless systems to achieve the expected 1000x data rate. With large bandwidth at the mmWave band, the link capacity between users and base stations (BS) can be much higher compared to sub-6GHz wireless systems. Meanwhile, due to the high cost of infrastructure upgrades, it would be difficult for operators to drastically enhance the capacity of backhaul links between mmWave BSs and the core network. As a result, the data rate provided by backhaul may not be sufficient to support all mmWave links, and the backhaul connection becomes the new bottleneck that limits the system performance. On the other hand, as mmWave channels are subject to random blockage, the data rates of mmWave users vary significantly over time. With limited backhaul capacity and highly dynamic data rates of users, how to allocate backhaul resources to each user remains a challenge for mmWave systems. In this article, we present a deep reinforcement learning (DRL) approach to address this challenge. By learning the blockage pattern, the system dynamics can be captured and predicted, resulting in efficient utilization of backhaul resources. We begin with a discussion on DRL and its application in wireless systems. We then investigate the problem of backhaul resource allocation and present the DRL-based solution. Finally, we discuss open problems for future research and conclude this article.
Submitted 27 December, 2018;
originally announced January 2019.