Search | arXiv e-print repository

arXiv:2312.06454 [pdf, other]

Point Transformer with Federated Learning for Predicting Breast Cancer HER2 Status from Hematoxylin and Eosin-Stained Whole Slide Images

Authors: Bao Li, Zhenyu Liu, Lizhi Shao, Bensheng Qiu, Hong Bu, Jie Tian

Abstract: Directly predicting human epidermal growth factor receptor 2 (HER2) status from widely available hematoxylin and eosin (HE)-stained whole slide images (WSIs) can reduce technical costs and expedite treatment selection. Accurately predicting HER2 requires large collections of multi-site WSIs. Federated learning enables collaborative training of these WSIs without gigabyte-size WSIs transportation a… ▽ More Directly predicting human epidermal growth factor receptor 2 (HER2) status from widely available hematoxylin and eosin (HE)-stained whole slide images (WSIs) can reduce technical costs and expedite treatment selection. Accurately predicting HER2 requires large collections of multi-site WSIs. Federated learning enables collaborative training of these WSIs without gigabyte-size WSIs transportation and data privacy concerns. However, federated learning encounters challenges in addressing label imbalance in multi-site WSIs from the real world. Moreover, existing WSI classification methods cannot simultaneously exploit local context information and long-range dependencies in the site-end feature representation of federated learning. To address these issues, we present a point transformer with federated learning for multi-site HER2 status prediction from HE-stained WSIs. Our approach incorporates two novel designs. We propose a dynamic label distribution strategy and an auxiliary classifier, which helps to establish a well-initialized model and mitigate label distribution variations across sites. Additionally, we propose a farthest cosine sampling based on cosine distance. It can sample the most distinctive features and capture the long-range dependencies. Extensive experiments and analysis show that our method achieves state-of-the-art performance at four sites with a total of 2687 WSIs. Furthermore, we demonstrate that our model can generalize to two unseen sites with 229 WSIs. △ Less

Submitted 27 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.02937 [pdf, other]

doi 10.2514/6.2024-1167

Synergistic Perception and Control Simplex for Verifiable Safe Vertical Landing

Authors: Ayoosh Bansal, Yang Zhao, James Zhu, Sheng Cheng, Yuliang Gu, Hyung-** Yoon, Hunmin Kim, Naira Hovakimyan, Lui Sha

Abstract: Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomou… ▽ More Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomous air mobility vehicles. We improve upon this system by replacing static assumptions of control capabilities with dynamic confirmation, i.e., real-time confirmation of control limitations of the system, ensuring reliable fulfillment of safety maneuvers and overrides, without dependence on overly pessimistic assumptions. Parameters defining control system capabilities and limitations, e.g., maximum deceleration, are continuously tracked within the system and used to make safety-critical decisions. We apply these techniques to propose a verifiable collision avoidance solution for autonomous aerial mobility vehicles operating in cluttered and potentially unsafe environments. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: To appear in AIAA SciTech 2024

ACM Class: C.3; C.4; J.7

Journal ref: AIAA SCITECH 2024 Forum, p. 1167

arXiv:2309.04710 [pdf, other]

Jade: A Differentiable Physics Engine for Articulated Rigid Bodies with Intersection-Free Frictional Contact

Authors: Gang Yang, Siyuan Luo, Lin Shao

Abstract: We present Jade, a differentiable physics engine for articulated rigid bodies. Jade models contacts as the Linear Complementarity Problem (LCP). Compared to existing differentiable simulations, Jade offers features including intersection-free collision simulation and stable LCP solutions for multiple frictional contacts. We use continuous collision detection to detect the time of impact and adopt… ▽ More We present Jade, a differentiable physics engine for articulated rigid bodies. Jade models contacts as the Linear Complementarity Problem (LCP). Compared to existing differentiable simulations, Jade offers features including intersection-free collision simulation and stable LCP solutions for multiple frictional contacts. We use continuous collision detection to detect the time of impact and adopt the backtracking strategy to prevent intersection between bodies with complex geometry shapes. We derive the gradient calculation to ensure the whole simulation process is differentiable under the backtracking mechanism. We modify the popular Dantzig algorithm to get valid solutions under multiple frictional contacts. We conduct extensive experiments to demonstrate the effectiveness of our differentiable physics simulation over a variety of contact-rich tasks. △ Less

Submitted 9 September, 2023; originally announced September 2023.

arXiv:2306.06102 [pdf, other]

Backup Plan Constrained Model Predictive Control with Guaranteed Stability

Authors: Ran Tao, Hunmin Kim, Hyung-** Yoon, Wenbin Wan, Naira Hovakimyan, Lui Sha, Petros Voulgaris

Abstract: This article proposes and evaluates a new safety concept called backup plan safety for path planning of autonomous vehicles under mission uncertainty using model predictive control (MPC). Backup plan safety is defined as the ability to complete an alternative mission when the primary mission is aborted. To include this new safety concept in control problems, we formulate a feasibility maximization… ▽ More This article proposes and evaluates a new safety concept called backup plan safety for path planning of autonomous vehicles under mission uncertainty using model predictive control (MPC). Backup plan safety is defined as the ability to complete an alternative mission when the primary mission is aborted. To include this new safety concept in control problems, we formulate a feasibility maximization problem aiming to maximize the feasibility of the primary and alternative missions. The feasibility maximization problem is based on multi-objective MPC, where each objective (cost function) is associated with a different mission and balanced by a weight vector. Furthermore, the feasibility maximization problem incorporates additional control input horizons toward the alternative missions on top of the control input horizon toward the primary mission, denoted as multi-horizon inputs, to evaluate the cost for each mission. We develop the backup plan constrained MPC algorithm, which designs the weight vector that ensures asymptotic stability of the closed-loop system, and generates the optimal control input by solving the feasibility maximization problem with computational efficiency. The performance of the proposed algorithm is validated through simulations of a UAV path planning problem. △ Less

Submitted 6 October, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

arXiv:2303.16860 [pdf, other]

Physical Deep Reinforcement Learning Towards Safety Guarantee

Authors: Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo

Abstract: Deep reinforcement learning (DRL) has achieved tremendous success in many complex decision-making tasks of autonomous systems with high-dimensional state and/or action spaces. However, the safety and stability still remain major concerns that hinder the applications of DRL to safety-critical autonomous systems. To address the concerns, we proposed the Phy-DRL: a physical deep reinforcement learnin… ▽ More Deep reinforcement learning (DRL) has achieved tremendous success in many complex decision-making tasks of autonomous systems with high-dimensional state and/or action spaces. However, the safety and stability still remain major concerns that hinder the applications of DRL to safety-critical autonomous systems. To address the concerns, we proposed the Phy-DRL: a physical deep reinforcement learning framework. The Phy-DRL is novel in two architectural designs: i) Lyapunov-like reward, and ii) residual control (i.e., integration of physics-model-based control and data-driven control). The concurrent physical reward and residual control empower the Phy-DRL the (mathematically) provable safety and stability guarantees. Through experiments on the inverted pendulum, we show that the Phy-DRL features guaranteed safety and stability and enhanced robustness, while offering remarkably accelerated training and enlarged reward. △ Less

Submitted 29 March, 2023; originally announced March 2023.

Comments: Working Paper

arXiv:2212.14735 [pdf]

Compressed domain vibration detection and classification for distributed acoustic sensing

Authors: Xingliang Shen, Huan Wu, Kun Zhu, Yujia Li, Hua Zheng, Jialong Li, Liyang Shao, Perry ** Shum, Chao Lu

Abstract: Distributed acoustic sensing (DAS) is a novel enabling technology that can turn existing fibre optic networks to distributed acoustic sensors. However, it faces the challenges of transmitting, storing, and processing massive streams of data which are orders of magnitude larger than that collected from point sensors. The gap between intensive data generated by DAS and modern computing system with l… ▽ More Distributed acoustic sensing (DAS) is a novel enabling technology that can turn existing fibre optic networks to distributed acoustic sensors. However, it faces the challenges of transmitting, storing, and processing massive streams of data which are orders of magnitude larger than that collected from point sensors. The gap between intensive data generated by DAS and modern computing system with limited reading/writing speed and storage capacity imposes restrictions on many applications. Compressive sensing (CS) is a revolutionary signal acquisition method that allows a signal to be acquired and reconstructed with significantly fewer samples than that required by Nyquist-Shannon theorem. Though the data size is greatly reduced in the sampling stage, the reconstruction of the compressed data is however time and computation consuming. To address this challenge, we propose to map the feature extractor from Nyquist-domain to compressed-domain and therefore vibration detection and classification can be directly implemented in compressed-domain. The measured results show that our framework can be used to reduce the transmitted data size by 70% while achieves 99.4% true positive rate (TPR) and 0.04% false positive rate (TPR) along 5 km sensing fibre and 95.05% classification accuracy on a 5-class classification task. △ Less

Submitted 27 December, 2022; originally announced December 2022.

arXiv:2209.01710 [pdf, other]

doi 10.1002/stvr.1879

Perception Simplex: Verifiable Collision Avoidance in Autonomous Vehicles Amidst Obstacle Detection Faults

Authors: Ayoosh Bansal, Hunmin Kim, Simon Yu, Bo Li, Naira Hovakimyan, Marco Caccamo, Lui Sha

Abstract: Advances in deep learning have revolutionized cyber-physical applications, including the development of Autonomous Vehicles. However, real-world collisions involving autonomous control of vehicles have raised significant safety concerns regarding the use of Deep Neural Networks (DNN) in safety-critical tasks, particularly Perception. The inherent unverifiability of DNNs poses a key challenge in en… ▽ More Advances in deep learning have revolutionized cyber-physical applications, including the development of Autonomous Vehicles. However, real-world collisions involving autonomous control of vehicles have raised significant safety concerns regarding the use of Deep Neural Networks (DNN) in safety-critical tasks, particularly Perception. The inherent unverifiability of DNNs poses a key challenge in ensuring their safe and reliable operation. In this work, we propose Perception Simplex (PS), a fault-tolerant application architecture designed for obstacle detection and collision avoidance. We analyze an existing LiDAR-based classical obstacle detection algorithm to establish strict bounds on its capabilities and limitations. Such analysis and verification have not been possible for deep learning-based perception systems yet. By employing verifiable obstacle detection algorithms, PS identifies obstacle existence detection faults in the output of unverifiable DNN-based object detectors. When faults with potential collision risks are detected, appropriate corrective actions are initiated. Through extensive analysis and software-in-the-loop simulations, we demonstrate that PS provides predictable and deterministic fault tolerance against obstacle existence detection faults, establishing a robust safety guarantee. △ Less

Submitted 28 November, 2023; v1 submitted 4 September, 2022; originally announced September 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2208.14403

ACM Class: D.2.11; I.2.9; C.4; J.7

Journal ref: Software Testing, Verification and Reliability. 2024. e1879

arXiv:2208.14403 [pdf, other]

doi 10.1109/ISSRE55969.2022.00017

Verifiable Obstacle Detection

Authors: Ayoosh Bansal, Hunmin Kim, Simon Yu, Bo Li, Naira Hovakimyan, Marco Caccamo, Lui Sha

Abstract: Perception of obstacles remains a critical safety concern for autonomous vehicles. Real-world collisions have shown that the autonomy faults leading to fatal collisions originate from obstacle existence detection. Open source autonomous driving implementations show a perception pipeline with complex interdependent Deep Neural Networks. These networks are not fully verifiable, making them unsuitabl… ▽ More Perception of obstacles remains a critical safety concern for autonomous vehicles. Real-world collisions have shown that the autonomy faults leading to fatal collisions originate from obstacle existence detection. Open source autonomous driving implementations show a perception pipeline with complex interdependent Deep Neural Networks. These networks are not fully verifiable, making them unsuitable for safety-critical tasks. In this work, we present a safety verification of an existing LiDAR based classical obstacle detection algorithm. We establish strict bounds on the capabilities of this obstacle detection algorithm. Given safety standards, such bounds allow for determining LiDAR sensor properties that would reliably satisfy the standards. Such analysis has as yet been unattainable for neural network based perception systems. We provide a rigorous analysis of the obstacle detection system with empirical results based on real-world sensor data. △ Less

Submitted 30 August, 2022; originally announced August 2022.

Comments: Accepted at ISSRE 2022

ACM Class: D.2.4; I.2.9; I.4.8

Journal ref: 33rd International Symposium on Software Reliability Engineering (ISSRE), pp. 61-72. IEEE, 2022

arXiv:2205.01649 [pdf, other]

Learning Enriched Features for Fast Image Restoration and Enhancement

Authors: Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

Abstract: Given a degraded input image, image restoration aims to recover the missing high-quality image content. Numerous applications demand effective image restoration, e.g., computational photography, surveillance, autonomous vehicles, and remote sensing. Significant advances in image restoration have been made in recent years, dominated by convolutional neural networks (CNNs). The widely-used CNN-based… ▽ More Given a degraded input image, image restoration aims to recover the missing high-quality image content. Numerous applications demand effective image restoration, e.g., computational photography, surveillance, autonomous vehicles, and remote sensing. Significant advances in image restoration have been made in recent years, dominated by convolutional neural networks (CNNs). The widely-used CNN-based methods typically operate either on full-resolution or on progressively low-resolution representations. In the former case, spatial details are preserved but the contextual information cannot be precisely encoded. In the latter case, generated outputs are semantically reliable but spatially less accurate. This paper presents a new architecture with a holistic goal of maintaining spatially-precise high-resolution representations through the entire network, and receiving complementary contextual information from the low-resolution representations. The core of our approach is a multi-scale residual block containing the following key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) non-local attention mechanism for capturing contextual information, and (d) attention based multi-scale feature aggregation. Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Extensive experiments on six real image benchmark datasets demonstrate that our method, named as MIRNet-v2 , achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement. The source code and pre-trained models are available at https://github.com/swz30/MIRNetv2 △ Less

Submitted 19 April, 2022; originally announced May 2022.

Comments: This article supersedes arXiv:2003.06792. Accepted for publication in TPAMI

arXiv:2112.05752 [pdf, other]

Specificity-Preserving Federated Learning for MR Image Reconstruction

Authors: Chun-Mei Feng, Yunlu Yan, Shanshan Wang, Yong Xu, Ling Shao, Huazhu Fu

Abstract: Federated learning (FL) can be used to improve data privacy and efficiency in magnetic resonance (MR) image reconstruction by enabling multiple institutions to collaborate without needing to aggregate local data. However, the domain shift caused by different MR imaging protocols can substantially degrade the performance of FL models. Recent FL techniques tend to solve this by enhancing the general… ▽ More Federated learning (FL) can be used to improve data privacy and efficiency in magnetic resonance (MR) image reconstruction by enabling multiple institutions to collaborate without needing to aggregate local data. However, the domain shift caused by different MR imaging protocols can substantially degrade the performance of FL models. Recent FL techniques tend to solve this by enhancing the generalization of the global model, but they ignore the domain-specific features, which may contain important information about the device properties and be useful for local reconstruction. In this paper, we propose a specificity-preserving FL algorithm for MR image reconstruction (FedMRI). The core idea is to divide the MR reconstruction model into two parts: a globally shared encoder to obtain a generalized representation at the global level, and a client-specific decoder to preserve the domain-specific properties of each client, which is important for collaborative reconstruction when the clients have unique distribution. Such scheme is then executed in the frequency space and the image space respectively, allowing exploration of generalized representation and client-specific properties simultaneously in different spaces. Moreover, to further boost the convergence of the globally shared encoder when a domain shift is present, a weighted contrastive regularization is introduced to directly correct any deviation between the client and server during optimization. Extensive experiments demonstrate that our FedMRI's reconstructed results are the closest to the ground-truth for multi-institutional data, and that it outperforms state-of-the-art FL methods. △ Less

Submitted 22 August, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

Comments: 12 pages, 8 figures Code: https://github.com/chunmeifeng/FedMRI

Journal ref: IEEE Transactions on Medical Imaging, 2022

arXiv:2110.08080 [pdf, other]

Deep multi-modal aggregation network for MR image reconstruction with auxiliary modality

Authors: Chun-Mei Feng, Huazhu Fu, Tianfei Zhou, Yong Xu, Ling Shao, David Zhang

Abstract: Magnetic resonance (MR) imaging produces detailed images of organs and tissues with better contrast, but it suffers from a long acquisition time, which makes the image quality vulnerable to say motion artifacts. Recently, many approaches have been developed to reconstruct full-sampled images from partially observed measurements to accelerate MR imaging. However, most approaches focused on reconstr… ▽ More Magnetic resonance (MR) imaging produces detailed images of organs and tissues with better contrast, but it suffers from a long acquisition time, which makes the image quality vulnerable to say motion artifacts. Recently, many approaches have been developed to reconstruct full-sampled images from partially observed measurements to accelerate MR imaging. However, most approaches focused on reconstruction over a single modality, neglecting the discovery of correlation knowledge between the different modalities. Here we propose a Multi-modal Aggregation network for mR Image recOnstruction with auxiliary modality (MARIO), which is capable of discovering complementary representations from a fully sampled auxiliary modality, with which to hierarchically guide the reconstruction of a given target modality. This implies that our method can selectively aggregate multi-modal representations for better reconstruction, yielding comprehensive, multi-scale, multi-modal feature fusion. Extensive experiments on IXI and fastMRI datasets demonstrate the superiority of the proposed approach over state-of-the-art MR image reconstruction methods in removing artifacts. △ Less

Submitted 21 February, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

arXiv:2109.01664 [pdf, other]

Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution

Authors: Chun-Mei Feng, Yunlu Yan, Kai Yu, Yong Xu, Ling Shao, Huazhu Fu

Abstract: Super-resolving the Magnetic Resonance (MR) image of a target contrast under the guidance of the corresponding auxiliary contrast, which provides additional anatomical information, is a new and effective solution for fast MR imaging. However, current multi-contrast super-resolution (SR) methods tend to concatenate different contrasts directly, ignoring their relationships in different clues, e.g.,… ▽ More Super-resolving the Magnetic Resonance (MR) image of a target contrast under the guidance of the corresponding auxiliary contrast, which provides additional anatomical information, is a new and effective solution for fast MR imaging. However, current multi-contrast super-resolution (SR) methods tend to concatenate different contrasts directly, ignoring their relationships in different clues, e.g., in the high-intensity and low-intensity regions. In this study, we propose a separable attention network (comprising high-intensity priority attention and low-intensity separation attention), named SANet. Our SANet could explore the areas of high-intensity and low-intensity regions in the "forward" and "reverse" directions with the help of the auxiliary contrast, while learning clearer anatomical structure and edge information for the SR of a target-contrast MR image. SANet provides three appealing benefits: (1) It is the first model to explore a separable attention mechanism that uses the auxiliary contrast to predict the high-intensity and low-intensity regions regions, diverting more attention to refining any uncertain details between these regions and correcting the fine areas in the reconstructed results. (2) A multi-stage integration module is proposed to learn the response of multi-contrast fusion at multiple stages, get the dependency between the fused representations, and boost their representation ability. (3) Extensive experiments with various state-of-the-art multi-contrast SR methods on fastMRI and clinical \textit{in vivo} datasets demonstrate the superiority of our model. △ Less

Submitted 21 August, 2022; v1 submitted 3 September, 2021; originally announced September 2021.

Comments: arXiv admin note: text overlap with arXiv:2105.08949 https://github.com/chunmeifeng/SANet

arXiv:2108.06932 [pdf, other]

doi 10.26599/AIR.2023.9150015

Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

Authors: Bo Dong, Wenhai Wang, Deng-** Fan, **peng Li, Huazhu Fu, Ling Shao

Abstract: Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features and 2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and… ▽ More Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features and 2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence and elusive properties of polyps, we introduce three standard modules, including a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features; the CIM is applied to capture polyp information disguised in low-level features, and the SAM extends the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noises in the features and significantly improves their expressive capabilities. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects, rotation) than existing representative methods. The proposed model is available at https://github.com/Deng**Fan/Polyp-PVT. △ Less

Submitted 19 February, 2024; v1 submitted 16 August, 2021; originally announced August 2021.

Comments: Accepted to CAAI AIR 2023

Journal ref: CAAI Artificial Intelligence Research, 2023, 2: 9150015

arXiv:2107.07314 [pdf, other]

Variational Topic Inference for Chest X-Ray Report Generation

Authors: Ivona Najdenkoska, Xiantong Zhen, Marcel Worring, Ling Shao

Abstract: Automating report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice. Recent work has shown that deep learning models can successfully caption natural images. However, learning from medical data is challenging due to the diversity and uncertainty inherent in the reports written by different radiologists with discrepant expertise and experience. To… ▽ More Automating report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice. Recent work has shown that deep learning models can successfully caption natural images. However, learning from medical data is challenging due to the diversity and uncertainty inherent in the reports written by different radiologists with discrepant expertise and experience. To tackle these challenges, we propose variational topic inference for automatic report generation. Specifically, we introduce a set of topics as latent variables to guide sentence generation by aligning image and language modalities in a latent space. The topics are inferred in a conditional variational inference framework, with each topic governing the generation of a sentence in the report. Further, we adopt a visual attention module that enables the model to attend to different locations in the image and generate more informative descriptions. We conduct extensive experiments on two benchmarks, namely Indiana U. Chest X-rays and MIMIC-CXR. The results demonstrate that our proposed variational topic inference method can generate novel reports rather than mere copies of reports used in training, while still achieving comparable performance to state-of-the-art methods in terms of standard language generation criteria. △ Less

Submitted 15 July, 2021; originally announced July 2021.

Comments: To be published in the International Conference on Medical Image Computing and Computer Assisted Intervention 2021

arXiv:2106.14248 [pdf, other]

Multi-Modal Transformer for Accelerated MR Imaging

Authors: Chun-Mei Feng, Yunlu Yan, Geng Chen, Yong Xu, Ling Shao, Huazhu Fu

Abstract: Accelerated multi-modal magnetic resonance (MR) imaging is a new and effective solution for fast MR imaging, providing superior performance in restoring the target modality from its undersampled counterpart with guidance from an auxiliary modality. However, existing works simply combine the auxiliary modality as prior information, lacking in-depth investigations on the potential mechanisms for fus… ▽ More Accelerated multi-modal magnetic resonance (MR) imaging is a new and effective solution for fast MR imaging, providing superior performance in restoring the target modality from its undersampled counterpart with guidance from an auxiliary modality. However, existing works simply combine the auxiliary modality as prior information, lacking in-depth investigations on the potential mechanisms for fusing different modalities. Further, they usually rely on the convolutional neural networks (CNNs), which is limited by the intrinsic locality in capturing the long-distance dependency. To this end, we propose a multi-modal transformer (MTrans), which is capable of transferring multi-scale features from the target modality to the auxiliary modality, for accelerated MR imaging. To capture deep multi-modal information, our MTrans utilizes an improved multi-head attention mechanism, named cross attention module, which absorbs features from the auxiliary modality that contribute to the target modality. Our framework provides three appealing benefits: (i) Our MTrans use an improved transformers for multi-modal MR imaging, affording more global information compared with existing CNN-based methods. (ii) A new cross attention module is proposed to exploit the useful information in each modality at different scales. The small patch in the target modality aims to keep more fine details, the large patch in the auxiliary modality aims to obtain high-level context features from the larger region and supplement the target modality effectively. (iii) We evaluate MTrans with various accelerated multi-modal MR imaging tasks, e.g., MR image reconstruction and super-resolution, where MTrans outperforms state-of-the-art methods on fastMRI and real-world clinical datasets. △ Less

Submitted 11 May, 2022; v1 submitted 27 June, 2021; originally announced June 2021.

Comments: https://github.com/chunmeifeng/MTrans

arXiv:2105.05980 [pdf, other]

DONet: Dual-Octave Network for Fast MR Image Reconstruction

Authors: Chun-Mei Feng, Zhanyuan Yang, Huazhu Fu, Yong Xu, Jian Yang, Ling Shao

Abstract: Magnetic resonance (MR) image acquisition is an inherently prolonged process, whose acceleration has long been the subject of research. This is commonly achieved by obtaining multiple undersampled images, simultaneously, through parallel imaging. In this paper, we propose the Dual-Octave Network (DONet), which is capable of learning multi-scale spatial-frequency features from both the real and ima… ▽ More Magnetic resonance (MR) image acquisition is an inherently prolonged process, whose acceleration has long been the subject of research. This is commonly achieved by obtaining multiple undersampled images, simultaneously, through parallel imaging. In this paper, we propose the Dual-Octave Network (DONet), which is capable of learning multi-scale spatial-frequency features from both the real and imaginary components of MR data, for fast parallel MR image reconstruction. More specifically, our DONet consists of a series of Dual-Octave convolutions (Dual-OctConv), which are connected in a dense manner for better reuse of features. In each Dual-OctConv, the input feature maps and convolutional kernels are first split into two components (ie, real and imaginary), and then divided into four groups according to their spatial frequencies. Then, our Dual-OctConv conducts intra-group information updating and inter-group information exchange to aggregate the contextual information across different groups. Our framework provides three appealing benefits: (i) It encourages information interaction and fusion between the real and imaginary components at various spatial frequencies to achieve richer representational capacity. (ii) The dense connections between the real and imaginary groups in each Dual-OctConv make the propagation of features more efficient by feature reuse. (iii) DONet enlarges the receptive field by learning multiple spatial-frequency features of both the real and imaginary components. Extensive experiments on two popular datasets (ie, clinical knee and fastMRI), under different undersampling patterns and acceleration factors, demonstrate the superiority of our model in accelerated parallel MR image reconstruction. △ Less

Submitted 12 June, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2104.05345

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2021

arXiv:2104.05345 [pdf, other]

Dual-Octave Convolution for Accelerated Parallel MR Image Reconstruction

Authors: Chun-Mei Feng, Zhanyuan Yang, Geng Chen, Yong Xu, Ling Shao

Abstract: Magnetic resonance (MR) image acquisition is an inherently prolonged process, whose acceleration by obtaining multiple undersampled images simultaneously through parallel imaging has always been the subject of research. In this paper, we propose the Dual-Octave Convolution (Dual-OctConv), which is capable of learning multi-scale spatial-frequency features from both real and imaginary components, f… ▽ More Magnetic resonance (MR) image acquisition is an inherently prolonged process, whose acceleration by obtaining multiple undersampled images simultaneously through parallel imaging has always been the subject of research. In this paper, we propose the Dual-Octave Convolution (Dual-OctConv), which is capable of learning multi-scale spatial-frequency features from both real and imaginary components, for fast parallel MR image reconstruction. By reformulating the complex operations using octave convolutions, our model shows a strong ability to capture richer representations of MR images, while at the same time greatly reducing the spatial redundancy. More specifically, the input feature maps and convolutional kernels are first split into two components (i.e., real and imaginary), which are then divided into four groups according to their spatial frequencies. Then, our Dual-OctConv conducts intra-group information updating and inter-group information exchange to aggregate the contextual information across different groups. Our framework provides two appealing benefits: (i) it encourages interactions between real and imaginary components at various spatial frequencies to achieve richer representational capacity, and (ii) it enlarges the receptive field by learning multiple spatial-frequency features of both the real and imaginary components. We evaluate the performance of the proposed model on the acceleration of multi-coil MR image reconstruction. Extensive experiments are conducted on an {in vivo} knee dataset under different undersampling patterns and acceleration factors. The experimental results demonstrate the superiority of our model in accelerated parallel MR image reconstruction. Our code is available at: github.com/chunmeifeng/Dual-OctConv. △ Less

Submitted 12 April, 2021; originally announced April 2021.

Comments: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI) 2021

Journal ref: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI) 2021

arXiv:2103.14819 [pdf, other]

Backup Plan Constrained Model Predictive Control

Authors: Hunmin Kim, Hyung** Yoon, Wenbin Wan, Naira Hovakimyan, Lui Sha, Petros Voulgaris

Abstract: This article proposes a new safety concept: backup plan safety. The backup plan safety is defined as the ability to complete one of the alternative missions in the case of primary mission abortion. To incorporate this new safety concept in control problems, we formulate a feasibility maximization problem that adopts additional (virtual) input horizons toward the alternative missions on top of the… ▽ More This article proposes a new safety concept: backup plan safety. The backup plan safety is defined as the ability to complete one of the alternative missions in the case of primary mission abortion. To incorporate this new safety concept in control problems, we formulate a feasibility maximization problem that adopts additional (virtual) input horizons toward the alternative missions on top of the input horizon toward the primary mission. Cost functions for the primary and alternative missions construct multiple objectives, and multi-horizon inputs evaluate them. To address the feasibility maximization problem, we develop a multi-horizon multi-objective model predictive path integral control (3M) algorithm. Model predictive path integral control (MPPI) is a sampling-based scheme that can help the proposed algorithm deal with nonlinear dynamic systems and achieve computational efficiency by parallel computation. Simulations of the aerial vehicle and ground vehicle control problems demonstrate the new concept of backup plan safety and the performance of the proposed algorithm. △ Less

Submitted 27 March, 2021; originally announced March 2021.

arXiv:2103.11587 [pdf, other]

Brain Image Synthesis with Unsupervised Multivariate Canonical CSC$\ell_4$Net

Authors: Yawen Huang, Feng Zheng, Danyang Wang, Weilin Huang, Matthew R. Scott, Ling Shao

Abstract: Recent advances in neuroscience have highlighted the effectiveness of multi-modal medical data for investigating certain pathologies and understanding human cognition. However, obtaining full sets of different modalities is limited by various factors, such as long acquisition times, high examination costs and artifact suppression. In addition, the complexity, high dimensionality and heterogeneity… ▽ More Recent advances in neuroscience have highlighted the effectiveness of multi-modal medical data for investigating certain pathologies and understanding human cognition. However, obtaining full sets of different modalities is limited by various factors, such as long acquisition times, high examination costs and artifact suppression. In addition, the complexity, high dimensionality and heterogeneity of neuroimaging data remains another key challenge in leveraging existing randomized scans effectively, as data of the same modality is often measured differently by different machines. There is a clear need to go beyond the traditional imaging-dependent process and synthesize anatomically specific target-modality data from a source input. In this paper, we propose to learn dedicated features that cross both intre- and intra-modal variations using a novel CSC$\ell_4$Net. Through an initial unification of intra-modal data in the feature maps and multivariate canonical adaptation, CSC$\ell_4$Net facilitates feature-level mutual transformation. The positive definite Riemannian manifold-penalized data fidelity term further enables CSC$\ell_4$Net to reconstruct missing measurements according to transformed features. Finally, the maximization $\ell_4$-norm boils down to a computationally efficient optimization problem. Extensive experiments validate the ability and robustness of our CSC$\ell_4$Net compared to the state-of-the-art methods on multiple datasets. △ Less

Submitted 22 March, 2021; originally announced March 2021.

Comments: 10 pages, 5 figures CVPR2021 oral

arXiv:2103.10825 [pdf, other]

Variational Knowledge Distillation for Disease Classification in Chest X-Rays

Authors: Tom van Sonsbeek, Xiantong Zhen, Marcel Worring, Ling Shao

Abstract: Disease classification relying solely on imaging data attracts great interest in medical image analysis. Current models could be further improved, however, by also employing Electronic Health Records (EHRs), which contain rich information on patients and findings from clinicians. It is challenging to incorporate this information into disease classification due to the high reliance on clinician inp… ▽ More Disease classification relying solely on imaging data attracts great interest in medical image analysis. Current models could be further improved, however, by also employing Electronic Health Records (EHRs), which contain rich information on patients and findings from clinicians. It is challenging to incorporate this information into disease classification due to the high reliance on clinician input in EHRs, limiting the possibility for automated diagnosis. In this paper, we propose \textit{variational knowledge distillation} (VKD), which is a new probabilistic inference framework for disease classification based on X-rays that leverages knowledge from EHRs. Specifically, we introduce a conditional latent variable model, where we infer the latent representation of the X-ray image with the variational posterior conditioning on the associated EHR text. By doing so, the model acquires the ability to extract the visual features relevant to the disease during learning and can therefore perform more accurate classification for unseen patients at inference based solely on their X-ray scans. We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs. The results show that the proposed variational knowledge distillation can consistently improve the performance of medical image classification and significantly surpasses current methods. △ Less

Submitted 19 March, 2021; originally announced March 2021.

arXiv:2012.02776 [pdf, other]

Learning to Fuse Asymmetric Feature Maps in Siamese Trackers

Authors: Wencheng Han, ** Dong, Fahad Shahbaz Khan, Ling Shao, Jianbing Shen

Abstract: Recently, Siamese-based trackers have achieved promising performance in visual tracking. Most recent Siamese-based trackers typically employ a depth-wise cross-correlation (DW-XCorr) to obtain multi-channel correlation information from the two feature maps (target and search region). However, DW-XCorr has several limitations within Siamese-based tracking: it can easily be fooled by distractors, ha… ▽ More Recently, Siamese-based trackers have achieved promising performance in visual tracking. Most recent Siamese-based trackers typically employ a depth-wise cross-correlation (DW-XCorr) to obtain multi-channel correlation information from the two feature maps (target and search region). However, DW-XCorr has several limitations within Siamese-based tracking: it can easily be fooled by distractors, has fewer activated channels, and provides weak discrimination of object boundaries. Further, DW-XCorr is a handcrafted parameter-free module and cannot fully benefit from offline learning on large-scale data. We propose a learnable module, called the asymmetric convolution (ACM), which learns to better capture the semantic correlation information in offline training on large-scale data. Different from DW-XCorr and its predecessor(XCorr), which regard a single feature map as the convolution kernel, our ACM decomposes the convolution operation on a concatenated feature map into two mathematically equivalent operations, thereby avoiding the need for the feature maps to be of the same size (width and height)during concatenation. Our ACM can incorporate useful prior information, such as bounding-box size, with standard visual features. Furthermore, ACM can easily be integrated into existing Siamese trackers based on DW-XCorror XCorr. To demonstrate its generalization ability, we integrate ACM into three representative trackers: SiamFC, SiamRPN++, and SiamBAN. Our experiments reveal the benefits of the proposed ACM, which outperforms existing methods on six tracking benchmarks. On the LaSOT test set, our ACM-based tracker obtains a significant improvement of 5.8% in terms of success (AUC), over the baseline. △ Less

Submitted 30 March, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

Comments: Accepted by CVPR2021

arXiv:2010.06616 [pdf, ps, other]

Finite-Time Model Inference From A Single Noisy Trajectory

Authors: Yanbing Mao, Naira Hovakimyan, Petros Voulgaris, Lui Sha

Abstract: This paper proposes a novel model inference procedure to identify system matrix from a single noisy trajectory over a finite-time interval. The proposed inference procedure comprises an observation data processor, a redundant data processor and an ordinary least-square estimator, wherein the data processors mitigate the influence of observation noise on inference error. We first systematically inv… ▽ More This paper proposes a novel model inference procedure to identify system matrix from a single noisy trajectory over a finite-time interval. The proposed inference procedure comprises an observation data processor, a redundant data processor and an ordinary least-square estimator, wherein the data processors mitigate the influence of observation noise on inference error. We first systematically investigate the comparisons with naive least-square-regression based model inference and uncover that 1) the same observation data has identical influence on the feasibility of the proposed and the naive model inferences, 2) the naive model inference uses all of the redundant data, while the proposed model inference optimally uses the basis and the redundant data. We then study the sample complexity of the proposed model inference in the presence of observation noise, which leads to the dependence of the processed bias in the observed system trajectory on time and coordinates. Particularly, we derive the sample-complexity upper bound (on the number of observations sufficient to infer a model with prescribed levels of accuracy and confidence) and the sample-complexity lower bound (high-probability lower bound on model error). Finally, the proposed model inference is numerically validated and analyzed. △ Less

Submitted 1 January, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

Comments: Submitted

arXiv:2009.12349 [pdf, other]

Robust Vehicle Lane Kee** Control with Networked Proactive Adaptation

Authors: Hunmin Kim, Wenbin Wan, Naira Hovakimyan, Lui Sha, Petros Voulgaris

Abstract: Road condition is an important environmental factor for autonomous vehicle control. A dramatic change in the road condition from the nominal status is a source of uncertainty that can lead to a system failure. Once the vehicle encounters an uncertain environment, such as hitting an ice patch, it is too late to reduce the speed, and the vehicle can lose control. To cope with future uncertainties in… ▽ More Road condition is an important environmental factor for autonomous vehicle control. A dramatic change in the road condition from the nominal status is a source of uncertainty that can lead to a system failure. Once the vehicle encounters an uncertain environment, such as hitting an ice patch, it is too late to reduce the speed, and the vehicle can lose control. To cope with future uncertainties in advance, we study a proactive robust adaptive control architecture for autonomous vehicles' lane-kee** control problems. The data center generates a prior environmental uncertainty estimate by combining weather forecasts and measurements from anonymous vehicles through a spatio-temporal filter. The prior estimate contributes to designing a robust heading controller and nominal longitudinal velocity for proactive adaptation to each new abnormal condition. Then the control parameters are updated based on posterior information fusion with on-board measurements. △ Less

Submitted 28 September, 2020; v1 submitted 25 September, 2020; originally announced September 2020.

arXiv:2009.08973 [pdf, other]

GRAC: Self-Guided and Self-Regularized Actor-Critic

Authors: Lin Shao, Yifan You, Mengyuan Yan, Qingyun Sun, Jeannette Bohg

Abstract: Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network which mitigates the divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main c… ▽ More Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network which mitigates the divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main contribution in this work is a self-regularized TD-learning method to address divergence without requiring a target network. Additionally, we propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization to search for actions associated with higher Q-values in a broad neighborhood. This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network. Taken together, these components define GRAC, a novel self-guided and self-regularized actor critic algorithm. We evaluate GRAC on the suite of OpenAI gym tasks, achieving or outperforming state of the art in every environment tested. △ Less

Submitted 10 November, 2020; v1 submitted 18 September, 2020; originally announced September 2020.

arXiv:2008.02101 [pdf, other]

Structure Preserving Stain Normalization of Histopathology Images Using Self-Supervised Semantic Guidance

Authors: Dwarikanath Mahapatra, Behzad Bozorgtabar, Jean-Philippe Thiran, Ling Shao

Abstract: Although generative adversarial network (GAN) based style transfer is state of the art in histopathology color-stain normalization, they do not explicitly integrate structural information of tissues. We propose a self-supervised approach to incorporate semantic guidance into a GAN based stain normalization framework and preserve detailed structural information. Our method does not require manual s… ▽ More Although generative adversarial network (GAN) based style transfer is state of the art in histopathology color-stain normalization, they do not explicitly integrate structural information of tissues. We propose a self-supervised approach to incorporate semantic guidance into a GAN based stain normalization framework and preserve detailed structural information. Our method does not require manual segmentation maps which is a significant advantage over existing methods. We integrate semantic information at different layers between a pre-trained semantic network and the stain color normalization network. The proposed scheme outperforms other color normalization methods leading to better classification and segmentation performance. △ Less

Submitted 3 June, 2021; v1 submitted 5 August, 2020; originally announced August 2020.

arXiv:2008.01627 [pdf, ps, other]

SL1-Simplex: Safe Velocity Regulation of Self-Driving Vehicles in Dynamic and Unforeseen Environments

Authors: Yanbing Mao, Yuliang Gu, Naira Hovakimyan, Lui Sha, Petros Voulgaris

Abstract: This paper proposes a novel extension of the Simplex architecture with model switching and model learning to achieve safe velocity regulation of self-driving vehicles in dynamic and unforeseen environments. To guarantee the reliability of autonomous vehicles, an $\mathcal{L}_{1}$ adaptive controller that compensates for uncertainties and disturbances is employed by the Simplex architecture as a ve… ▽ More This paper proposes a novel extension of the Simplex architecture with model switching and model learning to achieve safe velocity regulation of self-driving vehicles in dynamic and unforeseen environments. To guarantee the reliability of autonomous vehicles, an $\mathcal{L}_{1}$ adaptive controller that compensates for uncertainties and disturbances is employed by the Simplex architecture as a verified safe controller to tolerate concurrent software and physical failures. Meanwhile, safe switching controller is incorporated into the Simplex for safe velocity regulation through the integration of the traction control system and anti-lock braking system. Specifically, the vehicle's angular and longitudinal velocities asymptotically track the provided references that vary with driving environments, while the wheel slips are restricted to safety envelopes to prevent slip** and sliding. Due to the high dependence of vehicle dynamics on the driving environments, the proposed Simplex leverages the finite-time model learning to timely learn and update the vehicle model for $\mathcal{L}_{1}$ adaptive controller, when any deviation from the safety envelope or the uncertainty measurement threshold occurs in the unforeseen driving environments. Finally, the effectiveness of the proposed Simplex architecture for safe velocity regulation is validated by the AutoRally platform. △ Less

Submitted 1 February, 2022; v1 submitted 4 August, 2020; originally announced August 2020.

Comments: Submitted to ACM Transactions on Cyber-Physical Systems

arXiv:2006.11538 [pdf, other]

Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition

Authors: Ionut Cosmin Duta, Li Liu, Fan Zhu, Ling Shao

Abstract: This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales. PyConv contains a pyramid of kernels, where each level involves different types of filters with varying size and depth, which are able to capture different levels of details in the scene. On top of these improved recognition capabilities, PyConv is also efficient and, with our f… ▽ More This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales. PyConv contains a pyramid of kernels, where each level involves different types of filters with varying size and depth, which are able to capture different levels of details in the scene. On top of these improved recognition capabilities, PyConv is also efficient and, with our formulation, it does not increase the computational cost and parameters compared to standard convolution. Moreover, it is very flexible and extensible, providing a large space of potential network architectures for different applications. PyConv has the potential to impact nearly every computer vision task and, in this work, we present different architectures based on PyConv for four main tasks on visual recognition: image classification, video action classification/recognition, object detection and semantic image segmentation/parsing. Our approach shows significant improvements over all these core tasks in comparison with the baselines. For instance, on image recognition, our 50-layers network outperforms in terms of recognition performance on ImageNet dataset its counterpart baseline ResNet with 152 layers, while having 2.39 times less parameters, 2.52 times lower computational complexity and more than 3 times less layers. On image segmentation, our novel framework sets a new state-of-the-art on the challenging ADE20K benchmark for scene parsing. Code is available at: https://github.com/iduta/pyconv △ Less

Submitted 20 June, 2020; originally announced June 2020.

arXiv:2006.11392 [pdf, other]

PraNet: Parallel Reverse Attention Network for Polyp Segmentation

Authors: Deng-** Fan, Ge-Peng Ji, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, Ling Shao

Abstract: Colonoscopy is an effective technique for detecting colorectal polyps, which are highly related to colorectal cancer. In clinical practice, segmenting polyps from colonoscopy images is of great importance since it provides valuable information for diagnosis and surgery. However, accurate polyp segmentation is a challenging task, for two major reasons: (i) the same type of polyps has a diversity of… ▽ More Colonoscopy is an effective technique for detecting colorectal polyps, which are highly related to colorectal cancer. In clinical practice, segmenting polyps from colonoscopy images is of great importance since it provides valuable information for diagnosis and surgery. However, accurate polyp segmentation is a challenging task, for two major reasons: (i) the same type of polyps has a diversity of size, color and texture; and (ii) the boundary between a polyp and its surrounding mucosa is not sharp. To address these challenges, we propose a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images. Specifically, we first aggregate the features in high-level layers using a parallel partial decoder (PPD). Based on the combined feature, we then generate a global map as the initial guidance area for the following components. In addition, we mine the boundary cues using a reverse attention (RA) module, which is able to establish the relationship between areas and boundary cues. Thanks to the recurrent cooperation mechanism between areas and boundaries, our PraNet is capable of calibrating any misaligned predictions, improving the segmentation accuracy. Quantitative and qualitative evaluations on five challenging datasets across six metrics show that our PraNet improves the segmentation accuracy significantly, and presents a number of advantages in terms of generalizability, and real-time segmentation efficiency. △ Less

Submitted 3 July, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

Comments: Accepted to MICCAI 2020

arXiv:2006.10135 [pdf, other]

M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients

Authors: Tao Zhou, Huazhu Fu, Yu Zhang, Changqing Zhang, Xiankai Lu, Jianbing Shen, Ling Shao

Abstract: Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients. Although many OS time prediction methods have been developed and obtain promising results, there are still several issues. First, conventional prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume, which may not repre… ▽ More Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients. Although many OS time prediction methods have been developed and obtain promising results, there are still several issues. First, conventional prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume, which may not represent the full image or model complex tumor patterns. Second, different types of scanners (i.e., multi-modal data) are sensitive to different brain regions, which makes it challenging to effectively exploit the complementary information across multiple modalities and also preserve the modality-specific properties. Third, existing methods focus on prediction models, ignoring complex data-to-label relationships. To address the above issues, we propose an end-to-end OS time prediction model; namely, Multi-modal Multi-channel Network (M2Net). Specifically, we first project the 3D MR volume onto 2D images in different directions, which reduces computational costs, while preserving important information and enabling pre-trained models to be transferred from other tasks. Then, we use a modality-specific network to extract implicit and high-level features from different MR scans. A multi-modal shared network is built to fuse these features using a bilinear pooling model, exploiting their correlations to provide complementary information. Finally, we integrate the outputs from each modality-specific network and the multi-modal shared network to generate the final prediction result. Experimental results demonstrate the superiority of our M2Net model over other methods. △ Less

Submitted 14 July, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: Accepted by MICCAI'20

arXiv:2005.07697 [pdf, other]

Safety Constrained Multi-UAV Time Coordination: A Bi-level Control Framework in GPS Denied Environment

Authors: Wenbin Wan, Hunmin Kim, Yikun Cheng, Naira Hovakimyan, Petros G. Voulgaris, Lui Sha

Abstract: Unmanned aerial vehicles (UAVs) suffer from sensor drifts in GPS denied environments, which can cause safety issues. To avoid intolerable sensor drifts while completing the time-critical coordination task for multi-UAV systems, we propose a safety constrained bi-level control framework. The first level is the time-critical coordination level that achieves a consensus of coordination states and pro… ▽ More Unmanned aerial vehicles (UAVs) suffer from sensor drifts in GPS denied environments, which can cause safety issues. To avoid intolerable sensor drifts while completing the time-critical coordination task for multi-UAV systems, we propose a safety constrained bi-level control framework. The first level is the time-critical coordination level that achieves a consensus of coordination states and provides a virtual target which is a function of the coordination state. The second level is the safety-critical control level that is designed to follow the virtual target while adapting the attacked UAV(s) at a path re-planning level to support resilient state estimation. In particular, the time-critical coordination level framework generates the desired speed and position profile of the virtual target based on the multi-UAV cooperative mission by the proposed consensus protocol algorithm. The safety-critical control level is able to make each UAV follow its assigned path while detecting the attacks, estimating the state resiliently, and driving the UAV(s) outside the effective range of the spoofing device within the escape time. The numerical simulations of a three-UAV system demonstrate the effectiveness of the proposed safety constrained bi-level control framework. △ Less

Submitted 19 May, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1910.10826

arXiv:2005.05594 [pdf, other]

Modeling and Enhancing Low-quality Retinal Fundus Images

Authors: Ziyi Shen, Huazhu Fu, Jianbing Shen, Ling Shao

Abstract: Retinal fundus images are widely used for the clinical screening and diagnosis of eye diseases. However, fundus images captured by operators with various levels of experience have a large variation in quality. Low-quality fundus images increase uncertainty in clinical observation and lead to the risk of misdiagnosis. However, due to the special optical beam of fundus imaging and structure of the r… ▽ More Retinal fundus images are widely used for the clinical screening and diagnosis of eye diseases. However, fundus images captured by operators with various levels of experience have a large variation in quality. Low-quality fundus images increase uncertainty in clinical observation and lead to the risk of misdiagnosis. However, due to the special optical beam of fundus imaging and structure of the retina, natural image enhancement methods cannot be utilized directly to address this. In this paper, we first analyze the ophthalmoscope imaging system and simulate a reliable degradation of major inferior-quality factors, including uneven illumination, image blurring, and artifacts. Then, based on the degradation model, a clinically oriented fundus enhancement network (cofe-Net) is proposed to suppress global degradation factors, while simultaneously preserving anatomical retinal structures and pathological characteristics for clinical observation and analysis. Experiments on both synthetic and real images demonstrate that our algorithm effectively corrects low-quality fundus images without losing retinal details. Moreover, we also show that the fundus correction method can benefit medical image analysis applications, e.g., retinal vessel segmentation and optic disc/cup detection. △ Less

Submitted 9 December, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

arXiv:2004.14133 [pdf, other]

Inf-Net: Automatic COVID-19 Lung Infection Segmentation from CT Images

Authors: Deng-** Fan, Tao Zhou, Ge-Peng Ji, Yi Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, Ling Shao

Abstract: Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers a great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in… ▽ More Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers a great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics, and low intensity contrast between infections and normal tissues. Further, collecting a large amount of data is impractical within a short time period, inhibiting the training of a deep model. To address these challenges, a novel COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) is proposed to automatically identify infected regions from chest CT slices. In our Inf-Net, a parallel partial decoder is used to aggregate the high-level features and generate a global map. Then, the implicit reverse attention and explicit edge-attention are utilized to model the boundaries and enhance the representations. Moreover, to alleviate the shortage of labeled data, we present a semi-supervised segmentation framework based on a randomly selected propagation strategy, which only requires a few labeled images and leverages primarily unlabeled data. Our semi-supervised framework can improve the learning ability and achieve a higher performance. Extensive experiments on our COVID-SemiSeg and real CT volumes demonstrate that the proposed Inf-Net outperforms most cutting-edge segmentation models and advances the state-of-the-art performance. △ Less

Submitted 21 May, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

Comments: To appear in IEEE TMI. The code is released in: https://github.com/Deng**Fan/Inf-Net

arXiv:2004.08499 [pdf, other]

Design and Control of Roller Grasper V2 for In-Hand Manipulation

Authors: Shenli Yuan, Lin Shao, Connor L. Yako, Alex Gruebele, J. Kenneth Salisbury

Abstract: The ability to perform in-hand manipulation still remains an unsolved problem; having this capability would allow robots to perform sophisticated tasks requiring repositioning and reorienting of grasped objects. In this work, we present a novel non-anthropomorphic robot grasper with the ability to manipulate objects by means of active surfaces at the fingertips. Active surfaces are achieved by sph… ▽ More The ability to perform in-hand manipulation still remains an unsolved problem; having this capability would allow robots to perform sophisticated tasks requiring repositioning and reorienting of grasped objects. In this work, we present a novel non-anthropomorphic robot grasper with the ability to manipulate objects by means of active surfaces at the fingertips. Active surfaces are achieved by spherical rolling fingertips with two degrees of freedom (DoF) -- a pivoting motion for surface reorientation -- and a continuous rolling motion for moving the object. A further DoF is in the base of each finger, allowing the fingers to grasp objects over a range of size and shapes. Instantaneous kinematics was derived and objects were successfully manipulated both with a custom handcrafted control scheme as well as one learned through imitation learning, in simulation and experimentally on the hardware. △ Less

Submitted 17 November, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

Comments: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) October 25-29, 2020, Las Vegas, NV, USA (Virtual)

arXiv:2004.04491 [pdf, other]

doi 10.1109/TIP.2020.2983560

Multi-Granularity Canonical Appearance Pooling for Remote Sensing Scene Classification

Authors: S. Wang, Y. Guan, L. Shao

Abstract: Recognising remote sensing scene images remains challenging due to large visual-semantic discrepancies. These mainly arise due to the lack of detailed annotations that can be employed to align pixel-level representations with high-level semantic labels. As the tagging process is labour-intensive and subjective, we hereby propose a novel Multi-Granularity Canonical Appearance Pooling (MG-CAP) to au… ▽ More Recognising remote sensing scene images remains challenging due to large visual-semantic discrepancies. These mainly arise due to the lack of detailed annotations that can be employed to align pixel-level representations with high-level semantic labels. As the tagging process is labour-intensive and subjective, we hereby propose a novel Multi-Granularity Canonical Appearance Pooling (MG-CAP) to automatically capture the latent ontological structure of remote sensing datasets. We design a granular framework that allows progressively crop** the input image to learn multi-grained features. For each specific granularity, we discover the canonical appearance from a set of pre-defined transformations and learn the corresponding CNN features through a maxout-based Siamese style architecture. Then, we replace the standard CNN features with Gaussian covariance matrices and adopt the proper matrix normalisations for improving the discriminative power of features. Besides, we provide a stable solution for training the eigenvalue-decomposition function (EIG) in a GPU and demonstrate the corresponding back-propagation using matrix calculus. Extensive experiments have shown that our framework can achieve promising results in public remote sensing scene datasets. △ Less

Submitted 9 April, 2020; originally announced April 2020.

Comments: This paper is going to be published by IEEE Transactions on Image Processing

Journal ref: IEEE Transactions on Image Processing 29, 5396--5407 (2020)

arXiv:2003.14119 [pdf, other]

Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation

Authors: Dwarikanath Mahapatra, Behzad Bozorgtabar, Jean-Philippe Thiran, Ling Shao

Abstract: Medical image segmentation is an important task for computer aided diagnosis. Pixelwise manual annotations of large datasets require high expertise and is time consuming. Conventional data augmentations have limited benefit by not fully representing the underlying distribution of the training set, thus affecting model robustness when tested on images captured from different sources. Prior work lev… ▽ More Medical image segmentation is an important task for computer aided diagnosis. Pixelwise manual annotations of large datasets require high expertise and is time consuming. Conventional data augmentations have limited benefit by not fully representing the underlying distribution of the training set, thus affecting model robustness when tested on images captured from different sources. Prior work leverages synthetic images for data augmentation ignoring the interleaved geometric relationship between different anatomical labels. We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape. Latent space variable sampling results in diverse generated images from a base image and improves robustness. Given those augmented images generated by our method, we train the segmentation network to enhance the segmentation performance of retinal optical coherence tomography (OCT) images. The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset having images captured from different acquisition procedures. Ablation studies and visual analysis also demonstrate benefits of integrating geometry and diversity. △ Less

Submitted 25 April, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

arXiv:2003.07761 [pdf, other]

CycleISP: Real Image Restoration via Improved Data Synthesis

Authors: Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

Abstract: The availability of large-scale datasets has helped unleash the true potential of deep convolutional neural networks (CNNs). However, for the single-image denoising problem, capturing a real dataset is an unacceptably expensive and cumbersome procedure. Consequently, image denoising algorithms are mostly developed and evaluated on synthetic data that is usually generated with a widespread assumpti… ▽ More The availability of large-scale datasets has helped unleash the true potential of deep convolutional neural networks (CNNs). However, for the single-image denoising problem, capturing a real dataset is an unacceptably expensive and cumbersome procedure. Consequently, image denoising algorithms are mostly developed and evaluated on synthetic data that is usually generated with a widespread assumption of additive white Gaussian noise (AWGN). While the CNNs achieve impressive results on these synthetic datasets, they do not perform well when applied on real camera images, as reported in recent benchmark datasets. This is mainly because the AWGN is not adequate for modeling the real camera noise which is signal-dependent and heavily transformed by the camera imaging pipeline. In this paper, we present a framework that models camera imaging pipeline in forward and reverse directions. It allows us to produce any number of realistic image pairs for denoising both in RAW and sRGB spaces. By training a new image denoising network on realistic synthetic data, we achieve the state-of-the-art performance on real camera benchmark datasets. The parameters in our model are ~5 times lesser than the previous best method for RAW denoising. Furthermore, we demonstrate that the proposed framework generalizes beyond image denoising problem e.g., for color matching in stereoscopic cinema. The source code and pre-trained models are available at https://github.com/swz30/CycleISP. △ Less

Submitted 17 March, 2020; originally announced March 2020.

Comments: CVPR 2020 (Oral)

arXiv:2003.04253 [pdf, other]

doi 10.1109/TIP.2020.3013162

Motion-Attentive Transition for Zero-Shot Video Object Segmentation

Authors: Tianfei Zhou, Shunzhou Wang, Yi Zhou, Yazhou Yao, Jianwu Li, Ling Shao

Abstract: In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder, which transforms appearance features into motion-attenti… ▽ More In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder, which transforms appearance features into motion-attentive representations at each convolutional stage. In this way, the encoder becomes deeply interleaved, allowing for closely hierarchical interactions between object motion and appearance. This is superior to the typical two-stream architecture, which treats motion and appearance separately in each stream and often suffers from overfitting to appearance information. Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation for multi-level encoder features, which is further fed into a decoder to achieve segmentation results. Extensive experiments on three challenging public benchmarks (i.e. DAVIS-16, FBMS and Youtube-Objects) show that our model achieves compelling performance against the state-of-the-arts. △ Less

Submitted 9 July, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: AAAI 2020. Code: https://github.com/tfzhou/MATNet

arXiv:2002.05000 [pdf, other]

Hi-Net: Hybrid-fusion Network for Multi-modal MR Image Synthesis

Authors: Tao Zhou, Huazhu Fu, Geng Chen, Jianbing Shen, Ling Shao

Abstract: Magnetic resonance imaging (MRI) is a widely used neuroimaging technique that can provide images of different contrasts (i.e., modalities). Fusing this multi-modal data has proven particularly effective for boosting model performance in many tasks. However, due to poor data quality and frequent patient dropout, collecting all modalities for every patient remains a challenge. Medical image synthesi… ▽ More Magnetic resonance imaging (MRI) is a widely used neuroimaging technique that can provide images of different contrasts (i.e., modalities). Fusing this multi-modal data has proven particularly effective for boosting model performance in many tasks. However, due to poor data quality and frequent patient dropout, collecting all modalities for every patient remains a challenge. Medical image synthesis has been proposed as an effective solution to this, where any missing modalities are synthesized from the existing ones. In this paper, we propose a novel Hybrid-fusion Network (Hi-Net) for multi-modal MR image synthesis, which learns a map** from multi-modal source images (i.e., existing modalities) to target images (i.e., missing modalities). In our Hi-Net, a modality-specific network is utilized to learn representations for each individual modality, and a fusion network is employed to learn the common latent representation of multi-modal data. Then, a multi-modal synthesis network is designed to densely combine the latent representation with hierarchical features from each modality, acting as a generator to synthesize the target images. Moreover, a layer-wise multi-modal fusion strategy is presented to effectively exploit the correlations among multiple modalities, in which a Mixed Fusion Block (MFB) is proposed to adaptively weight different fusion strategies (i.e., element-wise summation, product, and maximization). Extensive experiments demonstrate that the proposed model outperforms other state-of-the-art medical image synthesis methods. △ Less

Submitted 11 February, 2020; originally announced February 2020.

Comments: has been accepted by IEEE TMI

arXiv:1912.04670 [pdf, other]

doi 10.1109/JBHI.2020.3045475

DR-GAN: Conditional Generative Adversarial Network for Fine-Grained Lesion Synthesis on Diabetic Retinopathy Images

Authors: Yi Zhou, Boyang Wang, Xiaodong He, Shanshan Cui, Ling Shao

Abstract: Diabetic retinopathy (DR) is a complication of diabetes that severely affects eyes. It can be graded into five levels of severity according to international protocol. However, optimizing a grading model to have strong generalizability requires a large amount of balanced training data, which is difficult to collect particularly for the high severity levels. Typical data augmentation methods, includ… ▽ More Diabetic retinopathy (DR) is a complication of diabetes that severely affects eyes. It can be graded into five levels of severity according to international protocol. However, optimizing a grading model to have strong generalizability requires a large amount of balanced training data, which is difficult to collect particularly for the high severity levels. Typical data augmentation methods, including random flip** and rotation, cannot generate data with high diversity. In this paper, we propose a diabetic retinopathy generative adversarial network (DR-GAN) to synthesize high-resolution fundus images which can be manipulated with arbitrary grading and lesion information. Thus, large-scale generated data can be used for more meaningful augmentation to train a DR grading and lesion segmentation model. The proposed retina generator is conditioned on the structural and lesion masks, as well as adaptive grading vectors sampled from the latent grading space, which can be adopted to control the synthesized grading severity. Moreover, a multi-scale spatial and channel attention module is devised to improve the generation ability to synthesize details. Multi-scale discriminators are designed to operate from large to small receptive fields, and joint adversarial losses are adopted to optimize the whole network in an end-to-end manner. With extensive experiments evaluated on the EyePACS dataset connected to Kaggle, as well as the FGADR dataset, we validate the effectiveness of our method, which can both synthesize highly realistic (1280 x 1280) controllable fundus images and contribute to the DR grading task. △ Less

Submitted 11 November, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

Comments: Extension work of our MICCAI paper

Journal ref: IEEE Journal of Biomedical and Health Informatics 2020

arXiv:1911.04470 [pdf, other]

doi 10.1109/TCSVT.2019.2936710

Semi-Heterogeneous Three-Way Joint Embedding Network for Sketch-Based Image Retrieval

Authors: Jianjun Lei, Yuxin Song, Bo Peng, Zhanyu Ma, Ling Shao, Yi-Zhe Song

Abstract: Sketch-based image retrieval (SBIR) is a challenging task due to the large cross-domain gap between sketches and natural images. How to align abstract sketches and natural images into a common high-level semantic space remains a key problem in SBIR. In this paper, we propose a novel semi-heterogeneous three-way joint embedding network (Semi3-Net), which integrates three branches (a sketch branch,… ▽ More Sketch-based image retrieval (SBIR) is a challenging task due to the large cross-domain gap between sketches and natural images. How to align abstract sketches and natural images into a common high-level semantic space remains a key problem in SBIR. In this paper, we propose a novel semi-heterogeneous three-way joint embedding network (Semi3-Net), which integrates three branches (a sketch branch, a natural image branch, and an edgemap branch) to learn more discriminative cross-domain feature representations for the SBIR task. The key insight lies with how we cultivate the mutual and subtle relationships amongst the sketches, natural images, and edgemaps. A semi-heterogeneous feature map** is designed to extract bottom features from each domain, where the sketch and edgemap branches are shared while the natural image branch is heterogeneous to the other branches. In addition, a joint semantic embedding is introduced to embed the features from different domains into a common high-level semantic space, where all of the three branches are shared. To further capture informative features common to both natural images and the corresponding edgemaps, a co-attention model is introduced to conduct common channel-wise feature recalibration between different domains. A hybrid-loss mechanism is designed to align the three branches, where an alignment loss and a sketch-edgemap contrastive loss are presented to encourage the network to learn invariant cross-domain representations. Experimental results on two widely used category-level datasets (Sketchy and TU-Berlin Extension) demonstrate that the proposed method outperforms state-of-the-art methods. △ Less

Submitted 9 November, 2019; originally announced November 2019.

Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology

arXiv:1911.00969 [pdf, other]

Learning to Scaffold the Development of Robotic Manipulation Skills

Authors: Lin Shao, Toki Migimatsu, Jeannette Bohg

Abstract: Learning contact-rich, robotic manipulation skills is a challenging problem due to the high-dimensionality of the state and action space as well as uncertainty from noisy sensors and inaccurate motor control. To combat these factors and achieve more robust manipulation, humans actively exploit contact constraints in the environment. By adopting a similar strategy, robots can also achieve more robu… ▽ More Learning contact-rich, robotic manipulation skills is a challenging problem due to the high-dimensionality of the state and action space as well as uncertainty from noisy sensors and inaccurate motor control. To combat these factors and achieve more robust manipulation, humans actively exploit contact constraints in the environment. By adopting a similar strategy, robots can also achieve more robust manipulation. In this paper, we enable a robot to autonomously modify its environment and thereby discover how to ease manipulation skill learning. Specifically, we provide the robot with fixtures that it can freely place within the environment. These fixtures provide hard constraints that limit the outcome of robot actions. Thereby, they funnel uncertainty from perception and motor control and scaffold manipulation skill learning. We propose a learning system that consists of two learning loops. In the outer loop, the robot positions the fixture in the workspace. In the inner loop, the robot learns a manipulation skill and after a fixed number of episodes, returns the reward to the outer loop. Thereby, the robot is incentivised to place the fixture such that the inner loop quickly achieves a high reward. We demonstrate our framework both in simulation and in the real world on three tasks: peg insertion, wrench manipulation and shallow-depth insertion. We show that manipulation skill learning is dramatically sped up through this way of scaffolding. △ Less

Submitted 5 October, 2020; v1 submitted 3 November, 2019; originally announced November 2019.

Comments: Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2020

arXiv:1910.10826 [pdf, other]

A Safety Constrained Control Framework for UAVs in GPS Denied Environment

Authors: Wenbin Wan, Hunmin Kim, Naira Hovakimyan, Lui Sha, Petros G. Voulgaris

Abstract: Unmanned aerial vehicles (UAVs) suffer from sensor drifts in GPS denied environments, which can lead to potentially dangerous situations. To avoid intolerable sensor drifts in the presence of GPS spoofing attacks, we propose a safety constrained control framework that adapts the UAV at a path re-planning level to support resilient state estimation against GPS spoofing attacks. The attack detector… ▽ More Unmanned aerial vehicles (UAVs) suffer from sensor drifts in GPS denied environments, which can lead to potentially dangerous situations. To avoid intolerable sensor drifts in the presence of GPS spoofing attacks, we propose a safety constrained control framework that adapts the UAV at a path re-planning level to support resilient state estimation against GPS spoofing attacks. The attack detector is used to detect GPS spoofing attacks based on the resilient state estimation and provides a switching criterion between the robust control mode and emergency control mode. To quantify the safety margin, we introduce the escape time which is defined as a safe time under which the state estimation error remains within a tolerable error with designated confidence. An attacker location tracker (ALT) is developed to track the location of the attacker and estimate the output power of the spoofing device by the unscented Kalman filter (UKF) with sliding window outputs. Using the estimates from ALT, an escape controller (ESC) is designed based on the constrained model predictive controller (MPC) such that the UAV escapes from the effective range of the spoofing device within the escape time. △ Less

Submitted 12 April, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

arXiv:1909.03749 [pdf, other]

Learning Visual Dynamics Models of Rigid Objects using Relational Inductive Biases

Authors: Fabio Ferreira, Lin Shao, Tamim Asfour, Jeannette Bohg

Abstract: Endowing robots with human-like physical reasoning abilities remains challenging. We argue that existing methods often disregard spatio-temporal relations and by using Graph Neural Networks (GNNs) that incorporate a relational inductive bias, we can shift the learning process towards exploiting relations. In this work, we learn action-conditional forward dynamics models of a simulated manipulation… ▽ More Endowing robots with human-like physical reasoning abilities remains challenging. We argue that existing methods often disregard spatio-temporal relations and by using Graph Neural Networks (GNNs) that incorporate a relational inductive bias, we can shift the learning process towards exploiting relations. In this work, we learn action-conditional forward dynamics models of a simulated manipulation task from visual observations involving cluttered and irregularly shaped objects. We investigate two GNN approaches and empirically assess their capability to generalize to scenarios with novel and an increasing number of objects. The first, Graph Networks (GN) based approach, considers explicitly defined edge attributes and not only does it consistently underperform an auto-encoder baseline that we modified to predict future states, our results indicate how different edge attributes can significantly influence the predictions. Consequently, we develop the Auto-Predictor that does not rely on explicitly defined edge attributes. It outperforms the baseline and the GN-based models. Overall, our results show the sensitivity of GNN-based approaches to the task representation, the efficacy of relational inductive biases and advocate choosing lightweight approaches that implicitly reason about relations over ones that leave these decisions to human designers. △ Less

Submitted 23 October, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

Comments: short paper (4 pages, two figures), accepted to NeurIPS 2019 Graph Representation Learning workshop

arXiv:1907.05598 [pdf, other]

Coupled-Projection Residual Network for MRI Super-Resolution

Authors: Chun-Mei Feng, Kai Wang, Shijian Lu, Yong Xu, Heng Kong, Ling Shao

Abstract: Magnetic Resonance Imaging(MRI) has been widely used in clinical application and pathology research by hel** doctors make more accurate diagnoses. On the other hand, accurate diagnosis by MRI remains a great challenge as images obtained via present MRI techniques usually have low resolutions. Improving MRI image quality and resolution thus becomes a critically important task. This paper presents… ▽ More Magnetic Resonance Imaging(MRI) has been widely used in clinical application and pathology research by hel** doctors make more accurate diagnoses. On the other hand, accurate diagnosis by MRI remains a great challenge as images obtained via present MRI techniques usually have low resolutions. Improving MRI image quality and resolution thus becomes a critically important task. This paper presents an innovative Coupled-Projection Residual Network (CPRN) for MRI super-resolution. The CPRN consists of two complementary sub-networks: a shallow network and a deep network that keep the content consistency while learning high frequency differences between low-resolution and high-resolution images. The shallow sub-network employs coupled-projection for better retaining the MRI image details, where a novel feedback mechanism is introduced to guide the reconstruction of high-resolution images. The deep sub-network learns from the residuals of the high-frequency image information, where multiple residual blocks are cascaded to magnify the MRI images at the last network layer. Finally, the features from the shallow and deep sub-networks are fused for the reconstruction of high-resolution MRI images. For effective fusion of features from the deep and shallow sub-networks, a step-wise connection (CPRN S) is designed as inspired by the human cognitive processes (from simple to complex). Experiments over three public MRI datasets show that our proposed CPRN achieves superior MRI super-resolution performance as compared with the state-of-the-art. Our source code will be publicly available at http://www.yongxu.org/lunwen.html. △ Less

Submitted 12 July, 2019; originally announced July 2019.

Comments: Our source code will be publicly available at http://www.yongxu.org/lunwen.html

arXiv:1906.06878 [pdf, other]

doi 10.1109/TIP.2020.3026622

Noisy-As-Clean: Learning Self-supervised Denoising from the Corrupted Image

Authors: Jun Xu, Yuan Huang, Ming-Ming Cheng, Li Liu, Fan Zhu, Zhou Xu, Ling Shao

Abstract: Supervised deep networks have achieved promisingperformance on image denoising, by learning image priors andnoise statistics on plenty pairs of noisy and clean images. Unsupervised denoising networks are trained with only noisy images. However, for an unseen corrupted image, both supervised andunsupervised networks ignore either its particular image prior, the noise statistics, or both. That is, t… ▽ More Supervised deep networks have achieved promisingperformance on image denoising, by learning image priors andnoise statistics on plenty pairs of noisy and clean images. Unsupervised denoising networks are trained with only noisy images. However, for an unseen corrupted image, both supervised andunsupervised networks ignore either its particular image prior, the noise statistics, or both. That is, the networks learned from external images inherently suffer from a domain gap problem: the image priors and noise statistics are very different between the training and test images. This problem becomes more clear when dealing with the signal dependent realistic noise. To circumvent this problem, in this work, we propose a novel "Noisy-As-Clean" (NAC) strategy of training self-supervised denoising networks. Specifically, the corrupted test image is directly taken as the "clean" target, while the inputs are synthetic images consisted of this corrupted image and a second and similar corruption. A simple but useful observation on our NAC is: as long as the noise is weak, it is feasible to learn a self-supervised network only with the corrupted image, approximating the optimal parameters of a supervised network learned with pairs of noisy and clean images. Experiments on synthetic and realistic noise removal demonstrate that, the DnCNN and ResNet networks trained with our self-supervised NAC strategy achieve comparable or better performance than the original ones and previous supervised/unsupervised/self-supervised networks. The code is publicly available at https://github.com/csjunxu/Noisy-As-Clean. △ Less

Submitted 9 May, 2020; v1 submitted 17 June, 2019; originally announced June 2019.

Comments: 12 pages, 9 figures, 6 tables, the first two authors contribute equally

arXiv:1906.05348 [pdf, other]

Towards Resilient UAV: Escape Time in GPS Denied Environment with Sensor Drift

Authors: Hyung-** Yoon, Wenbin Wan, Hunmin Kim, Naira Hovakimyan, Lui Sha, Petros G. Voulgaris

Abstract: This paper considers a resilient state estimation framework for unmanned aerial vehicles (UAVs) that integrates a Kalman filter-like state estimator and an attack detector. When an attack is detected, the state estimator uses only IMU signals as the GPS signals do not contain legitimate information. This limited sensor availability induces a sensor drift problem questioning the reliability of the… ▽ More This paper considers a resilient state estimation framework for unmanned aerial vehicles (UAVs) that integrates a Kalman filter-like state estimator and an attack detector. When an attack is detected, the state estimator uses only IMU signals as the GPS signals do not contain legitimate information. This limited sensor availability induces a sensor drift problem questioning the reliability of the sensor estimates. We propose a new resilience measure, escape time, as the safe time within which the estimation errors remain in a tolerable region with high probability. This paper analyzes the stability of the proposed resilient estimation framework and quantifies a lower bound for the escape time. Moreover, simulations of the UAV model demonstrate the performance of the proposed framework and provide analytical results. △ Less

Submitted 11 June, 2019; originally announced June 2019.

arXiv:1811.08064 [pdf, other]

doi 10.1109/ICCAD.2017.8203885

Model and Integrate Medical Resource Availability into Verifiably Correct Executable Medical Guidelines - Technical Report

Authors: Chunhui Guo, Zhicheng Fu, Zhenyu Zhang, Shang** Ren, Lui Sha

Abstract: Improving effectiveness and safety of patient care is an ultimate objective for medical cyber-physical systems. A recent study shows that the patients' death rate can be reduced by computerizing medical guidelines. Most existing medical guideline models are validated and/or verified based on the assumption that all necessary medical resources needed for a patient care are always available. However… ▽ More Improving effectiveness and safety of patient care is an ultimate objective for medical cyber-physical systems. A recent study shows that the patients' death rate can be reduced by computerizing medical guidelines. Most existing medical guideline models are validated and/or verified based on the assumption that all necessary medical resources needed for a patient care are always available. However, the reality is that some medical resources, such as special medical equipment or medical specialists, can be temporarily unavailable for an individual patient. In such cases, safety properties validated and/or verified in existing medical guideline models without considering medical resource availability may not hold any more. The paper argues that considering medical resource availability is essential in building verifiably correct executable medical guidelines. We present an approach to explicitly and separately model medical resource availability and automatically integrate resource availability models into an existing statechart-based computerized medical guideline model. This approach requires minimal change in existing medical guideline models to take into consideration of medical resource availability in validating and verifying medical guideline models. A simplified stroke scenario is used as a case study to investigate the effectiveness and validity of our approach. △ Less

Submitted 19 November, 2018; originally announced November 2018.

Comments: full version, 8 pages. arXiv admin note: substantial text overlap with arXiv:1811.08061

Journal ref: IEEE/ACM 36th International Conference on Computer-Aided Design (ICCAD), 2017

arXiv:1811.08061 [pdf, other]

doi 10.1109/ICCPS.2018.00032

Model and Integrate Medical Resource Available Times and Relationships in Verifiably Correct Executable Medical Best Practice Guideline Models (Extended Version)

Authors: Chunhui Guo, Zhicheng Fu, Zhenyu Zhang, Shang** Ren, Lui Sha

Abstract: Improving patient care safety is an ultimate objective for medical cyber-physical systems. A recent study shows that the patients' death rate is significantly reduced by computerizing medical best practice guidelines. Recent data also show that some morbidity and mortality in emergency care are directly caused by delayed or interrupted treatment due to lack of medical resources. However, medical g… ▽ More Improving patient care safety is an ultimate objective for medical cyber-physical systems. A recent study shows that the patients' death rate is significantly reduced by computerizing medical best practice guidelines. Recent data also show that some morbidity and mortality in emergency care are directly caused by delayed or interrupted treatment due to lack of medical resources. However, medical guidelines usually do not provide guidance on medical resource demands and how to manage potential unexpected delays in resource availability. If medical resources are temporarily unavailable, safety properties in existing executable medical guideline models may fail which may cause increased risk to patients under care. The paper presents a separately model and jointly verify (SMJV) architecture to separately model medical resource available times and relationships and jointly verify safety properties of existing medical best practice guideline models with resource models being integrated in. The SMJV architecture allows medical staff to effectively manage medical resource demands and unexpected resource availability delays during emergency care. The separated modeling approach also allows different domain professionals to make independent model modifications, facilitates the management of frequent resource availability changes, and enables resource statechart reuse in multiple medical guideline models. A simplified stroke scenario is used as a case study to investigate the effectiveness and validity of the SMJV architecture. The case study indicates that the SMJV architecture is able to identify unsafe properties caused by unexpected resource delays. △ Less

Submitted 19 November, 2018; originally announced November 2018.

Comments: full version, 12 pages

Journal ref: ACM/IEEE 9th International Conference on Cyber-Physical Systems (ICCPS), 2018

arXiv:1811.00694 [pdf, other]

doi 10.1109/JIOT.2018.2879475

Design Verifiably Correct Model Patterns to Facilitate Modeling Medical Best Practice Guidelines with Statecharts (Technical Report)

Authors: Chunhui Guo, Zhicheng Fu, Zhenyu Zhang, Shang** Ren, Lui Sha

Abstract: Improving patient care safety is an ultimate objective for medical cyber-physical systems. A recent study shows that the patients' death rate can be significantly reduced by computerizing medical best practice guidelines. To facilitate the development of computerized medical best practice guidelines, statecharts are often used as a modeling tool because of their high resemblances to disease and tr… ▽ More Improving patient care safety is an ultimate objective for medical cyber-physical systems. A recent study shows that the patients' death rate can be significantly reduced by computerizing medical best practice guidelines. To facilitate the development of computerized medical best practice guidelines, statecharts are often used as a modeling tool because of their high resemblances to disease and treatment models and their capabilities to provide rapid prototy** and simulation for clinical validations. However, some implementations of statecharts, such as Yakindu statecharts, are priority-based and have synchronous execution semantics which makes it difficult to model certain functionalities that are essential in modeling medical guidelines, such as two-way communications and configurable execution orders. Rather than introducing new statechart elements or changing the statechart implementation's underline semantics, we use existing basic statechart elements to design model patterns for the commonly occurring issues. In particular, we show the design of model patterns for two-way communications and configurable execution orders and formally prove the correctness of these model patterns. We further use a simplified airway laser surgery scenario as a case study to demonstrate how the developed model patterns address the two-way communication and configurable execution order issues and their impact on validation and verification of medical safety properties. △ Less

Submitted 1 November, 2018; originally announced November 2018.

Comments: full version, 14 pages

Journal ref: IEEE Internet of Things Journal, 2018

arXiv:1810.12126 [pdf, other]

ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition

Authors: Federico Angelini, Zeyu Fu, Yang Long, Ling Shao, Syed Mohsen Naqvi

Abstract: We present ActionXPose, a novel 2D pose-based algorithm for posture-level Human Action Recognition (HAR). The proposed approach exploits 2D human poses provided by OpenPose detector from RGB videos. ActionXPose aims to process poses data to be provided to a Long Short-Term Memory Neural Network and to a 1D Convolutional Neural Network, which solve the classification problem. ActionXPose is one of… ▽ More We present ActionXPose, a novel 2D pose-based algorithm for posture-level Human Action Recognition (HAR). The proposed approach exploits 2D human poses provided by OpenPose detector from RGB videos. ActionXPose aims to process poses data to be provided to a Long Short-Term Memory Neural Network and to a 1D Convolutional Neural Network, which solve the classification problem. ActionXPose is one of the first algorithms that exploits 2D human poses for HAR. The algorithm has real-time performance and it is robust to camera movings, subject proximity changes, viewpoint changes, subject appearance changes and provide high generalization degree. In fact, extensive simulations show that ActionXPose can be successfully trained using different datasets at once. State-of-the-art performance on popular datasets for posture-related HAR problems (i3DPost, KTH) are provided and results are compared with those obtained by other methods, including the selected ActionXPose baseline. Moreover, we also proposed two novel datasets called MPOSE and ISLD recorded in our Intelligent Sensing Lab, to show ActionXPose generalization performance. △ Less

Submitted 29 October, 2018; originally announced October 2018.

Showing 1–50 of 52 results for author: Shao, L