-
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
Authors:
Tiantian Geng,
Teng Wang,
Yanfu Zhang,
Jinming Duan,
Weili Guan,
Feng Zheng
Abstract:
Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL). Existing methods over-specialize on each task, overlooking the fact that these instances often occur in the same video to form the complete video content. In this work, we present UniAV, a Unified Audio-Visual perception network, to achieve joint learning of TAL, SED and AVEL tasks for the first time. UniAV can leverage diverse data available in task-specific datasets, allowing the model to learn and share mutually beneficial knowledge across tasks and modalities. To tackle the challenges posed by substantial variations in datasets (size/domain/duration) and distinct task characteristics, we propose to uniformly encode visual and audio modalities of all videos to derive generic representations, while also designing task-specific experts to capture unique knowledge for each task. Besides, we develop a unified language-aware classifier by utilizing a pre-trained text encoder, enabling the model to flexibly detect various types of instances and previously unseen ones by simply changing prompts during inference. UniAV outperforms its single-task counterparts by a large margin with fewer parameters, achieving on-par or superior performances compared to state-of-the-art task-specific methods across ActivityNet 1.3, DESED and UnAV-100 benchmarks.
Submitted 3 April, 2024;
originally announced April 2024.
-
Decoder-Only Image Registration
Authors:
Xi Jia,
Wenqi Lu,
Xinxing Cheng,
Jinming Duan
Abstract:
In unsupervised medical image registration, the predominant approaches involve the utilization of an encoder-decoder network architecture, allowing for precise prediction of dense, full-resolution displacement fields from given paired images. Despite its widespread use in the literature, we question the necessity of making both the encoder and decoder learnable in such an architecture. For this, we propose a novel network architecture, termed LessNet in this paper, which contains only a learnable decoder, while entirely omitting the utilization of a learnable encoder. LessNet substitutes the learnable encoder with simple, handcrafted features, eliminating the need to learn (optimize) network parameters in the encoder altogether. Consequently, this leads to a compact, efficient, and decoder-only architecture for 3D medical image registration. Evaluated on two publicly available brain MRI datasets, we demonstrate that our decoder-only LessNet can effectively and efficiently learn both dense displacement and diffeomorphic deformation fields in 3D. Furthermore, our decoder-only LessNet can achieve comparable registration performance to state-of-the-art methods such as VoxelMorph and TransMorph, while requiring significantly fewer computational resources. Our code and pre-trained models are available at https://github.com/xi-jia/LessNet.
Submitted 5 February, 2024;
originally announced February 2024.
-
Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback
Authors:
Jingliang Duan,
Jie Li,
Xuyang Chen,
Kai Zhao,
Shengbo Eben Li,
Lin Zhao
Abstract:
In recent times, significant advancements have been made in delving into the optimization landscape of policy gradient methods for achieving optimal control in linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent since the underlying state of the system may not be fully observed in many practical settings. This paper analyzes the optimization landscape inherent to policy gradient methods when applied to static output feedback (SOF) control in discrete-time LTI systems subject to quadratic cost. We begin by establishing crucial properties of the SOF cost, encompassing coercivity, L-smoothness, and M-Lipschitz continuous Hessian. Despite the absence of convexity, we leverage these properties to derive novel findings regarding convergence (and nearly dimension-free rate) to stationary points for three policy gradient methods, including the vanilla policy gradient method, the natural policy gradient method, and the Gauss-Newton method. Moreover, we provide proof that the vanilla policy gradient method exhibits linear convergence towards local minima when initialized near such minima. The paper concludes by presenting numerical examples that validate our theoretical findings. These results not only characterize the performance of gradient descent for optimizing the SOF problem but also provide insights into the effectiveness of general policy gradient methods within the realm of reinforcement learning.
Submitted 29 October, 2023;
originally announced October 2023.
-
DSAC-T: Distributional Soft Actor-Critic with Three Refinements
Authors:
Jingliang Duan,
Wenxuan Wang,
Liming Xiao,
Jiaxin Gao,
Shengbo Eben Li
Abstract:
Reinforcement learning (RL) has proven to be highly effective in tackling complex decision-making and control tasks. However, prevalent model-free RL methods often face severe performance degradation due to the well-known overestimation issue. In response to this problem, we recently introduced an off-policy RL algorithm, called distributional soft actor-critic (DSAC or DSAC-v1), which can effectively improve the value estimation accuracy by learning a continuous Gaussian value distribution. Nonetheless, standard DSAC has its own shortcomings, including occasionally unstable learning processes and the necessity for task-specific reward scaling, which may hinder its overall performance and adaptability in some special tasks. This paper further introduces three important refinements to standard DSAC in order to address these shortcomings. These refinements consist of expected value substituting, twin value distribution learning, and variance-based critic gradient adjusting. The modified RL algorithm is named DSAC with three refinements (DSAC-T or DSAC-v2), and its performance is systematically evaluated on a diverse set of benchmark tasks. Without any task-specific hyperparameter tuning, DSAC-T surpasses or matches many mainstream model-free RL algorithms, including SAC, TD3, DDPG, TRPO, and PPO, in all tested environments. Additionally, DSAC-T, unlike its standard version, ensures a highly stable learning process and delivers similar performance across varying reward scales.
Submitted 28 December, 2023; v1 submitted 9 October, 2023;
originally announced October 2023.
-
VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
Authors:
Jianing Qiu,
Jian Wu,
Hao Wei,
Peilun Shi,
Minqing Zhang,
Yunyun Sun,
Lin Li,
Hanruo Liu,
Hongyi Liu,
Simeng Hou,
Yuyang Zhao,
Xuehui Shi,
Junfang Xian,
Xiaoxia Qu,
Sirui Zhu,
Lijie Pan,
Xiaoniao Chen,
Xiaojia Zhang,
Shuai Jiang,
Kebing Wang,
Chenlong Yang,
Mingqiang Chen,
Sujie Fan,
Jianhua Hu,
Aiguo Lv
, et al. (17 additional authors not shown)
Abstract:
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassification of disease phenotype, and systemic biomarker and disease prediction, with each application enhanced with expert-level intelligence and accuracy. The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale ophthalmic disease diagnosis benchmark database, as well as a new large-scale segmentation and detection benchmark database, VisionFM outperformed strong baseline deep neural networks. The ophthalmic image representations learned by VisionFM exhibited noteworthy explainability, and demonstrated strong generalizability to new ophthalmic modalities, disease spectrum, and imaging devices. As a foundation model, VisionFM has a large capacity to learn from diverse ophthalmic imaging data and disparate datasets. To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data. Experimental results revealed that synthetic data that passed visual Turing tests, can also enhance the representation learning capability of VisionFM, leading to substantial performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI applications developed, validated, and demonstrated in this work, substantial further applications can be achieved in an efficient and cost-effective manner using VisionFM as the foundation.
Submitted 7 October, 2023;
originally announced October 2023.
-
Recovering high-quality FODs from a reduced number of diffusion-weighted images using a model-driven deep learning architecture
Authors:
J Bartlett,
C E Davey,
L A Johnston,
J Duan
Abstract:
Fibre orientation distribution (FOD) reconstruction using deep learning has the potential to produce accurate FODs from a reduced number of diffusion-weighted images (DWIs), decreasing total imaging time. Diffusion acquisition invariant representations of the DWI signals are typically used as input to these methods to ensure that they can be applied flexibly to data with different b-vectors and b-values; however, this means the network cannot condition its output directly on the DWI signal. In this work, we propose a spherical deconvolution network, a model-driven deep learning FOD reconstruction architecture, that ensures intermediate and output FODs produced by the network are consistent with the input DWI signals. Furthermore, we implement a fixel classification penalty within our loss function, encouraging the network to produce FODs that can subsequently be segmented into the correct number of fixels and improve downstream fixel-based analysis. Our results show that the model-based deep learning architecture achieves competitive performance compared to a state-of-the-art FOD super-resolution network, FOD-Net. Moreover, we show that the fixel classification penalty can be tuned to offer improved performance with respect to metrics that rely on accurate segmentation of FODs. Our code is publicly available at https://github.com/Jbartlett6/SDNet.
Submitted 27 July, 2023;
originally announced July 2023.
-
Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling
Authors:
Ziyue Li,
Yuchen Fang,
You Li,
Kan Ren,
Yansen Wang,
Xufang Luo,
Juanyong Duan,
Congrui Huang,
Dongsheng Li,
Lili Qiu
Abstract:
Timely detection of seizures in newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, it requires great human effort for real-time monitoring, which calls for automated solutions to neonatal seizure detection. Moreover, the current automated methods focusing on adult epilepsy monitoring often fail due to (i) dynamic seizure onset location in human brains; (ii) different montages on neonates; and (iii) huge distribution shift among different subjects. In this paper, we propose a deep learning framework, namely STATENet, to address these specific challenges with careful designs at the temporal, spatial and model levels. The experiments over the real-world large-scale neonatal EEG dataset illustrate that our framework achieves significantly better seizure detection performance.
Submitted 2 July, 2023;
originally announced July 2023.
-
Fourier-Net+: Leveraging Band-Limited Representation for Efficient 3D Medical Image Registration
Authors:
Xi Jia,
Alexander Thorley,
Alberto Gomez,
Wenqi Lu,
Dipak Kotecha,
Jinming Duan
Abstract:
U-Net style networks are commonly utilized in unsupervised image registration to predict dense displacement fields, which for high-resolution volumetric image data is a resource-intensive and time-consuming task. To tackle this challenge, we first propose Fourier-Net, which replaces the costly U-Net style expansive path with a parameter-free model-driven decoder. Instead of directly predicting a full-resolution displacement field, our Fourier-Net learns a low-dimensional representation of the displacement field in the band-limited Fourier domain which our model-driven decoder converts to a full-resolution displacement field in the spatial domain. Expanding upon Fourier-Net, we then introduce Fourier-Net+, which additionally takes the band-limited spatial representation of the images as input and further reduces the number of convolutional layers in the U-Net style network's contracting path. Finally, to enhance the registration performance, we propose a cascaded version of Fourier-Net+. We evaluate our proposed methods on three datasets, on which our proposed Fourier-Net and its variants achieve comparable results with current state-of-the-art methods, while exhibiting faster inference speeds, lower memory footprint, and fewer multiply-add operations. With such small computational cost, our Fourier-Net+ enables the efficient training of large-scale 3D registration on low-VRAM GPUs. Our code is publicly available at https://github.com/xi-jia/Fourier-Net.
Submitted 6 July, 2023;
originally announced July 2023.
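The abstract above describes a parameter-free decoder that turns a band-limited Fourier representation into a full-resolution displacement field. One common way to realise such a step is to zero-pad the low-frequency coefficients and apply an inverse DFT. The sketch below illustrates that idea only; the zero-padding choice, the function name `decode_band_limited`, and the centred-spectrum layout are assumptions for illustration and are not taken from the listing or from the authors' repository.

```python
import numpy as np


def decode_band_limited(low_freq: np.ndarray, full_shape: tuple) -> np.ndarray:
    """Zero-pad a centred low-frequency spectrum to full_shape and invert it.

    Hypothetical helper: assumes `low_freq` is the central block of an
    fftshift-ed spectrum and that all dimensions are even.
    """
    padded = np.zeros(full_shape, dtype=complex)
    # Place the low-frequency block at the centre of the full-size spectrum.
    centre = tuple(
        slice((full - low) // 2, (full - low) // 2 + low)
        for full, low in zip(full_shape, low_freq.shape)
    )
    padded[centre] = low_freq
    # Undo the shift, then take the inverse DFT; keep the real part because
    # a displacement component is real-valued.
    return np.fft.ifftn(np.fft.ifftshift(padded)).real


# Usage: one smooth 16x16 displacement component, recovered from 8x8 coefficients.
n = 16
grid = np.arange(n)
field = np.cos(2 * np.pi * grid / n)[:, None] + np.sin(2 * np.pi * grid / n)[None, :]
spectrum = np.fft.fftshift(np.fft.fftn(field))
low = spectrum[4:12, 4:12]
recovered = decode_band_limited(low, (n, n))
```

Because the example field contains only low frequencies, the 8x8 block carries all of its energy, which is the property a band-limited representation relies on.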
-
An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization
Authors:
Fei Kong,
Jinhao Duan,
RuiPeng Ma,
Hengtao Shen,
Xiaofeng Zhu,
Xiaoshuang Shi,
Kaidi Xu
Abstract:
Recently, diffusion models have achieved remarkable success in generation tasks, including image and audio generation. However, like other generative models, diffusion models are prone to privacy issues. In this paper, we propose an efficient query-based membership inference attack (MIA), namely Proximal Initialization Attack (PIA), which utilizes the ground-truth trajectory obtained by $ε$ initialized in $t=0$ and the predicted point to infer memberships. Experimental results indicate that the proposed method can achieve competitive performance with only two queries on both discrete-time and continuous-time diffusion models. Moreover, previous works on the privacy of diffusion models have focused on vision tasks without considering audio tasks. Therefore, we also explore the robustness of diffusion models to MIA in the text-to-speech (TTS) task, which is an audio generation task. To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the TTS task. Experimental results indicate that models with mel-spectrogram (image-like) output are vulnerable to MIA, while models with audio output are relatively robust to MIA. Code is available at https://github.com/kong13661/PIA.
Submitted 9 October, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Feasible Policy Iteration
Authors:
Yujie Yang,
Zhilong Zheng,
Shengbo Eben Li,
Jingliang Duan,
Jingjing Liu,
Xianyuan Zhan,
Ya-Qin Zhang
Abstract:
Safe reinforcement learning (RL) aims to find the optimal policy and its feasible region in a constrained optimal control problem (OCP). Ensuring feasibility and optimality simultaneously has been a major challenge. Existing methods either attempt to solve OCPs directly with constrained optimization algorithms, leading to unstable training processes and unsatisfactory feasibility, or restrict policies in overly small feasible regions, resulting in excessive conservativeness with sacrificed optimality. To address this challenge, we propose an indirect safe RL framework called feasible policy iteration, which guarantees that the feasible region monotonically expands and converges to the maximum one, and the state-value function monotonically improves and converges to the optimal one. We achieve this by designing a policy update principle called region-wise policy improvement, which maximizes the state-value function under the constraint of the constraint decay function (CDF) inside the feasible region and minimizes the CDF outside the feasible region simultaneously. This update scheme ensures that the state-value function monotonically increases state-wise in the feasible region and the CDF monotonically decreases state-wise in the entire state space. We prove that the CDF converges to the solution of the risky Bellman equation while the state-value function converges to the solution of the feasible Bellman equation. The former represents the maximum feasible region and the latter manifests the optimal state-value function. Experiments show that our algorithm learns strictly safe and near-optimal policies with accurate feasible regions on classic control tasks. It also achieves fewer constraint violations with performance better than (or comparable to) baselines on Safety Gym.
Submitted 28 January, 2024; v1 submitted 18 April, 2023;
originally announced April 2023.
-
Integrated Behavior Planning and Motion Control for Autonomous Vehicles with Traffic Rules Compliance
Authors:
Haichao Liu,
Kai Chen,
Yulin Li,
Zhenmin Huang,
Jianghua Duan,
Jun Ma
Abstract:
In this article, we propose an optimization-based integrated behavior planning and motion control scheme, which is an interpretable and adaptable urban autonomous driving solution that complies with complex traffic rules while ensuring driving safety. Inherently, to ensure compliance with traffic rules, an innovative design of potential functions (PFs) is presented to characterize various traffic rules related to traffic lights, traversable and non-traversable traffic line markings, etc. These PFs are further incorporated as part of the model predictive control (MPC) formulation. In this sense, high-level behavior planning is attained implicitly along with motion control as an integrated architecture, facilitating flexible maneuvers with safety guarantees. Due to the well-designed objective function of the MPC scheme, our integrated behavior planning and motion control scheme is competent for various urban driving scenarios and able to generate versatile behaviors, such as overtaking with adaptive cruise control, turning in the intersection, and merging in and out of the roundabout. As demonstrated from a series of simulations with challenging scenarios in CARLA, it is noteworthy that the proposed framework admits real-time performance and high generalizability.
Submitted 30 November, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Authors:
Tiantian Geng,
Teng Wang,
Jinming Duan,
Runmin Cong,
Feng Zheng
Abstract:
Existing audio-visual event localization (AVE) handles manually trimmed videos with only a single instance in each of them. However, this setting is unrealistic as natural videos often contain numerous audio-visual events with different categories. To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video. The problem is challenging as it requires fine-grained audio-visual scene and context understanding. To tackle this problem, we introduce the first Untrimmed Audio-Visual (UnAV-100) dataset, which contains 10K untrimmed videos with over 30K audio-visual events. Each video has 2.8 audio-visual events on average, and the events are usually related to each other and might co-occur as in real-life scenes. Next, we formulate the task using a new learning-based framework, which is capable of fully integrating audio and visual modalities to localize audio-visual events with various lengths and capture dependencies between them in a single pass. Extensive experiments demonstrate the effectiveness of our method as well as the significance of multi-scale cross-modal perception and dependency modeling for this task.
Submitted 24 March, 2023; v1 submitted 22 March, 2023;
originally announced March 2023.
-
Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate
Authors:
Dongjie Yu,
Wenjun Zou,
Yujie Yang,
Haitong Ma,
Shengbo Eben Li,
Jingliang Duan,
Jianyu Chen
Abstract:
Safe reinforcement learning (RL) that solves constraint-satisfactory policies provides a promising way to the broader safety-critical applications of RL in real-world problems such as robotics. Among all safe RL approaches, model-based methods reduce training time violations further due to their high sample efficiency. However, lacking safety robustness against the model uncertainties remains an issue in safe model-based RL, especially in training time safety. In this paper, we propose a distributional reachability certificate (DRC) and its Bellman equation to address model uncertainties and characterize robust persistently safe states. Furthermore, we build a safe RL framework to resolve constraints required by the DRC and its corresponding shield policy. We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy. Comprehensive experiments on classical benchmarks such as constrained tracking and navigation indicate that the proposed algorithm achieves comparable returns with much fewer constraint violations during training.
Submitted 14 October, 2022;
originally announced October 2022.
-
On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator
Authors:
Jingliang Duan,
Wenhan Cao,
Yang Zheng,
Lin Zhao
Abstract:
The convergence of policy gradient algorithms in reinforcement learning hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or output feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which is prevalent in practice but has a rather complicated optimization landscape. We first show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. At the core of our results is the uniqueness of the stationary point of dLQR when it is observable, which is in a concise form of an observer-based controller with the optimal similarity transformation. These results shed light on designing efficient algorithms for general decision-making problems with partially observed information.
Submitted 29 October, 2023; v1 submitted 12 September, 2022;
originally announced September 2022.
-
U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration?
Authors:
Xi Jia,
Joseph Bartlett,
Tianyang Zhang,
Wenqi Lu,
Zhaowen Qiu,
Jinming Duan
Abstract:
Due to their extreme long-range modeling capability, vision transformer-based networks have become increasingly popular in deformable image registration. We believe, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without needing long-range dependencies. The purpose of this study is therefore to investigate whether U-Net-based methods are outdated compared to modern transformer-based approaches when applied to medical image registration. For this, we propose a large kernel U-Net (LKU-Net) by embedding a parallel convolutional block to a vanilla U-Net in order to enhance the effective receptive field. On the public 3D IXI brain dataset for atlas-based registration, we show that the performance of the vanilla U-Net is already comparable with that of state-of-the-art transformer-based networks (such as TransMorph), and that the proposed LKU-Net outperforms TransMorph by using only 1.12% of its parameters and 10.8% of its multiply-add operations. We further evaluate LKU-Net on a MICCAI Learn2Reg 2021 challenge dataset for inter-subject registration; our LKU-Net also outperforms TransMorph on this dataset and ranks first on the public leaderboard as of the submission of this work. With only modest modifications to the vanilla U-Net, we show that U-Net can outperform transformer-based architectures on inter-subject and atlas-based 3D medical image registration. Code is available at https://github.com/xi-jia/LKU-Net.
Submitted 13 August, 2022; v1 submitted 7 August, 2022;
originally announced August 2022.
-
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs
Authors:
Dongsheng Ding,
Kaiqing Zhang,
Jiali Duan,
Tamer Başar,
Mihailo R. Jovanović
Abstract:
We study sequential decision making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method that updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization we prove that our method achieves global convergence with sublinear rates regarding both the optimality gap and the constraint violation. Such convergence is independent of the size of the state-action space, i.e., it is dimension-free. Furthermore, for log-linear and general smooth policy parametrizations, we establish sublinear convergence rates up to a function approximation error caused by restricted policy parametrization. We also provide convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms. Finally, we use computational experiments to showcase the merits and the effectiveness of our approach.
Submitted 17 October, 2023; v1 submitted 6 June, 2022;
originally announced June 2022.
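The primal-dual update described in this abstract can be sketched in generic form as follows. The notation is an assumption made here for illustration and is not taken from the listing: $V_r$ and $V_g$ denote the expected total reward and utility under initial distribution $\rho$, the constraint is written as $V_g \ge b$, $F_\rho^{\dagger}$ is the pseudo-inverse of the Fisher information matrix, $\eta_1, \eta_2$ are step sizes, and $\mathcal{P}_{\Lambda}$ projects onto a bounded interval $\Lambda \subset [0, \infty)$.

```latex
% Sketch only: Lagrangian and one primal-dual iteration (assumed notation).
\begin{align}
  L(\theta, \lambda) &= V_r^{\theta}(\rho) + \lambda \left( V_g^{\theta}(\rho) - b \right), \\
  % primal step: natural policy gradient ascent on L in theta
  \theta^{(t+1)} &= \theta^{(t)} + \eta_1 \, F_\rho^{\dagger}\!\left(\theta^{(t)}\right)
      \nabla_\theta L\!\left(\theta^{(t)}, \lambda^{(t)}\right), \\
  % dual step: projected sub-gradient descent on L in lambda
  \lambda^{(t+1)} &= \mathcal{P}_{\Lambda}\!\left( \lambda^{(t)}
      - \eta_2 \left( V_g^{\theta^{(t)}}(\rho) - b \right) \right).
\end{align}
```

The dual step lowers $\lambda$ when the constraint is satisfied with slack and raises it when the constraint is violated, which is what couples the two updates.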
-
Structure Unbiased Adversarial Model for Medical Image Segmentation
Authors:
Tianyang Zhang,
Shaoming Zheng,
Jun Cheng,
Xi Jia,
Joseph Bartlett,
Xinxing Cheng,
Huazhu Fu,
Zhaowen Qiu,
Jiang Liu,
Jinming Duan
Abstract:
Generative models have been widely proposed in image recognition to generate more images where the distribution is similar to that of the real ones. They often introduce a discriminator network to differentiate the real data from the generated ones. Such models utilise a discriminator network tasked with differentiating style transferred data from data contained in the target dataset. However, in doing so, the network focuses on discrepancies in the intensity distribution and may overlook structural differences between the datasets. In this paper, we formulate a new image-to-image translation problem to ensure that the structure of the generated images is similar to that in the target dataset. We propose a simple, yet powerful Structure-Unbiased Adversarial (SUA) network which accounts for both intensity and structural differences between the training and test sets when performing image segmentation. It consists of a spatial transformation block followed by an intensity distribution rendering module. The spatial transformation block is proposed to reduce the structure gap between the two images, and also produce an inverse deformation field to warp the final segmented image back. The intensity distribution rendering module then renders the deformed structure to an image with the target intensity distribution. Experimental results show that the proposed SUA method has the capability to transfer both intensity distribution and structural content between multiple datasets.
Submitted 11 August, 2022; v1 submitted 25 May, 2022;
originally announced May 2022.
-
Improve Generalization of Driving Policy at Signalized Intersections with Adversarial Learning
Authors:
Yangang Ren,
Guojian Zhan,
Liye Tang,
Shengbo Eben Li,
Jianhua Jiang,
Jingliang Duan
Abstract:
Intersections are quite challenging among various driving scenes, because the interaction of signal lights and distinct traffic actors makes it very difficult to learn a wise and robust driving policy. Current research rarely considers the diversity of intersections and stochastic behaviors of traffic participants. For practical applications, the randomness usually leads to some devastating events, which should be the focus of autonomous driving. This paper introduces an adversarial learning paradigm to boost the intelligence and robustness of driving policy for signalized intersections with dense traffic flow. First, we design a static path planner which is capable of generating trackable candidate paths for multiple intersections with diversified topology. Next, a constrained optimal control problem (COCP) is built based on these candidate paths wherein the bounded uncertainty of dynamic models is considered to capture the randomness of the driving environment. We propose adversarial policy gradient (APG) to solve the COCP wherein the adversarial policy is introduced to provide disturbances by seeking the most severe uncertainty while the driving policy learns to handle this situation by competition. Finally, a comprehensive system is established to conduct training and testing wherein the perception module is introduced and the human experience is incorporated to solve the yellow light dilemma. Experiments indicate that the trained policy can handle the signal lights flexibly while passing smoothly and efficiently in a human-like manner. Besides, APG enables a large-margin improvement in resistance to abnormal behaviors and thus ensures a high safety level for the autonomous vehicle.
Submitted 9 April, 2022;
originally announced April 2022.
-
Primal-dual Estimator Learning: an Offline Constrained Moving Horizon Estimation Method with Feasibility and Near-optimality Guarantees
Authors:
Wenhan Cao,
Jingliang Duan,
Shengbo Eben Li,
Chen Chen,
Chang Liu,
Yu Wang
Abstract:
This paper proposes a primal-dual framework to learn a stable estimator for linear constrained estimation problems leveraging the moving horizon approach. To avoid the online computational burden in most existing methods, we learn a parameterized function offline to approximate the primal estimate. Meanwhile, a dual estimator is trained to check the suboptimality of the primal estimator during execution time. Both the primal and dual estimators are learned from data using supervised learning techniques, and the explicit sample size is provided, which enables us to guarantee the quality of each learned estimator in terms of feasibility and optimality. This in turn allows us to bound the probability of the learned estimator being infeasible or suboptimal. Furthermore, we analyze the stability of the resulting estimator with a bounded error in the minimization of the cost function. Since our algorithm does not require the solution of an optimization problem during runtime, state estimates can be generated online almost instantly. Simulation results are presented to show the accuracy and time efficiency of the proposed framework compared to online optimization of moving horizon estimation and Kalman filter. To the best of our knowledge, this is the first learning-based state estimator with feasibility and near-optimality guarantees for linear constrained systems.
Submitted 6 April, 2022;
originally announced April 2022.
-
On the Optimization Landscape of Dynamic Output Feedback Linear Quadratic Control
Authors:
Jingliang Duan,
Wenhan Cao,
Yang Zheng,
Lin Zhao
Abstract:
The convergence of policy gradient algorithms hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or output feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which is prevalent in practice but has a rather complicated optimization landscape. We first show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. One of our core results is the uniqueness of the stationary point of dLQR when it is observable, which provides an optimality certificate for solving dynamic controllers using policy gradient methods. Moreover, we establish conditions under which dLQR and linear quadratic Gaussian control are equivalent, thus providing a unified viewpoint of optimal control of both deterministic and stochastic linear systems. These results further shed light on designing policy gradient algorithms for more general decision-making problems with partially observed information.
Submitted 29 October, 2023; v1 submitted 24 January, 2022;
originally announced January 2022.
-
Interpreting Audiograms with Multi-stage Neural Networks
Authors:
Shufan Li,
Congxi Lu,
Linkai Li,
Jirong Duan,
** Fu,
Haoshuai Zhou
Abstract:
Audiograms are a particular type of line chart representing individuals' hearing level at various frequencies. They are used by audiologists to diagnose hearing loss, and further select and tune appropriate hearing aids for customers. There have been several projects such as Autoaudio that aim to accelerate this process by means of machine learning. However, existing models can at best only detect audiograms in images and classify them into general categories. They are unable to extract hearing level information from detected audiograms by interpreting the marks, axes, and lines. To address this issue, we propose a Multi-stage Audiogram Interpretation Network (MAIN) that directly reads hearing level data from photos of audiograms. We also established Open Audiogram, an open dataset of audiogram images with annotations of marks and axes on which we trained and evaluated our proposed model. Experiments show that our model is feasible and reliable.
Submitted 17 December, 2021;
originally announced December 2021.
-
Learn2Reg: comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning
Authors:
Alessa Hering,
Lasse Hansen,
Tony C. W. Mok,
Albert C. S. Chung,
Hanna Siebert,
Stephanie Häger,
Annkristin Lange,
Sven Kuckertz,
Stefan Heldmann,
Wei Shao,
Sulaiman Vesal,
Mirabela Rusu,
Geoffrey Sonn,
Théo Estienne,
Maria Vakalopoulou,
Luyi Han,
Yunzhi Huang,
Pew-Thian Yap,
Mikael Brudfors,
Yaël Balbastre,
Samuel Joutard,
Marc Modat,
Gal Lifshitz,
Dan Raviv,
Jinxin Lv
, et al. (28 additional authors not shown)
Abstract:
Image registration is a fundamental medical image analysis task, and a wide variety of approaches have been proposed. However, only a few studies have comprehensively compared medical image registration approaches on a wide range of clinically relevant tasks. This limits the development of registration methods, the adoption of research advances into practice, and a fair benchmark across competing approaches. The Learn2Reg challenge addresses these limitations by providing a multi-task medical image registration data set for comprehensive characterisation of deformable registration algorithms. A continuous evaluation will be possible at https://learn2reg.grand-challenge.org. Learn2Reg covers a wide range of anatomies (brain, abdomen, and thorax), modalities (ultrasound, CT, MR), availability of annotations, as well as intra- and inter-patient registration evaluation. We established an easily accessible framework for training and validation of 3D registration methods, which enabled the compilation of results of over 65 individual method submissions from more than 20 unique teams. We used a complementary set of metrics, including robustness, accuracy, plausibility, and runtime, enabling unique insight into the current state-of-the-art of medical image registration. This paper describes datasets, tasks, evaluation methods and results of the challenge, as well as results of further analysis of transferability to new datasets, the importance of label supervision, and resulting bias. While no single approach worked best across all tasks, many methodological aspects could be identified that push the performance of medical image registration to new state-of-the-art performance. Furthermore, we demystified the common belief that conventional registration methods have to be much slower than deep-learning-based methods.
Submitted 7 October, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Optimization Landscape of Gradient Descent for Discrete-time Static Output Feedback
Authors:
Jingliang Duan,
Jie Li,
Shengbo Eben Li,
Lin Zhao
Abstract:
In this paper, we analyze the optimization landscape of gradient descent methods for static output feedback (SOF) control of discrete-time linear time-invariant systems with quadratic cost. The SOF setting can be quite common, for example, when there are unmodeled hidden states in the underlying process. We first establish several important properties of the SOF cost function, including coercivity, L-smoothness, and M-Lipschitz continuous Hessian. We then utilize these properties to show that the gradient descent is able to converge to a stationary point at a dimension-free rate. Furthermore, we prove that under some mild conditions, gradient descent converges linearly to a local minimum if the starting point is close to one. These results not only characterize the performance of gradient descent in optimizing the SOF problem, but also shed light on the efficiency of general policy gradient methods in reinforcement learning.
Submitted 10 March, 2022; v1 submitted 27 September, 2021;
originally announced September 2021.
-
Encoding Distributional Soft Actor-Critic for Autonomous Driving in Multi-lane Scenarios
Authors:
Jingliang Duan,
Yangang Ren,
Fawang Zhang,
Yang Guan,
Dongjie Yu,
Shengbo Eben Li,
Bo Cheng,
Lin Zhao
Abstract:
In this paper, we propose a new reinforcement learning (RL) algorithm, called encoding distributional soft actor-critic (E-DSAC), for decision-making in autonomous driving. Unlike existing RL-based decision-making methods, E-DSAC is suitable for situations where the number of surrounding vehicles is variable and eliminates the requirement for manually pre-designed sorting rules, resulting in higher policy performance and generality. We first develop an encoding distributional policy iteration (DPI) framework by embedding a permutation invariant module, which employs a feature neural network (NN) to encode the indicators of each vehicle, in the distributional RL framework. The proposed DPI framework is proved to exhibit important properties in terms of convergence and global optimality. Next, based on the developed encoding DPI framework, we propose the E-DSAC algorithm by adding the gradient-based update rule of the feature NN to the policy evaluation process of the DSAC algorithm. Then, the multi-lane driving task and the corresponding reward function are designed to verify the effectiveness of the proposed algorithm. Results show that the policy learned by E-DSAC can realize efficient, smooth, and relatively safe autonomous driving in the designed scenario. And the final policy performance learned by E-DSAC is about three times that of DSAC. Furthermore, its effectiveness has also been verified in real vehicle experiments.
Submitted 12 September, 2021;
originally announced September 2021.
-
Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian
Authors:
Baiyu Peng,
Jingliang Duan,
Jianyu Chen,
Shengbo Eben Li,
Genjin Xie,
Congsheng Zhang,
Yang Guan,
Yao Mu,
Enxin Sun
Abstract:
Safety is essential for reinforcement learning (RL) applied in the real world. Adding chance constraints (or probabilistic constraints) is a suitable way to enhance RL safety under uncertainty. Existing chance-constrained RL methods like the penalty methods and the Lagrangian methods either exhibit periodic oscillations or learn an over-conservative or unsafe policy. In this paper, we address these shortcomings by proposing a separated proportional-integral Lagrangian (SPIL) algorithm. We first review the constrained policy optimization process from a feedback control perspective, which regards the penalty weight as the control input and the safe probability as the control output. Based on this, the penalty method is formulated as a proportional controller, and the Lagrangian method is formulated as an integral controller. We then unify them and present a proportional-integral Lagrangian method to get both their merits, with an integral separation technique to limit the integral value in a reasonable range. To accelerate training, the gradient of safe probability is computed in a model-based manner. We demonstrate our method can reduce the oscillations and conservatism of RL policy in a car-following simulation. To prove its practicality, we also apply our method to a real-world mobile robot navigation task, where our robot successfully avoids a moving obstacle with highly uncertain or even aggressive behaviors.
Submitted 26 August, 2021;
originally announced August 2021.
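The feedback-control view in this abstract (penalty weight as control input, safe probability as control output) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the class name, the gains, and in particular the separation rule (freeze the integral when the error is large, then clamp it to a fixed range) are a textbook form of integral separation and may differ from the rule the authors use.

```python
from dataclasses import dataclass


@dataclass
class SeparatedPIPenalty:
    """Hypothetical PI controller for a constraint penalty weight."""
    kp: float                # proportional gain (penalty-method-like term)
    ki: float                # integral gain (Lagrangian-like term)
    target_safe_prob: float  # required safe probability
    separation_band: float   # integrate only while |error| <= this band
    integral_max: float      # upper clamp on the accumulated error
    integral: float = 0.0

    def update(self, safe_prob: float) -> float:
        # Positive error means the policy is less safe than required.
        error = self.target_safe_prob - safe_prob
        # Integral separation (assumed form): skip accumulation on large errors.
        if abs(error) <= self.separation_band:
            self.integral += error
        # Keep the integral term inside [0, integral_max].
        self.integral = min(max(self.integral, 0.0), self.integral_max)
        # The penalty weight is kept nonnegative.
        return max(self.kp * error + self.ki * self.integral, 0.0)
```

In a training loop, `update` would be called once per iteration with the current estimate of the safe probability, and the returned weight would scale the constraint term in the policy objective.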
-
Iterative Self-consistent Parallel Magnetic Resonance Imaging Reconstruction based on Nonlocal Low-Rank Regularization
Authors:
Ting Pan,
Jizhong Duan,
Junfeng Wang,
Yu Liu
Abstract:
Iterative self-consistent parallel imaging reconstruction (SPIRiT) is an effective self-calibrated reconstruction model for parallel magnetic resonance imaging (PMRI). The joint L1 norm of wavelet coefficients and joint total variation (TV) regularization terms are incorporated into the SPIRiT model to improve the reconstruction performance. The simultaneous two-directional low-rankness (STDLR) in k-space data is incorporated into SPIRiT to realize improved reconstruction. Recent methods have exploited the nonlocal self-similarity (NSS) of images by imposing nonlocal low-rankness of similar patches to achieve a superior performance. To fully utilize both the NSS in magnetic resonance (MR) images and calibration consistency in the k-space domain, we propose a nonlocal low-rank (NLR)-SPIRiT model by incorporating NLR regularization into the SPIRiT model. We apply the weighted nuclear norm (WNN) as a surrogate of the rank and employ the Nash equilibrium (NE) formulation and alternating direction method of multipliers (ADMM) to efficiently solve the NLR-SPIRiT model. The experimental results demonstrate the superior performance of NLR-SPIRiT over the state-of-the-art methods via three objective metrics and visual comparison.
Submitted 17 April, 2022; v1 submitted 10 August, 2021;
originally announced August 2021.
-
Lightness Modulated Deep Inverse Tone Mapping
Authors:
Kanglin Liu,
Gaofeng Cao,
Jiang Duan,
Guoping Qiu
Abstract:
Single-image HDR reconstruction or inverse tone mapping (iTM) is a challenging task. In particular, recovering information in over-exposed regions is extremely difficult because details in such regions are almost completely lost. In this paper, we present a deep learning based iTM method that takes advantage of the feature extraction and mapping power of deep convolutional neural networks (CNNs) and uses a lightness prior to modulate the CNN to better exploit observations in the surrounding areas of the over-exposed regions to enhance the quality of HDR image reconstruction. Specifically, we introduce a Hierarchical Synthesis Network (HiSN) for inferring an HDR image from an LDR input and a Lightness Adaptive Modulation Network (LAMN) to incorporate the lightness prior knowledge in the inferring process. The HiSN hierarchically synthesizes the high-brightness component and the low-brightness component of the HDR image whilst the LAMN uses a lightness adaptive mask that separates detail-less saturated bright pixels from well-exposed lower light pixels to enable HiSN to better infer the missing information, particularly in the difficult over-exposed detail-less areas. We present experimental results to demonstrate the effectiveness of the new technique based on quantitative measures and visual comparisons. In addition, we present ablation studies of HiSN and visualization of the activation maps inside LAMN to help gain a deeper understanding of the internal working of the new iTM algorithm and explain why it can achieve much improved performance over state-of-the-art algorithms.
Submitted 16 July, 2021;
originally announced July 2021.
-
Learning a Model-Driven Variational Network for Deformable Image Registration
Authors:
Xi Jia,
Alexander Thorley,
Wei Chen,
Huaqi Qiu,
Linlin Shen,
Iain B Styles,
Hyung Jin Chang,
Ales Leonardis,
Antonio de Marvao,
Declan P. O'Regan,
Daniel Rueckert,
Jinming Duan
Abstract:
Data-driven deep learning approaches to image registration can be less accurate than conventional iterative approaches, especially when training data is limited. To address this whilst retaining the fast inference speed of deep learning, we propose VR-Net, a novel cascaded variational network for unsupervised deformable image registration. Using the variable splitting optimization scheme, we first convert the image registration problem, established in a generic variational framework, into two sub-problems, one with a point-wise, closed-form solution while the other one is a denoising problem. We then propose two neural layers (i.e. warping layer and intensity consistency layer) to model the analytical solution and a residual U-Net to formulate the denoising problem (i.e. generalized denoising layer). Finally, we cascade the warping layer, intensity consistency layer, and generalized denoising layer to form the VR-Net. Extensive experiments on three (two 2D and one 3D) cardiac magnetic resonance imaging datasets show that VR-Net outperforms state-of-the-art deep learning methods on registration accuracy, while maintaining the fast inference speed of deep learning and the data-efficiency of variational models.
Submitted 25 May, 2021;
originally announced May 2021.
-
Fixed-Dimensional and Permutation Invariant State Representation of Autonomous Driving
Authors:
Jingliang Duan,
Dongjie Yu,
Shengbo Eben Li,
Wenxuan Wang,
Yangang Ren,
Ziyu Lin,
Bo Cheng
Abstract:
In this paper, we propose a new state representation method, called encoding sum and concatenation (ESC), for the state representation of decision-making in autonomous driving. Unlike existing state representation methods, ESC is applicable to a variable number of surrounding vehicles and eliminates the need for manually pre-designed sorting rules, leading to higher representation ability and generality. The proposed ESC method introduces a representation neural network (NN) to encode each surrounding vehicle into an encoding vector, and then adds these vectors to obtain the representation vector of the set of surrounding vehicles. By concatenating the set representation with other variables, such as indicators of the ego vehicle and road, we realize the fixed-dimensional and permutation invariant state representation. This paper has further proved that the proposed ESC method can realize the injective representation if the output dimension of the representation NN is greater than the number of variables of all surrounding vehicles. This means that by taking the ESC representation as policy inputs, we can find the nearly optimal representation NN and policy NN by simultaneously optimizing them using gradient-based updating. Experiments demonstrate that compared with the fixed-permutation representation method, the proposed method improves the representation ability of the surrounding vehicles, and the corresponding approximation error is reduced by 62.2%.
Submitted 4 March, 2022; v1 submitted 24 May, 2021;
originally announced May 2021.
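The encode, sum, and concatenate steps described in this abstract can be sketched as below. This is a minimal illustration under stated assumptions: the one-layer `tanh` encoder, the function names, and the dimensions are hypothetical stand-ins for the representation NN, and nothing here is taken from the authors' implementation.

```python
import math


def encode_vehicle(x, weights, bias):
    """Hypothetical one-layer representation network: tanh(W x + b)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def esc_state(vehicles, ego_and_road, weights, bias):
    """Encode each surrounding vehicle, sum the encodings, then concatenate."""
    set_repr = [0.0] * len(bias)  # stays all-zero when there are no vehicles
    for x in vehicles:
        encoding = encode_vehicle(x, weights, bias)
        set_repr = [s + e for s, e in zip(set_repr, encoding)]
    # Output length is len(bias) + len(ego_and_road), whatever len(vehicles) is.
    return set_repr + list(ego_and_road)
```

Because addition is commutative, reordering `vehicles` changes the result only by floating-point rounding, and the output length does not depend on how many vehicles are present. Those are the two properties named in the title.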
-
A Distributed and Resilient Bargaining Game for Weather-Predictive Microgrid Energy Cooperation
Authors:
Lu An,
Jie Duan,
Mo-Yuen Chow,
Alexandra Duel-Hallen
Abstract:
A bargaining game is investigated for cooperative energy management in microgrids. This game incorporates a fully distributed and realistic cooperative power scheduling algorithm (CoDES) as well as a distributed Nash Bargaining Solution (NBS)-based method of allocating the overall power bill resulting from CoDES. A novel weather-based stochastic renewable generation (RG) prediction method is incorporated in the power scheduling. We demonstrate the proposed game using a 4-user grid-connected microgrid model with diverse user demands, storage, and RG profiles and examine the effect of weather prediction on day-ahead power scheduling and cost/profit allocation. Finally, the impact of users' ambivalence about cooperation and/or dishonesty on the bargaining outcome is investigated, and it is shown that the proposed game is resilient to malicious users' attempts to avoid payment of their fair share of the overall bill.
Submitted 12 April, 2021;
originally announced April 2021.
-
Approximate Optimal Filter for Linear Gaussian Time-invariant Systems
Authors:
Kaiming Tang,
Shengbo Eben Li,
Yuming Yin,
Yang Guan,
Jingliang Duan,
Wenhan Cao,
Jie Li
Abstract:
State estimation is critical to control systems, especially when the states cannot be directly measured. This paper presents an approximate optimal filter, which enables the use of the policy iteration technique to obtain the steady-state gain in linear Gaussian time-invariant systems. This design transforms the optimal filtering problem with minimum mean square error into an optimal control problem, called Approximate Optimal Filtering (AOF) problem. The equivalence holds given certain conditions about initial state distributions and policy formats, in which the system state is the estimation error, control input is the filter gain, and control objective function is the accumulated estimation error. We present a policy iteration algorithm to solve the AOF problem in steady-state. Finally, the approximate filter is evaluated on a classic vehicle state estimation problem. The results show that the policy converges to the steady-state Kalman gain, and its accuracy is within 2%.
Submitted 9 March, 2021;
originally announced March 2021.
-
Recurrent Model Predictive Control
Authors:
Zhengyu Liu,
Jingliang Duan,
Wenxuan Wang,
Shengbo Eben Li,
Yuming Yin,
Ziyu Lin,
Qi Sun,
Bo Cheng
Abstract:
This paper proposes an off-line algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems. Unlike traditional Model Predictive Control (MPC) algorithms, it can make full use of the current computing resources and adaptively select the longest model prediction horizon. Our algorithm employs a recurrent function to approximate the optimal policy, which maps the system states and reference values directly to the control inputs. The number of prediction steps is equal to the number of recurrent cycles of the learned policy function. With an arbitrary initial policy function, the proposed RMPC algorithm can converge to the optimal policy by directly minimizing the designed loss function. We further prove the convergence and optimality of the RMPC algorithm through the Bellman optimality principle, and demonstrate its generality and efficiency using two numerical examples.
△ Less
Submitted 23 February, 2021;
originally announced February 2021.
-
Recurrent Model Predictive Control: Learning an Explicit Recurrent Controller for Nonlinear Systems
Authors:
Zhengyu Liu,
Jingliang Duan,
Wenxuan Wang,
Shengbo Eben Li,
Yuming Yin,
Ziyu Lin,
Bo Cheng
Abstract:
This paper proposes an offline control algorithm, called Recurrent Model Predictive Control (RMPC), to solve large-scale nonlinear finite-horizon optimal control problems. It can be regarded as an explicit solver of traditional Model Predictive Control (MPC) algorithms, which can adaptively select an appropriate model prediction horizon according to current computing resources, so as to improve the policy performance. Our algorithm employs a recurrent function to approximate the optimal policy, which maps the system states and reference values directly to the control inputs. The output of the learned policy network after N recurrent cycles corresponds to the nearly optimal solution of N-step MPC. A policy optimization objective is designed by decomposing the MPC cost function according to Bellman's principle of optimality. The optimal recurrent policy can be obtained by directly minimizing the designed objective function, which is applicable to general nonlinear and non-input-affine systems. Both simulation-based and real-robot path-tracking tasks are used to demonstrate the effectiveness of the proposed method.
Submitted 8 April, 2022; v1 submitted 20 February, 2021;
originally announced February 2021.
-
Separated Proportional-Integral Lagrangian for Chance Constrained Reinforcement Learning
Authors:
Baiyu Peng,
Yao Mu,
Jingliang Duan,
Yang Guan,
Shengbo Eben Li,
Jianyu Chen
Abstract:
Safety is essential for reinforcement learning (RL) applied in real-world tasks like autonomous driving. Chance constraints, which guarantee the satisfaction of state constraints with high probability, are suitable for representing the requirements of real-world environments with uncertainty. Existing chance-constrained RL methods like the penalty method and the Lagrangian method either exhibit periodic oscillations or cannot satisfy the constraints. In this paper, we address these shortcomings by proposing a separated proportional-integral Lagrangian (SPIL) algorithm. Taking a control perspective, we first interpret the penalty method and the Lagrangian method as proportional feedback and integral feedback control, respectively. Then, a proportional-integral Lagrangian method is proposed to steady the learning process while improving safety. To prevent integral overshooting and reduce conservatism, we introduce the integral separation technique inspired by PID control. Finally, an analytical gradient of the chance constraint is utilized for model-based policy optimization. The effectiveness of SPIL is demonstrated by a narrow car-following task. Experiments indicate that compared with previous methods, SPIL improves the performance while guaranteeing safety, with a steady learning process.
Submitted 16 February, 2021;
originally announced February 2021.
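The control-theoretic reading in the abstract above (penalty method as proportional feedback, Lagrangian method as integral feedback, plus integral separation) can be sketched as a multiplier update rule. This is only an illustration under stated assumptions: the class name, the gains `kp` and `ki`, the threshold, and the specific rule of freezing the integral when the violation is large are all hypothetical and are not taken from the SPIL paper.

```python
class PILagrangeMultiplier:
    """PI-style multiplier driven by a constraint-violation signal (illustrative)."""

    def __init__(self, kp=1.0, ki=0.1, separation_threshold=0.5):
        self.kp = kp                                  # proportional gain (assumed)
        self.ki = ki                                  # integral gain (assumed)
        self.separation_threshold = separation_threshold
        self.integral = 0.0                           # accumulated violation

    def update(self, violation):
        """Return the new multiplier; violation > 0 means the constraint is violated."""
        # Integral separation (simplified): accumulate only while the error is
        # small, so large transients do not wind up the integral term.
        if abs(violation) <= self.separation_threshold:
            self.integral = max(0.0, self.integral + violation)
        proportional = self.kp * max(0.0, violation)
        # The multiplier is kept non-negative.
        return max(0.0, proportional + self.ki * self.integral)


multiplier = PILagrangeMultiplier()
weights = [multiplier.update(v) for v in (2.0, 0.4, 0.2, -0.1)]
```

In this sketch the first, large violation only triggers the proportional term, while the later, smaller violations also feed the integral term.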
-
Complementary Time-Frequency Domain Networks for Dynamic Parallel MR Image Reconstruction
Authors:
Chen Qin,
Jinming Duan,
Kerstin Hammernik,
Jo Schlemper,
Thomas Küstner,
René Botnar,
Claudia Prieto,
Anthony N. Price,
Joseph V. Hajnal,
Daniel Rueckert
Abstract:
Purpose: To introduce a novel deep learning based approach for fast and high-quality dynamic multi-coil MR reconstruction by learning a complementary time-frequency domain network that exploits spatio-temporal correlations simultaneously from complementary domains.
Theory and Methods: Dynamic parallel MR image reconstruction is formulated as a multi-variable minimisation problem, where the data is regularised in combined temporal Fourier and spatial (x-f) domain as well as in spatio-temporal image (x-t) domain. An iterative algorithm based on variable splitting technique is derived, which alternates among signal de-aliasing steps in x-f and x-t spaces, a closed-form point-wise data consistency step and a weighted coupling step. The iterative model is embedded into a deep recurrent neural network which learns to recover the image via exploiting spatio-temporal redundancies in complementary domains.
Results: Experiments were performed on two datasets of highly undersampled multi-coil short-axis cardiac cine MRI scans. Results demonstrate that our proposed method outperforms the current state-of-the-art approaches both quantitatively and qualitatively. The proposed model can also generalise well to data acquired from a different scanner and data with pathologies that were not seen in the training set.
Conclusion: The work shows the benefit of reconstructing dynamic parallel MRI in complementary time-frequency domains with deep neural networks. The method can effectively and robustly reconstruct high-quality images from highly undersampled dynamic multi-coil data ($16 \times$ and $24 \times$ yielding 15s and 10s scan times respectively) with fast reconstruction speed (2.8s). This could potentially facilitate achieving fast single-breath-hold clinical 2D cardiac cine imaging.
Submitted 18 June, 2021; v1 submitted 22 December, 2020;
originally announced December 2020.
-
On Training Effective Reinforcement Learning Agents for Real-time Power Grid Operation and Control
Authors:
Ruisheng Diao,
Di Shi,
Bei Zhang,
Siqi Wang,
Haifeng Li,
Chunlei Xu,
Tu Lan,
Desong Bian,
Jiajun Duan
Abstract:
Deriving fast and effectively coordinated control actions remains a grand challenge affecting the secure and economic operation of today's large-scale power grid. This paper presents a novel artificial intelligence (AI) based methodology to achieve multi-objective real-time power grid control for real-world implementation. A state-of-the-art off-policy reinforcement learning (RL) algorithm, soft actor-critic (SAC), is adopted to train AI agents with multi-thread offline training and periodic online training for regulating voltages and transmission losses without violating thermal constraints of lines. A software prototype was developed and deployed in the control center of SGCC Jiangsu Electric Power Company that interacts with their Energy Management System (EMS) every 5 minutes. Massive numerical studies using actual power grid snapshots in the real-time environment verify the effectiveness of the proposed approach. Well-trained SAC agents can learn to provide effective and subsecond control actions in regulating voltage profiles and reducing transmission losses.
Submitted 11 December, 2020;
originally announced December 2020.
-
Automated Model Selection for Time-Series Anomaly Detection
Authors:
Yuanxiang Ying,
Juanyong Duan,
Chunlei Wang,
Yujing Wang,
Congrui Huang,
Bixiong Xu
Abstract:
Time-series anomaly detection is a popular topic in both academia and industrial fields. Many companies need to monitor thousands of temporal signals for their applications and services and require instant feedback and alerts for potential incidents in time. The task is challenging because of the complex characteristics of time-series, which are messy, stochastic, and often without proper labels. This prohibits training supervised models because of lack of labels and a single model hardly fits different time series. In this paper, we propose a solution to address these issues. We present an automated model selection framework to automatically find the most suitable detection model with proper parameters for the incoming data. The model selection layer is extensible as it can be updated without too much effort when a new detector is available to the service. Finally, we incorporate a customized tuning algorithm to flexibly filter anomalies to meet customers' criteria. Experiments on real-world datasets show the effectiveness of our solution.
Submitted 25 August, 2020;
originally announced September 2020.
-
Ternary Policy Iteration Algorithm for Nonlinear Robust Control
Authors:
Jie Li,
Shengbo Eben Li,
Yang Guan,
Jingliang Duan,
Wenyu Li,
Yuming Yin
Abstract:
The uncertainties in plant dynamics remain a challenge for nonlinear control problems. This paper develops a ternary policy iteration (TPI) algorithm for solving nonlinear robust control problems with bounded uncertainties. The controller and uncertainty of the system are considered as game players, and the robust control problem is formulated as a two-player zero-sum differential game. In order to solve the differential game, the corresponding Hamilton-Jacobi-Isaacs (HJI) equation is then derived. Three loss functions and three update phases are designed to match the identity equation, minimization and maximization of the HJI equation, respectively. These loss functions are defined by the expectation of the approximate Hamiltonian in a generated state set to prevent operating all the states in the entire state set concurrently. The parameters of value function and policies are directly updated by diminishing the designed loss functions using the gradient descent method. Moreover, zero-initialization can be applied to the parameters of the control policy. The effectiveness of the proposed TPI algorithm is demonstrated through two simulation studies. The simulation results show that the TPI algorithm can converge to the optimal solution for the linear plant, and has high resistance to disturbances for the nonlinear plant.
Submitted 14 July, 2020;
originally announced July 2020.
-
Deep Network Interpolation for Accelerated Parallel MR Image Reconstruction
Authors:
Chen Qin,
Jo Schlemper,
Kerstin Hammernik,
Jinming Duan,
Ronald M Summers,
Daniel Rueckert
Abstract:
We present a deep network interpolation strategy for accelerated parallel MR image reconstruction. In particular, we examine the network interpolation in parameter space between a source model that is formulated in an unrolled scheme with L1 and SSIM losses and its counterpart that is trained with an adversarial loss. We show that by interpolating between the two different models of the same network structure, the new interpolated network can model a trade-off between perceptual quality and fidelity.
Submitted 12 July, 2020;
originally announced July 2020.
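The parameter-space interpolation described in the abstract above can be sketched as a convex blend of two models that share the same network structure. This is a minimal illustration, assuming parameters are stored as name-to-list dictionaries; the function name, the toy weights, and the choice of `alpha` are hypothetical and not taken from the paper.

```python
def interpolate_parameters(params_a, params_b, alpha):
    """Return (1 - alpha) * params_a + alpha * params_b, parameter by parameter.

    Both inputs map parameter names to flat lists of floats and must share
    the same names and lengths (i.e. the same network structure).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if params_a.keys() != params_b.keys():
        raise ValueError("models must have identical parameter names")
    blended = {}
    for name, values_a in params_a.items():
        values_b = params_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        blended[name] = [(1.0 - alpha) * a + alpha * b
                         for a, b in zip(values_a, values_b)]
    return blended


# Toy weights standing in for the model trained with L1 and SSIM losses
# and its adversarially trained counterpart (values are made up).
fidelity_model = {"conv.weight": [0.0, 1.0], "conv.bias": [2.0]}
adversarial_model = {"conv.weight": [1.0, 3.0], "conv.bias": [0.0]}
blended = interpolate_parameters(fidelity_model, adversarial_model, alpha=0.25)
```

Sweeping `alpha` from 0 to 1 moves the blended network from the first model to the second, which is the knob the abstract associates with the trade-off between fidelity and perceptual quality.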
-
Continuous-time finite-horizon ADP for automated vehicle controller design with high efficiency
Authors:
Ziyu Lin,
Jingliang Duan,
Shengbo Eben Li,
Haitong Ma,
Yuming Yin
Abstract:
The design of an automated vehicle controller can generally be formulated as an optimal control problem. This paper proposes a continuous-time finite-horizon approximate dynamic programming (ADP) method, which can synthesize an off-line near-optimal control policy with analytical vehicle dynamics. Built on the general Policy Iteration framework, it employs value and policy neural networks to approximate the mappings from the system states to the value function and control inputs, respectively. The proposed method can converge to the near-optimal solution of the finite-horizon Hamilton-Jacobi-Bellman (HJB) equation. We further applied our algorithm to the simulation of automated vehicle control for the path tracking maneuver. The results suggest that the proposed ADP method can obtain the near-optimal policy with 1% error and less calculation time. Moreover, the proposed ADP algorithm is also suitable for nonlinear control systems, where ADP is almost 500 times faster than the nonlinear MPC ipopt solver.
Submitted 4 July, 2020;
originally announced July 2020.
-
Hierarchical Reinforcement Learning for Self-Driving Decision-Making without Reliance on Labeled Driving Data
Authors:
Jingliang Duan,
Shengbo Eben Li,
Yang Guan,
Qi Sun,
Bo Cheng
Abstract:
Decision making for self-driving cars is usually tackled by manually encoding rules from drivers' behaviors or imitating drivers' manipulation using supervised learning techniques. Both of them rely on mass driving data to cover all possible driving scenarios. This paper presents a hierarchical reinforcement learning method for decision making of self-driving cars, which does not depend on a large amount of labeled driving data. This method comprehensively considers both high-level maneuver selection and low-level motion control in both lateral and longitudinal directions. We first decompose the driving tasks into three maneuvers, including driving in lane, right lane change and left lane change, and learn the sub-policy for each maneuver. Then, a master policy is learned to choose the maneuver policy to be executed in the current state. All policies including master policy and maneuver policies are represented by fully-connected neural networks and trained by using asynchronous parallel reinforcement learners (APRL), which builds a mapping from the sensory outputs to driving decisions. Different state spaces and reward functions are designed for each maneuver. We apply this method to a highway driving scenario, which demonstrates that it can realize smooth and safe decision making for self-driving cars.
Submitted 27 January, 2020;
originally announced January 2020.
-
Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors
Authors:
Jingliang Duan,
Yang Guan,
Shengbo Eben Li,
Yangang Ren,
Bo Cheng
Abstract:
In reinforcement learning (RL), function approximation errors are known to easily lead to Q-value overestimations, thus greatly reducing policy performance. This paper presents a distributional soft actor-critic (DSAC) algorithm, which is an off-policy RL method for continuous control settings, to improve the policy performance by mitigating Q-value overestimations. We first discover in theory that learning a distribution function of state-action returns can effectively mitigate Q-value overestimations because it is capable of adaptively adjusting the update stepsize of the Q-value function. Then, a distributional soft policy iteration (DSPI) framework is developed by embedding the return distribution function into maximum entropy RL. Finally, we present a deep off-policy actor-critic variant of DSPI, called DSAC, which directly learns a continuous return distribution by keeping the variance of the state-action returns within a reasonable range to address exploding and vanishing gradient problems. We evaluate DSAC on the suite of MuJoCo continuous control tasks, achieving state-of-the-art performance.
Submitted 11 June, 2021; v1 submitted 8 January, 2020;
originally announced January 2020.
-
$Σ$-net: Systematic Evaluation of Iterative Deep Neural Networks for Fast Parallel MR Image Reconstruction
Authors:
Kerstin Hammernik,
Jo Schlemper,
Chen Qin,
Jinming Duan,
Ronald M. Summers,
Daniel Rueckert
Abstract:
Purpose: To systematically investigate the influence of various data consistency layers, (semi-)supervised learning and ensembling strategies, defined in a $Σ$-net, for accelerated parallel MR image reconstruction using deep learning.
Theory and Methods: MR image reconstruction is formulated as a learned unrolled optimization scheme with a Down-Up network as regularization and varying data consistency layers. The different architectures are split into sensitivity networks, which rely on explicit coil sensitivity maps, and parallel coil networks, which learn the combination of coils implicitly. Different content and adversarial losses, a semi-supervised fine-tuning scheme and model ensembling are investigated.
Results: Evaluated on the fastMRI multicoil validation set, architectures involving raw k-space data outperform image enhancement methods significantly. Semi-supervised fine-tuning adapts to new k-space data and provides, together with reconstructions based on adversarial training, the visually most appealing results although quantitative quality metrics are reduced. The $Σ$-net ensembles the benefits from different models and achieves similar scores compared to the single state-of-the-art approaches.
Conclusion: This work provides an open-source framework to perform a systematic wide-range comparison of state-of-the-art reconstruction approaches for parallel MR image reconstruction on the fastMRI knee dataset and explores the importance of data consistency. A suitable trade-off between perceptual image quality and quantitative scores is achieved with the ensembled $Σ$-net.
Submitted 18 December, 2019;
originally announced December 2019.
-
$Σ$-net: Ensembled Iterative Deep Neural Networks for Accelerated Parallel MR Image Reconstruction
Authors:
Jo Schlemper,
Chen Qin,
Jinming Duan,
Ronald M. Summers,
Kerstin Hammernik
Abstract:
We explore an ensembled $Σ$-net for fast parallel MR imaging, including parallel coil networks, which perform implicit coil weighting, and sensitivity networks, involving explicit sensitivity maps. The networks in $Σ$-net are trained in a supervised way, including content and GAN losses, and with various ways of data consistency, i.e., proximal mappings, gradient descent and variable splitting. A semi-supervised finetuning scheme allows us to adapt to the k-space data at test time, which, however, decreases the quantitative metrics, although it generates the visually most textured and sharp images. For this challenge, we focused on robust and high SSIM scores, which we achieved by ensembling all models to a $Σ$-net.
Submitted 11 December, 2019;
originally announced December 2019.
-
Adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints
Authors:
Jingliang Duan,
Zhengyu Liu,
Shengbo Eben Li,
Qi Sun,
Zhenzhong Jia,
Bo Cheng
Abstract:
This paper presents a constrained adaptive dynamic programming (CADP) algorithm to solve general nonlinear nonaffine optimal control problems with known dynamics. Unlike previous ADP algorithms, it can directly deal with problems with state constraints. Firstly, a constrained generalized policy iteration (CGPI) framework is developed to handle state constraints by transforming the traditional policy improvement process into a constrained policy optimization problem. Next, we propose an actor-critic variant of CGPI, called CADP, in which both policy and value functions are approximated by multi-layer neural networks to directly map the system states to control inputs and value function, respectively. CADP linearizes the constrained optimization problem locally into a quadratically constrained linear programming problem, and then obtains the optimal update of the policy network by solving its dual problem. A trust region constraint is added to prevent excessive policy update, thus ensuring linearization accuracy. We determine the feasibility of the policy optimization problem by calculating the minimum trust region boundary and update the policy using two recovery rules when infeasible. The vehicle control problem in the path-tracking task is used to demonstrate the effectiveness of this proposed method.
Submitted 8 April, 2022; v1 submitted 26 November, 2019;
originally announced November 2019.
-
AI-Based Autonomous Line Flow Control via Topology Adjustment for Maximizing Time-Series ATCs
Authors:
Tu Lan,
Jiajun Duan,
Bei Zhang,
Di Shi,
Zhiwei Wang,
Ruisheng Diao,
Xiaohu Zhang
Abstract:
This paper presents a novel AI-based approach for maximizing time-series available transfer capabilities (ATCs) via autonomous topology control considering various practical constraints and uncertainties. Several AI techniques including supervised learning and deep reinforcement learning (DRL) are adopted and improved to train effective AI agents for achieving the desired performance. First, imitation learning (IL) is used to provide a good initial policy for the AI agent. Then, the agent is trained by DRL algorithms with a novel guided exploration technique, which significantly improves the training efficiency. Finally, an Early Warning (EW) mechanism is designed to help the agent find good topology control strategies for long testing periods, which helps the agent to determine action timing using power system domain knowledge, thus effectively increasing the system's error tolerance and robustness. The effectiveness of the proposed approach is demonstrated in the "2019 Learn to Run a Power Network (L2RPN)" global competition, where the developed AI agents can continuously and safely control a power grid to maximize ATCs without operator intervention for up to one month of operation data and eventually won first place in both the development and final phases of the competition. The winning agent has been open-sourced on GitHub.
Submitted 8 November, 2019;
originally announced November 2019.
-
Deep learning for cardiac image segmentation: A review
Authors:
Chen Chen,
Chen Qin,
Huaqi Qiu,
Giacomo Tarroni,
Jinming Duan,
Wenjia Bai,
Daniel Rueckert
Abstract:
Deep learning has become the most widely used approach for cardiac image segmentation in recent years. In this paper, we provide a review of over 100 cardiac image segmentation papers using deep learning, which covers common imaging modalities including magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound (US) and major anatomical structures of interest (ventricles, atria and vessels). In addition, a summary of publicly available cardiac image datasets and code repositories is included to provide a base for encouraging reproducible research. Finally, we discuss the challenges and limitations of current deep learning-based approaches (scarcity of labels, model generalizability across different domains, interpretability) and suggest potential directions for future research.
Submitted 9 November, 2019;
originally announced November 2019.
-
Data consistency networks for (calibration-less) accelerated parallel MR image reconstruction
Authors:
Jo Schlemper,
Jinming Duan,
Cheng Ouyang,
Chen Qin,
Jose Caballero,
Joseph V. Hajnal,
Daniel Rueckert
Abstract:
We present simple reconstruction networks for multi-coil data by extending the deep cascade of CNNs and exploiting the data consistency layer. In particular, we propose two variants, where one is inspired by POCSENSE and the other is calibration-less. We show that the proposed approaches are competitive relative to the state of the art both quantitatively and qualitatively.
Submitted 25 September, 2019;
originally announced September 2019.
-
dAUTOMAP: decomposing AUTOMAP to achieve scalability and enhance performance
Authors:
Jo Schlemper,
Ilkay Oksuz,
James R. Clough,
Jinming Duan,
Andrew P. King,
Julia A. Schnabel,
Joseph V. Hajnal,
Daniel Rueckert
Abstract:
AUTOMAP is a promising generalized reconstruction approach; however, it is not scalable and hence its practicality is limited. We present dAUTOMAP, a novel way of decomposing the domain transformation of AUTOMAP, making the model scale linearly. We show dAUTOMAP outperforms AUTOMAP with significantly fewer parameters.
Submitted 25 September, 2019; v1 submitted 24 September, 2019;
originally announced September 2019.
-
Relaxed Actor-Critic with Convergence Guarantees for Continuous-Time Optimal Control of Nonlinear Systems
Authors:
Jingliang Duan,
Jie Li,
Qiang Ge,
Shengbo Eben Li,
Monimoy Bujarbaruah,
Fei Ma,
Dezhao Zhang
Abstract:
This paper presents the Relaxed Continuous-Time Actor-critic (RCTAC) algorithm, a method for finding the nearly optimal policy for nonlinear continuous-time (CT) systems with known dynamics and infinite horizon, such as the path-tracking control of vehicles. RCTAC has several advantages over existing adaptive dynamic programming algorithms for CT systems. It does not require the "admissibility" of the initialized policy or the input-affine nature of controlled systems for convergence. Instead, given any initial policy, RCTAC can converge to an admissible, and subsequently nearly optimal policy for a general nonlinear system with a saturated controller. RCTAC consists of two phases: a warm-up phase and a generalized policy iteration phase. The warm-up phase minimizes the square of the Hamiltonian to achieve admissibility, while the generalized policy iteration phase relaxes the update termination conditions for faster convergence. The convergence and optimality of the algorithm are proven through Lyapunov analysis, and its effectiveness is demonstrated through simulations and real-world path-tracking tasks.
Submitted 30 March, 2023; v1 submitted 11 September, 2019;
originally announced September 2019.