Safe Policy Improvement Approaches and their Limitations

Authors: Philipp Scholl, Felix Dietrich, Clemens Otte, Steffen Udluft

Abstract: Safe Policy Improvement (SPI) is an important technique for offline reinforcement learning in safety-critical applications, as it improves the behavior policy with high probability. We classify various SPI approaches from the literature into two groups, based on how they utilize the uncertainty of state-action pairs. Focusing on the Soft-SPIBB (Safe Policy Improvement with Soft Baseline Bootstrapping) algorithms, we show that their claim of being provably safe does not hold. Based on this finding, we develop adaptations, the Adv-Soft-SPIBB algorithms, and show that they are provably safe. A heuristic adaptation, Lower-Approx-Soft-SPIBB, yields the best performance among all SPIBB algorithms in extensive experiments on two benchmarks. We also check the safety guarantees of the provably safe algorithms and show that huge amounts of data are necessary for the safety bounds to become useful in practice.

Submitted 1 August, 2022; originally announced August 2022.

Comments: 27 pages. arXiv admin note: substantial text overlap with arXiv:2201.12175

arXiv:2201.12175

doi 10.5220/0010786600003116

Safe Policy Improvement Approaches on Discrete Markov Decision Processes

Authors: Philipp Scholl, Felix Dietrich, Clemens Otte, Steffen Udluft

Abstract: Safe Policy Improvement (SPI) aims at provable guarantees that a learned policy is at least approximately as good as a given baseline policy. Building on SPI with Soft Baseline Bootstrapping (Soft-SPIBB) by Nadjahi et al., we identify theoretical issues in their approach, provide a corrected theory, and derive a new algorithm that is provably safe on finite Markov Decision Processes (MDP). Additionally, we provide a heuristic algorithm that exhibits the best performance among many state-of-the-art SPI algorithms on two different benchmarks. Furthermore, we introduce a taxonomy of SPI algorithms and empirically show an interesting property of two classes of SPI algorithms: while the mean performance of algorithms that incorporate the uncertainty as a penalty on the action-value is higher, actively restricting the set of policies more consistently produces good policies and is, thus, safer.

Submitted 28 January, 2022; originally announced January 2022.

Comments: 12 pages, International Conference on Agents and Artificial Intelligence 2022

arXiv:1911.09328

doi 10.1002/mp.13388

Bimodal intravascular volumetric imaging combining OCT and MPI

Authors: Sarah Latus, Florian Griese, Matthias Schlüter, Christoph Otte, Martin Möddel, Matthias Graeser, Thore Saathoff, Tobias Knopp, Alexander Schlaefer

Abstract: Intravascular optical coherence tomography (IVOCT) is a catheter-based imaging modality allowing for high-resolution imaging of vessels. It is based on a fast sequential acquisition of A-scans with an axial spatial resolution in the range of 5 to 10 μm, i.e., one order of magnitude higher than in conventional methods like intravascular ultrasound or computed tomography angiography. However, position and orientation of the catheter in patient coordinates cannot be obtained from the IVOCT measurements alone. Hence, the pose of the catheter needs to be established to correctly reconstruct the three-dimensional vessel shape. Magnetic particle imaging (MPI) is a three-dimensional tomographic, tracer-based and radiation-free imaging modality providing high temporal resolution with unlimited penetration depth. Volumetric MPI images are angiographic and hence suitable to complement IVOCT as a co-modality. We study simultaneous bimodal IVOCT MPI imaging with the goal of estimating the IVOCT pullback path based on the 3D MPI data. We present a setup to study and evaluate simultaneous IVOCT and MPI image acquisition of differently shaped vessel phantoms. First, the influence of the MPI tracer concentration on the optical properties required for IVOCT is analyzed. Second, using a concentration allowing for simultaneous imaging, IVOCT and MPI image data is acquired sequentially and simultaneously. Third, the luminal centerline is established from the MPI image volumes and used to estimate the catheter pullback trajectory for IVOCT image reconstruction. The image volumes are compared to the known shape of the phantoms. We were able to identify a suitable MPI tracer concentration of 2.5 mmol/L with negligible influence on the IVOCT signal. The pullback trajectory estimated from MPI agrees well with the centerline of the phantoms. (...)

Submitted 21 November, 2019; originally announced November 2019.

Comments: 16 pages, 16 figures

arXiv:1907.04902

Interpretable Dynamics Models for Data-Efficient Reinforcement Learning

Authors: Markus Kaiser, Clemens Otte, Thomas Runkler, Carl Henrik Ek

Abstract: In this paper, we present a Bayesian view on model-based reinforcement learning. We use expert knowledge to impose structure on the transition model and present an efficient learning scheme based on variational inference. This scheme is applied to a heteroskedastic and bimodal benchmark problem on which we compare our results to NFQ and show how our approach yields human-interpretable insight about the underlying dynamics while also increasing data-efficiency.

Submitted 10 July, 2019; originally announced July 2019.

Comments: ESANN 2019 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 24-26 April 2019, i6doc.com publ., ISBN 978-287-587-065-0

arXiv:1905.09282

Spatio-Temporal Deep Learning Models for Tip Force Estimation During Needle Insertion

Authors: Nils Gessert, Torben Priegnitz, Thore Saathoff, Sven-Thomas Antoni, David Meyer, Moritz Franz Hamann, Klaus-Peter Jünemann, Christoph Otte, Alexander Schlaefer

Abstract: Purpose. Precise placement of needles is a challenge in a number of clinical applications such as brachytherapy or biopsy. Forces acting at the needle cause tissue deformation and needle deflection, which in turn may lead to misplacement or injury. Hence, a number of approaches to estimate the forces at the needle have been proposed. Yet, integrating sensors into the needle tip is challenging and a careful calibration is required to obtain good force estimates. Methods. We describe a fiber-optical needle tip force sensor design using a single OCT fiber for measurement. The fiber images the deformation of an epoxy layer placed below the needle tip, which results in a stream of 1D depth profiles. We study different deep learning approaches to facilitate calibration between this spatio-temporal image data and the related forces. In particular, we propose a novel convGRU-CNN architecture for simultaneous spatial and temporal data processing. Results. The needle can be adapted to different operating ranges by changing the stiffness of the epoxy layer. Likewise, calibration can be adapted by training the deep learning models. Our novel convGRU-CNN architecture results in the lowest mean absolute error of 1.59 ± 1.3 mN and a cross-correlation coefficient of 0.9997, and clearly outperforms the other methods. Ex vivo experiments in human prostate tissue demonstrate the needle's application. Conclusions. Our OCT-based fiber-optical sensor presents a viable alternative for needle tip force estimation. The results indicate that the rich spatio-temporal information included in the stream of images showing the deformation throughout the epoxy layer can be effectively used by deep learning models. Particularly, we demonstrate that the convGRU-CNN architecture performs favorably, making it a promising approach for other spatio-temporal learning problems.

Submitted 22 May, 2019; originally announced May 2019.

Comments: Accepted for publication in the International Journal of Computer Assisted Radiology and Surgery

arXiv:1810.12355

Feasibility of a markerless tracking system based on optical coherence tomography

Authors: Matthias Schlüter, Christoph Otte, Thore Saathoff, Nils Gessert, Alexander Schlaefer

Abstract: Clinical tracking systems are popular but typically require specific tracking markers. In recent years, the scanning speed of optical coherence tomography (OCT) has increased to A-scan rates above 1 MHz, making it possible to acquire volume scans of moving objects. Therefore, we propose a markerless tracking system based on OCT to obtain small volumetric images including information on sub-surface structures at high spatio-temporal resolution. In contrast to conventional vision-based approaches, this allows identifying natural landmarks even for smooth and homogeneous surfaces. We describe the optomechanical setup and process flow to evaluate OCT volumes for translations and accordingly adjust the position of the field-of-view to follow moving samples. While our current setup is still preliminary, we demonstrate tracking of motion transversal to the OCT beam of up to 20 mm/s with errors around 0.2 mm and even better for some scenarios. Tracking is evaluated on a clearly structured and on a homogeneous phantom as well as on actual tissue samples. The results show that OCT is promising for fast and precise tracking of smooth, monochromatic objects in medical scenarios.

Submitted 14 January, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

Comments: Accepted at SPIE Medical Imaging 2019

arXiv:1810.07158

Data Association with Gaussian Processes

Authors: Markus Kaiser, Clemens Otte, Thomas Runkler, Carl Henrik Ek

Abstract: The data association problem is concerned with separating data coming from different generating processes, for example when data come from different data sources, contain significant noise, or exhibit multimodality. We present a fully Bayesian approach to this problem. Our model is capable of simultaneously solving the data association problem and the induced supervised learning problems. Underpinning our approach is the use of Gaussian process priors to encode the structure of both the data and the data associations. We present an efficient learning scheme based on doubly stochastic variational inference and discuss how it can be applied to deep Gaussian process priors.

Submitted 5 May, 2019; v1 submitted 16 October, 2018; originally announced October 2018.

arXiv:1805.11911

doi 10.1007/978-3-030-00937-3_26

Needle Tip Force Estimation using an OCT Fiber and a Fused convGRU-CNN Architecture

Authors: Nils Gessert, Torben Priegnitz, Thore Saathoff, Sven-Thomas Antoni, David Meyer, Moritz Franz Hamann, Klaus-Peter Jünemann, Christoph Otte, Alexander Schlaefer

Abstract: Needle insertion is common during minimally invasive interventions such as biopsy or brachytherapy. During soft tissue needle insertion, forces acting at the needle tip cause tissue deformation and needle deflection. Accurate needle tip force measurement provides information on needle-tissue interaction and helps to detect and compensate for potential misplacement. For this purpose we introduce an image-based needle tip force estimation method using an optical fiber imaging the deformation of an epoxy layer below the needle tip over time. For calibration and force estimation, we introduce a novel deep learning-based fused convolutional GRU-CNN model which effectively exploits the spatio-temporal data structure. The needle is easy to manufacture and our model achieves a mean absolute error of 1.76 ± 1.5 mN with a cross-correlation coefficient of 0.9996, clearly outperforming other methods. We test needles with different materials to demonstrate that the approach can be adapted for different sensitivities and force ranges. Furthermore, we validate our approach in an ex-vivo prostate needle insertion scenario.

Submitted 30 May, 2018; originally announced May 2018.

Comments: Accepted for Publication at MICCAI 2018

arXiv:1804.10002

doi 10.1007/s11548-018-1777-8

Force Estimation from OCT Volumes using 3D CNNs

Authors: Nils Gessert, Jens Beringhoff, Christoph Otte, Alexander Schlaefer

Abstract: Purpose. Estimating the interaction forces of instruments and tissue is of interest, particularly to provide haptic feedback during robot assisted minimally invasive interventions. Different approaches based on external and integrated force sensors have been proposed. These are hampered by friction, sensor size, and sterilizability. We investigate a novel approach to estimate the force vector directly from optical coherence tomography image volumes. Methods. We introduce a novel Siamese 3D CNN architecture. The network takes an undeformed reference volume and a deformed sample volume as an input and outputs the three components of the force vector. We employ a deep residual architecture with bottlenecks for increased efficiency. We compare the Siamese approach to methods using difference volumes and two-dimensional projections. Data was generated using a robotic setup to obtain ground truth force vectors for silicon tissue phantoms as well as porcine tissue. Results. Our method achieves a mean average error of 7.7 ± 4.3 mN when estimating the force vector. Our novel Siamese 3D CNN architecture outperforms single-path methods that achieve a mean average error of 11.59 ± 6.7 mN. Moreover, the use of volume data leads to significantly higher performance compared to processing only surface information, which achieves a mean average error of 24.38 ± 22.0 mN. Based on the tissue dataset, our method shows good generalization between different subjects. Conclusions. We propose a novel image-based force estimation method using optical coherence tomography. We illustrate that capturing the deformation of subsurface structures substantially improves force estimation. Our approach can provide accurate force estimates in surgical setups when using intraoperative optical coherence tomography.

Submitted 26 April, 2018; originally announced April 2018.

Comments: Published in the International Journal of Computer Assisted Radiology and Surgery

arXiv:1710.02766

Bayesian Alignments of Warped Multi-Output Gaussian Processes

Authors: Markus Kaiser, Clemens Otte, Thomas Runkler, Carl Henrik Ek

Abstract: We propose a novel Bayesian approach to modelling nonlinear alignments of time series based on latent shared information. We apply the method to the real-world problem of finding common structure in the sensor data of wind turbines introduced by the underlying latent and turbulent wind field. The proposed model allows for both arbitrary alignments of the inputs and non-parametric output warpings to transform the observations. This gives rise to multiple deep Gaussian process models connected via latent generating processes. We present an efficient variational approximation based on nested variational compression and show how the model can be used to extract shared information between dependent time series, recovering an interpretable functional decomposition of the learning problem. We show results for an artificial data set and real-world data of two wind turbines.

Submitted 23 May, 2018; v1 submitted 7 October, 2017; originally announced October 2017.

arXiv:1604.04906

doi 10.1109/ISBI.2016.7493359

Generating Semi-Synthetic Validation Benchmarks for Embryomics

Authors: Johannes Stegmaier, Julian Arz, Benjamin Schott, Jens C. Otte, Andrei Kobitski, G. Ulrich Nienhaus, Uwe Strähle, Peter Sanders, Ralf Mikut

Abstract: Systematic validation is an essential part of algorithm development. The enormous dataset sizes and the complexity observed in many recent time-resolved 3D fluorescence microscopy imaging experiments, however, prohibit a comprehensive manual ground truth generation. Moreover, existing simulated benchmarks in this field are often too simple or too specialized to sufficiently validate the observed image analysis problems. We present a new semi-synthetic approach to generate realistic 3D+t benchmarks that combines challenging cellular movement dynamics of real embryos with simulated fluorescent nuclei and artificial image distortions including various parametrizable options like cell numbers, acquisition deficiencies or multiview simulations. We successfully applied the approach to simulate the development of a zebrafish embryo with thousands of cells over 14 hours of its early existence.

Submitted 17 April, 2016; originally announced April 2016.

Comments: Accepted for publication at IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI), 2016

Showing 1–11 of 11 results for author: Otte, C