-
Ten simple rules for teaching sustainable software engineering
Authors:
Kit Gallagher,
Richard Creswell,
Ben Lambert,
Martin Robinson,
Chon Lok Lei,
Gary R. Mirams,
David J. Gavaghan
Abstract:
Computational methods and associated software implementations are central to every field of scientific investigation. Modern biological research, particularly within systems biology, has relied heavily on the development of software tools to process and organize increasingly large datasets, simulate complex mechanistic models, provide tools for the analysis and management of data, and visualize and organize outputs. However, developing high-quality research software requires scientists to develop a host of software development skills, and teaching these skills to students is challenging. There has been a growing importance placed on ensuring reproducibility and good development practices in computational research. However, less attention has been devoted to informing the specific teaching strategies which are effective at nurturing in researchers the complex skillset required to produce high-quality software that, increasingly, is required to underpin both academic and industrial biomedical research. Recent articles in the Ten Simple Rules collection have discussed the teaching of foundational computer science and coding techniques to biology students. We advance this discussion by describing the specific steps for effectively teaching the necessary skills scientists need to develop sustainable software packages which are fit for (re-)use in academic research or more widely. Although our advice is likely to be applicable to all students and researchers hoping to improve their software development skills, our guidelines are directed towards an audience of students that have some programming literacy but little formal training in software development or engineering, typical of early doctoral students. These practices are also applicable outside of doctoral training environments, and we believe they should form a key part of postgraduate training schemes more generally in the life sciences.
Submitted 7 February, 2024;
originally announced February 2024.
-
Understanding the impact of numerical solvers on inference for differential equation models
Authors:
Richard Creswell,
Katherine M. Shepherd,
Ben Lambert,
Gary R. Mirams,
Chon Lok Lei,
Simon Tavener,
Martin Robinson,
David J. Gavaghan
Abstract:
Most ordinary differential equation (ODE) models used to describe biological or physical systems must be solved approximately using numerical methods. Perniciously, even those solvers which seem sufficiently accurate for the forward problem, i.e., for obtaining an accurate simulation, may not be sufficiently accurate for the inverse problem, i.e., for inferring the model parameters from data. We show that for both fixed step and adaptive step ODE solvers, solving the forward problem with insufficient accuracy can distort likelihood surfaces, which may become jagged, causing inference algorithms to get stuck in local "phantom" optima. We demonstrate that biases in inference arising from numerical approximation of ODEs are potentially most severe in systems involving low noise and rapid nonlinear dynamics. We reanalyze an ODE changepoint model previously fit to the COVID-19 outbreak in Germany and show the effect of the step size on simulation and inference results. We then fit a more complicated rainfall-runoff model to hydrological data and illustrate the importance of tuning solver tolerances to avoid distorted likelihood surfaces. Our results indicate that when performing inference for ODE model parameters, adaptive step size solver tolerances must be set cautiously and likelihood surfaces should be inspected for characteristic signs of numerical issues.
Submitted 3 July, 2023;
originally announced July 2023.
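The effect described in this abstract can be illustrated with a toy sketch. It is not the authors' code or either of their case studies: the logistic growth model, the forward Euler solver, the noise level `sigma`, the parameter grid, and all function names are assumptions chosen only for illustration. Noise-free data are generated from the exact solution, and the growth rate is then recovered by maximising a Gaussian log-likelihood computed with a coarse and with a fine fixed step.

```python
import math

def logistic_exact(r, t, y0=0.1, k=1.0):
    """Closed-form solution of dy/dt = r*y*(1 - y/k) with y(0) = y0."""
    return k / (1.0 + (k / y0 - 1.0) * math.exp(-r * t))

def logistic_euler(r, times, step, y0=0.1, k=1.0):
    """Forward Euler with (approximately) fixed step size `step`.

    Returns the approximate solution at each requested time.
    """
    out, t, y = [], 0.0, y0
    for target in times:
        n_steps = max(1, round((target - t) / step))
        h = (target - t) / n_steps
        for _ in range(n_steps):
            y += h * r * y * (1.0 - y / k)
        t = target
        out.append(y)
    return out

def log_likelihood(r, times, data, step, sigma=0.01):
    """Gaussian log-likelihood of the data given growth rate r."""
    sim = logistic_euler(r, times, step)
    sse = sum((s - d) ** 2 for s, d in zip(sim, data))
    n = len(data)
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - sse / (2.0 * sigma ** 2)

def best_r(times, data, step, grid):
    """Grid-search maximum-likelihood estimate of r."""
    return max(grid, key=lambda r: log_likelihood(r, times, data, step))

R_TRUE = 1.0
TIMES = [float(i) for i in range(1, 11)]
DATA = [logistic_exact(R_TRUE, t) for t in TIMES]   # noise-free synthetic data
GRID = [0.8 + 0.005 * i for i in range(121)]        # candidate r values, 0.8 to 1.4

r_coarse = best_r(TIMES, DATA, step=1.0, grid=GRID)    # inaccurate forward solve
r_fine = best_r(TIMES, DATA, step=0.001, grid=GRID)    # accurate forward solve

print(f"true r = {R_TRUE}, coarse-step estimate = {r_coarse:.3f}, "
      f"fine-step estimate = {r_fine:.3f}")
```

With the coarse step, the likelihood peaks away from the true growth rate even though the data contain no noise, so the bias comes entirely from the numerical approximation. This sketch shows only the fixed-step bias. The jagged likelihood surfaces that the abstract attributes to loose adaptive-solver tolerances would need an adaptive solver to reproduce.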
-
Empirical quantification of predictive uncertainty due to model discrepancy by training with an ensemble of experimental designs: an application to ion channel kinetics
Authors:
Joseph G. Shuttleworth,
Chon Lok Lei,
Dominic G. Whittaker,
Monique J. Windley,
Adam P. Hill,
Simon P. Preston,
Gary R. Mirams
Abstract:
When mathematical biology models are used to make quantitative predictions for clinical or industrial use, it is important that these predictions come with a reliable estimate of their accuracy (uncertainty quantification). Because models of complex biological systems are always large simplifications, model discrepancy arises - where a mathematical model fails to recapitulate the true data generating process. This presents a particular challenge for making accurate predictions, and especially for making accurate estimates of uncertainty in these predictions. Experimentalists and modellers must choose which experimental procedures (protocols) are used to produce data to train their models. We propose to characterise uncertainty owing to model discrepancy with an ensemble of parameter sets, each of which results from training to data from a different protocol. The variability in predictions from this ensemble provides an empirical estimate of predictive uncertainty owing to model discrepancy, even for unseen protocols. We use the example of electrophysiology experiments, which are used to investigate the kinetics of the hERG potassium ion channel. Here, 'information-rich' protocols allow mathematical models to be trained using numerous short experiments performed on the same cell. Typically, assuming independent observational errors and training a model to an individual experiment results in parameter estimates with very little dependence on observational noise. Moreover, parameter sets arising from the same model applied to different experiments often conflict - indicative of model discrepancy. Our methods will help select more suitable mathematical models of hERG for future studies, and will be widely applicable to a range of biological modelling problems.
Submitted 19 February, 2024; v1 submitted 6 February, 2023;
originally announced February 2023.
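The ensemble idea in this abstract can be sketched with a deliberately simple toy problem. Nothing here comes from the paper: the quadratic "true" process, the one-parameter linear model, the three input ranges standing in for protocols, and all names are hypothetical. A misspecified model is fitted separately to each protocol, and the spread of the resulting predictions at an unseen input serves as an empirical measure of uncertainty due to model discrepancy.

```python
def true_process(x):
    """Hypothetical data-generating process (contains a quadratic term)."""
    return x + 0.5 * x ** 2

def fit_theta(xs):
    """Least-squares fit of the misspecified model y = theta * x.

    The data are noise-free, so any disagreement between fits is due to
    model discrepancy rather than observational error.
    """
    ys = [true_process(x) for x in xs]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Each "protocol" probes a different range of the input.
PROTOCOLS = {
    "low": [0.2, 0.4, 0.6, 0.8, 1.0],
    "mid": [1.2, 1.4, 1.6, 1.8, 2.0],
    "high": [2.2, 2.4, 2.6, 2.8, 3.0],
}

# One parameter estimate per protocol forms the ensemble.
ensemble = {name: fit_theta(xs) for name, xs in PROTOCOLS.items()}

# Predict at an input that was not used in any fit.
X_NEW = 1.5
predictions = {name: theta * X_NEW for name, theta in ensemble.items()}
lo, hi = min(predictions.values()), max(predictions.values())
truth = true_process(X_NEW)

for name, theta in ensemble.items():
    print(f"{name}: theta = {theta:.3f}, prediction = {predictions[name]:.3f}")
print(f"ensemble range = [{lo:.3f}, {hi:.3f}], true value = {truth:.3f}")
```

The fitted parameters disagree between protocols despite the absence of noise, which mirrors the conflict between parameter sets that the abstract takes as a sign of model discrepancy. In this toy case the range of ensemble predictions happens to bracket the true value. That is a property of this example, not a general guarantee.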
-
Considering discrepancy when calibrating a mechanistic electrophysiology model
Authors:
Chon Lok Lei,
Sanmitra Ghosh,
Dominic G. Whittaker,
Yasser Aboelkassem,
Kylie A. Beattie,
Chris D. Cantwell,
Tammo Delhaas,
Charles Houston,
Gustavo Montes Novaes,
Alexander V. Panfilov,
Pras Pathmanathan,
Marina Riabiz,
Rodrigo Weber dos Santos,
John Walmsley,
Keith Worden,
Gary R. Mirams,
Richard D. Wilkinson
Abstract:
Uncertainty quantification (UQ) is a vital step in using mathematical models and simulations to take decisions. The field of cardiac simulation has begun to explore and adopt UQ methods to characterise uncertainty in model inputs and how that propagates through to outputs or predictions. In this perspective piece we draw attention to an important and under-addressed source of uncertainty in our predictions -- that of uncertainty in the model structure or the equations themselves. The difference between imperfect models and reality is termed model discrepancy, and we are often uncertain as to the size and consequences of this discrepancy. Here we provide two examples of the consequences of discrepancy when calibrating models at the ion channel and action potential scales. Furthermore, we attempt to account for this discrepancy when calibrating and validating an ion channel model using different methods, based on modelling the discrepancy using Gaussian processes (GPs) and autoregressive-moving-average (ARMA) models, then highlight the advantages and shortcomings of each approach. Finally, suggestions and lines of enquiry for future work are provided.
Submitted 23 April, 2020; v1 submitted 13 January, 2020;
originally announced January 2020.
-
Probabilistic Inference on Noisy Time Series (PINTS)
Authors:
Michael Clerx,
Martin Robinson,
Ben Lambert,
Chon Lok Lei,
Sanmitra Ghosh,
Gary R. Mirams,
David J. Gavaghan
Abstract:
Time series models are ubiquitous in science, arising in any situation where researchers seek to understand how a system's behaviour changes over time. A key problem in time series modelling is inference: determining properties of the underlying system based on observed time series. For both statistical and mechanistic models, inference involves finding parameter values, or distributions of parameter values, for which model outputs are consistent with observations. A wide variety of inference techniques are available and different approaches are suitable for different classes of problems. This variety presents a challenge for researchers, who may not have the resources or expertise to implement and experiment with these methods. PINTS (Probabilistic Inference on Noisy Time Series, https://github.com/pints-team/pints) is an open-source (BSD 3-clause license) Python library that provides researchers with a broad suite of non-linear optimisation and sampling methods. It allows users to wrap a model and data in a transparent and straightforward interface, which can then be used with custom or pre-defined error measures for optimisation, or with likelihood functions for Bayesian inference or maximum-likelihood estimation. Derivative-free optimisation algorithms, which work without harder-to-obtain gradient information, are included, as well as inference algorithms such as adaptive Markov chain Monte Carlo and nested sampling, which estimate distributions over parameter values. By making these statistical techniques available in an open and easy-to-use framework, PINTS brings the power of modern statistical techniques to a wider scientific audience.
Submitted 18 December, 2018;
originally announced December 2018.
-
Gaussian process emulation for discontinuous response surfaces with applications for cardiac electrophysiology models
Authors:
Sanmitra Ghosh,
David J. Gavaghan,
Gary R. Mirams
Abstract:
Mathematical models of biological systems are beginning to be used for safety-critical applications, where large numbers of repeated model evaluations are required to perform uncertainty quantification and sensitivity analysis. Most of these models are nonlinear both in variables and parameters/inputs which has two consequences. First, analytic solutions are rarely available so repeated evaluation of these models by numerically solving differential equations incurs a significant computational burden. Second, many models undergo bifurcations in behaviour as parameters are varied. As a result, simulation outputs often contain discontinuities as we change parameter values and move through parameter/input space.
Statistical emulators such as Gaussian processes are frequently used to reduce the computational cost of uncertainty quantification, but discontinuities render a standard Gaussian process emulation approach unsuitable as these emulators assume a smooth and continuous response to changes in parameter values.
In this article, we propose a novel two-step method for building a Gaussian process emulator for models with discontinuous response surfaces. We first use a Gaussian process classifier to detect boundaries of discontinuities and then constrain the Gaussian process emulation of the response surface within these boundaries. We introduce a novel 'certainty metric' to guide active learning for a multi-class probabilistic classifier.
We apply the new classifier to simulations of drug action on a cardiac electrophysiology model, to propagate our uncertainty in a drug's action through to predictions of changes to the cardiac action potential. The proposed two-step active learning method significantly reduces the computational cost of emulating models that undergo multiple bifurcations.
Submitted 25 May, 2018;
originally announced May 2018.
-
Mathematical Modelling of Heart Rate Changes in the Mouse
Authors:
Mark Christie,
Manasi Nandi,
Yanika Borg,
Valentina Carapella,
Gary Mirams,
Philip Aston,
Saziye Bayram,
Radostin D. Simitev,
Jennifer Siggers,
Buddhapriya Chakrabarti
Abstract:
The cardiovascular system (CVS) is composed of numerous interacting and dynamically regulated physiological subsystems which each generate measurable periodic components such that the CVS can itself be presented as a system of weakly coupled oscillators. The interactions between these oscillators generate a chaotic blood pressure waveform signal, where periods of apparent rhythmicity are punctuated by asynchronous behaviour. It is this variability which seems to characterise the normal state. We used a standard experimental data set for the purposes of analysis and modelling. Arterial blood pressure waveform data was collected from conscious mice instrumented with radiotelemetry devices over 24 hours, at a 100 Hz and 1 kHz time base. During a 24-hour period, these mice display diurnal variation leading to changes in the cardiovascular waveform. We undertook preliminary analysis of our data using Fourier transforms and subsequently applied a series of both linear and nonlinear mathematical approaches in parallel. We provide a minimalistic linear and nonlinear coupled oscillator model and employ spectral and Hilbert analysis as well as a phase plane analysis. This provides a route to a three-way synergistic investigation of the original blood pressure data by a combination of physiological experiments, data analysis (viz. Fourier and Hilbert transforms and attractor reconstructions), and numerical solutions of linear and nonlinear coupled oscillator models. We believe that a minimal coupled oscillator model that quantitatively describes the complex physiological data could be developed via such a method. Further investigations of each of these techniques will be explored in separate publications.
Submitted 5 October, 2015;
originally announced October 2015.