-
Ten simple rules for teaching sustainable software engineering
Authors:
Kit Gallagher,
Richard Creswell,
Ben Lambert,
Martin Robinson,
Chon Lok Lei,
Gary R. Mirams,
David J. Gavaghan
Abstract:
Computational methods and associated software implementations are central to every field of scientific investigation. Modern biological research, particularly within systems biology, has relied heavily on the development of software tools to process and organize increasingly large datasets, simulate complex mechanistic models, provide tools for the analysis and management of data, and visualize and organize outputs. However, developing high-quality research software requires scientists to develop a host of software development skills, and teaching these skills to students is challenging. There has been a growing importance placed on ensuring reproducibility and good development practices in computational research. However, less attention has been devoted to informing the specific teaching strategies which are effective at nurturing in researchers the complex skillset required to produce high-quality software that, increasingly, is required to underpin both academic and industrial biomedical research. Recent articles in the Ten Simple Rules collection have discussed the teaching of foundational computer science and coding techniques to biology students. We advance this discussion by describing the specific steps for effectively teaching the necessary skills scientists need to develop sustainable software packages which are fit for (re-)use in academic research or more widely. Although our advice is likely to be applicable to all students and researchers hoping to improve their software development skills, our guidelines are directed towards an audience of students that have some programming literacy but little formal training in software development or engineering, typical of early doctoral students. These practices are also applicable outside of doctoral training environments, and we believe they should form a key part of postgraduate training schemes more generally in the life sciences.
Submitted 7 February, 2024;
originally announced February 2024.
-
Understanding the impact of numerical solvers on inference for differential equation models
Authors:
Richard Creswell,
Katherine M. Shepherd,
Ben Lambert,
Gary R. Mirams,
Chon Lok Lei,
Simon Tavener,
Martin Robinson,
David J. Gavaghan
Abstract:
Most ordinary differential equation (ODE) models used to describe biological or physical systems must be solved approximately using numerical methods. Perniciously, even those solvers which seem sufficiently accurate for the forward problem, i.e., for obtaining an accurate simulation, may not be sufficiently accurate for the inverse problem, i.e., for inferring the model parameters from data. We show that for both fixed step and adaptive step ODE solvers, solving the forward problem with insufficient accuracy can distort likelihood surfaces, which may become jagged, causing inference algorithms to get stuck in local "phantom" optima. We demonstrate that biases in inference arising from numerical approximation of ODEs are potentially most severe in systems involving low noise and rapid nonlinear dynamics. We reanalyze an ODE changepoint model previously fit to the COVID-19 outbreak in Germany and show the effect of the step size on simulation and inference results. We then fit a more complicated rainfall-runoff model to hydrological data and illustrate the importance of tuning solver tolerances to avoid distorted likelihood surfaces. Our results indicate that when performing inference for ODE model parameters, adaptive step size solver tolerances must be set cautiously and likelihood surfaces should be inspected for characteristic signs of numerical issues.
Submitted 3 July, 2023;
originally announced July 2023.
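The step-size effect described in this abstract can be illustrated with a toy problem. The sketch below is not taken from the paper: the logistic growth model, the forward Euler solver, the parameter grid, and all numerical values are assumptions chosen for illustration. It fits the growth rate `r` to noise-free data generated from the exact solution, once with a coarse fixed step and once with a fine one, and reports which `r` minimises the sum of squared errors (equivalent to maximising a Gaussian likelihood with fixed noise level). It shows only the bias caused by a coarse fixed step, not the jagged surfaces that adaptive solvers can produce.

```python
import math


def logistic_exact(t, r, k=1.0, y0=0.1):
    """Closed-form solution of dy/dt = r*y*(1 - y/k) with y(0) = y0."""
    return k / (1.0 + (k / y0 - 1.0) * math.exp(-r * t))


def logistic_euler(times, r, dt, k=1.0, y0=0.1):
    """Forward Euler with fixed step dt; returns y at each requested time.

    Assumes every requested time is an integer multiple of dt.
    """
    out = []
    y, step = y0, 0
    for target in times:
        n_steps = round(target / dt)  # count steps to avoid float drift in t
        while step < n_steps:
            y += dt * r * y * (1.0 - y / k)
            step += 1
        out.append(y)
    return out


def sse(r, data, times, dt):
    """Sum of squared errors between the Euler simulation and the data."""
    sim = logistic_euler(times, r, dt)
    return sum((s - d) ** 2 for s, d in zip(sim, data))


def best_r(data, times, dt, grid):
    """Grid search for the growth rate with the smallest error."""
    return min(grid, key=lambda r: sse(r, data, times, dt))


# Hypothetical setup: true rate 1.0, observations at t = 1, 2, ..., 10.
true_r = 1.0
times = [float(t) for t in range(1, 11)]
data = [logistic_exact(t, true_r) for t in times]
grid = [0.5 + 0.005 * i for i in range(201)]  # candidate rates in [0.5, 1.5]

best_coarse = best_r(data, times, dt=1.0, grid=grid)
best_fine = best_r(data, times, dt=0.01, grid=grid)

print("true r:", true_r)
print("best r with dt = 1.0 :", best_coarse)
print("best r with dt = 0.01:", best_fine)
```

With the coarse step, the optimiser compensates for solver error by shifting `r` away from its true value, even though the data contain no noise; the fine step recovers a value close to the truth.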
-
Probabilistic Inference on Noisy Time Series (PINTS)
Authors:
Michael Clerx,
Martin Robinson,
Ben Lambert,
Chon Lok Lei,
Sanmitra Ghosh,
Gary R. Mirams,
David J. Gavaghan
Abstract:
Time series models are ubiquitous in science, arising in any situation where researchers seek to understand how a system's behaviour changes over time. A key problem in time series modelling is inference: determining properties of the underlying system based on observed time series. For both statistical and mechanistic models, inference involves finding parameter values, or distributions of parameter values, for which model outputs are consistent with observations. A wide variety of inference techniques are available and different approaches are suitable for different classes of problems. This variety presents a challenge for researchers, who may not have the resources or expertise to implement and experiment with these methods. PINTS (Probabilistic Inference on Noisy Time Series - https://github.com/pints-team/pints) is an open-source (BSD 3-clause license) Python library that provides researchers with a broad suite of non-linear optimisation and sampling methods. It allows users to wrap a model and data in a transparent and straightforward interface, which can then be used with custom or pre-defined error measures for optimisation, or with likelihood functions for Bayesian inference or maximum-likelihood estimation. Derivative-free optimisation algorithms - which work without harder-to-obtain gradient information - are included, as well as inference algorithms such as adaptive Markov chain Monte Carlo and nested sampling which estimate distributions over parameter values. By making these statistical techniques available in an open and easy-to-use framework, PINTS brings the power of modern statistical techniques to a wider scientific audience.
Submitted 18 December, 2018;
originally announced December 2018.
-
Gaussian process emulation for discontinuous response surfaces with applications for cardiac electrophysiology models
Authors:
Sanmitra Ghosh,
David J. Gavaghan,
Gary R. Mirams
Abstract:
Mathematical models of biological systems are beginning to be used for safety-critical applications, where large numbers of repeated model evaluations are required to perform uncertainty quantification and sensitivity analysis. Most of these models are nonlinear both in variables and parameters/inputs which has two consequences. First, analytic solutions are rarely available so repeated evaluation of these models by numerically solving differential equations incurs a significant computational burden. Second, many models undergo bifurcations in behaviour as parameters are varied. As a result, simulation outputs often contain discontinuities as we change parameter values and move through parameter/input space.
Statistical emulators such as Gaussian processes are frequently used to reduce the computational cost of uncertainty quantification, but discontinuities render a standard Gaussian process emulation approach unsuitable as these emulators assume a smooth and continuous response to changes in parameter values.
In this article, we propose a novel two-step method for building a Gaussian Process emulator for models with discontinuous response surfaces. We first use a Gaussian Process classifier to detect boundaries of discontinuities and then constrain the Gaussian Process emulation of the response surface within these boundaries. We introduce a novel 'certainty metric' to guide active learning for a multi-class probabilistic classifier.
We apply the new classifier to simulations of drug action on a cardiac electrophysiology model, to propagate our uncertainty in a drug's action through to predictions of changes to the cardiac action potential. The proposed two-step active learning method significantly reduces the computational cost of emulating models that undergo multiple bifurcations.
Submitted 25 May, 2018;
originally announced May 2018.
-
The Role of the Hes1 Crosstalk Hub in Notch-Wnt Interactions of the Intestinal Crypt
Authors:
Sophie K. Kay,
Heather A. Harrington,
Sarah Shepherd,
Keith Brennan,
Trevor Dale,
James M. Osborne,
David J. Gavaghan,
Helen M. Byrne
Abstract:
The Notch pathway plays a vital role in determining whether cells in the intestinal epithelium adopt a secretory or an absorptive phenotype. Cell fate specification is coordinated via Notch's interaction with the canonical Wnt pathway. Here, we propose a new mathematical model of the Notch and Wnt pathways, in which the Hes1 promoter acts as a hub for pathway crosstalk. Computational simulations of the model can assist in understanding how healthy intestinal tissue is maintained, and predict the likely consequences of biochemical knockouts upon cell fate selection processes. Chemical reaction network theory (CRNT) is a powerful, generalised framework which assesses the capacity of our model for monostability or multistability, by analysing properties of the underlying network structure without recourse to specific parameter values or functional forms for reaction rates. CRNT highlights the role of beta-catenin in stabilising the Notch pathway and damping oscillations, demonstrating that Wnt-mediated actions on the Hes1 promoter can induce dynamical transitions in the Notch system, from multistability to monostability. Time-dependent model simulations of cell pairs reveal the stabilising influence of Wnt upon the Notch pathway, in which beta-catenin- and Dsh-mediated action on the Hes1 promoter are key in shaping the subcellular dynamics. Where Notch-mediated transcription of Hes1 dominates, there is Notch oscillation and maintenance of fate flexibility; Wnt-mediated transcription of Hes1 favours bistability akin to cell fate selection. Cells could therefore regulate the proportion of Wnt- and Notch-mediated control of the Hes1 promoter to coordinate the timing of cell fate selection as they migrate through the intestinal epithelium and are subject to reduced Wnt stimuli.
Submitted 22 August, 2016;
originally announced August 2016.