-
Bayes Linear Analysis for Statistical Modelling with Uncertain Inputs
Authors:
Samuel E. Jackson,
David C. Woods
Abstract:
Statistical models typically capture uncertainties in our knowledge of the corresponding real-world processes; however, it is less common for this uncertainty specification to capture uncertainty surrounding the values of the inputs to the model, which are often assumed known. We develop general modelling methodology with uncertain inputs in the context of the Bayes linear paradigm, which involves adjustment of second-order belief specifications over all quantities of interest only, without the requirement for probabilistic specifications. In particular, we propose an extension of commonly employed second-order modelling assumptions to the case of uncertain inputs, with explicit implementation in the context of regression analysis, stochastic process modelling, and statistical emulation. We apply the methodology to a regression model for extracting aluminium by electrolysis, and to emulation of the motivating epidemiological simulator chain to model the impact of an airborne infectious disease.
Submitted 9 May, 2023;
originally announced May 2023.
-
Efficient Emulation of Computer Models Utilising Multiple Known Boundaries of Differing Dimensions
Authors:
Samuel E. Jackson,
Ian Vernon
Abstract:
Emulation has been successfully applied across a wide variety of scientific disciplines for efficiently analysing computationally intensive models. We develop known boundary emulation strategies which utilise the fact that, for many computer models, there exist hyperplanes in the input parameter space for which the model output can be evaluated far more efficiently, whether this be analytically or just significantly faster using a more efficient and simpler numerical solver. The information contained on these known hyperplanes, or boundaries, can be incorporated into the emulation process via analytical update, thus involving no additional computational cost. In this article, we show that such analytical updates are available for multiple boundaries of various dimensions. We subsequently demonstrate which configurations of boundaries such analytical updates are available for, in particular by presenting a set of conditions that such a set of boundaries must satisfy. We demonstrate the powerful computational advantages of the known boundary emulation techniques developed on both an illustrative low-dimensional simulated example and a scientifically relevant and high-dimensional systems biology model of hormonal crosstalk in the roots of an Arabidopsis plant.
Submitted 13 March, 2020; v1 submitted 19 October, 2019;
originally announced October 2019.
-
Bayes Linear Emulation of Simulator Networks
Authors:
Samuel E. Jackson,
David C. Woods
Abstract:
Computationally expensive simulators, implementing mathematical models in computer codes, are commonly approximated using statistical emulators. We develop and assess novel emulation methods for systems best modelled via a chain, series or network of simulators. Using a Bayes linear framework, we link statistical emulators of the component simulators to explicitly account for the simulator input uncertainty induced by links between models in arbitrarily large networks. We demonstrate the advantages of these methods compared to use of a single emulator of the composite simulator network for a variety of examples, including the motivating epidemiological simulator chain to model the impact of an airborne infectious disease.
Submitted 25 August, 2021; v1 submitted 17 October, 2019;
originally announced October 2019.
-
Known Boundary Emulation of Complex Computer Models
Authors:
Ian Vernon,
Samuel E. Jackson,
Jonathan A. Cumming
Abstract:
Computer models are now widely used across a range of scientific disciplines to describe various complex physical systems; however, to perform full uncertainty quantification we often need to employ emulators. An emulator is a fast statistical construct that mimics the complex computer model, and greatly aids the vastly more computationally intensive uncertainty quantification calculations that a serious scientific analysis often requires. In some cases, the complex model can be solved far more efficiently for certain parameter settings, leading to boundaries or hyperplanes in the input parameter space where the model is essentially known. We show that for a large class of Gaussian process style emulators, multiple boundaries can be formally incorporated into the emulation process, by Bayesian updating of the emulators with respect to the boundaries, for trivial computational cost. The resulting updated emulator equations are given analytically. This leads to emulators that possess increased accuracy across large portions of the input parameter space. We also describe how a user can incorporate such boundaries within standard black box GP emulation packages that are currently available, without altering the core code. Appropriate designs of model runs in the presence of known boundaries are then analysed, with two kinds of general purpose designs proposed. We then apply the improved emulation and design methodology to an important systems biology model of hormonal crosstalk in Arabidopsis thaliana.
Submitted 3 May, 2019; v1 submitted 9 January, 2018;
originally announced January 2018.
-
Understanding Hormonal Crosstalk in Arabidopsis Root Development via Emulation and History Matching
Authors:
Samuel E. Jackson,
Ian Vernon,
Junli Liu,
Keith Lindsey
Abstract:
A major challenge in plant developmental biology is to understand how plant growth is coordinated by interacting hormones and genes. To meet this challenge, it is important to not only use experimental data, but also formulate a mathematical model. For the mathematical model to best describe the true biological system, it is necessary to understand the parameter space of the model, along with the links between the model, the parameter space and experimental observations. We develop sequential history matching methodology, using Bayesian emulation, to gain substantial insight into biological model parameter spaces. This is achieved by finding sets of acceptable parameters in accordance with successive sets of physical observations. These methods are then applied to a complex hormonal crosstalk model for Arabidopsis root growth. In this application, we demonstrate how an initial set of 22 observed trends reduces the volume of the set of acceptable inputs to a proportion of 6.1 × 10^(-7) of the original space. Additional sets of biologically relevant experimental data, each of size 5, reduce the size of this space by a further three and two orders of magnitude respectively. Hence, we provide insight into the constraints placed upon the model structure by, and the biological consequences of, measuring subsets of observations.
Submitted 19 October, 2019; v1 submitted 4 January, 2018;
originally announced January 2018.
-
How to fold intricately: using theory and experiments to unravel the properties of knotted proteins
Authors:
Sophie E. Jackson,
Antonio Suma,
Cristian Micheletti
Abstract:
Over the years, advances in experimental and computational methods have helped us to understand the role of thermodynamic, kinetic and active (chaperone-aided) effects in coordinating the folding steps required to achieve a knotted native state. Here, we review such developments by paying particular attention to the complementarity of experimental and computational studies. Key open issues that could be tackled with either or both approaches are finally pointed out.
Submitted 18 October, 2016;
originally announced October 2016.
-
pyFRET: A Python Library for Single Molecule Fluorescence Data Analysis
Authors:
Rebecca R. Murphy,
Sophie E. Jackson,
David Klenerman
Abstract:
Single molecule Förster resonance energy transfer (smFRET) is a powerful experimental technique for studying the properties of individual biological molecules in solution. However, as adoption of smFRET techniques becomes more widespread, the lack of available software, whether open source or commercial, for data analysis, is becoming a significant issue. Here, we present pyFRET, an open source Python package for the analysis of data from single-molecule fluorescence experiments from freely diffusing biomolecules. The package provides methods for the complete analysis of a smFRET dataset, from burst selection and denoising, through data visualisation and model fitting. We provide support for both continuous excitation and alternating laser excitation (ALEX) data analysis. pyFRET is available as a package downloadable from the Python Package Index (PyPI) under the open source three-clause BSD licence, together with links to extensive documentation and tutorials, including example usage and test data. Additional documentation including tutorials is hosted independently on ReadTheDocs. The code is available from the free hosting site Bitbucket. Through distribution of this software, we hope to lower the barrier for the adoption of smFRET experiments by other research groups and we encourage others to contribute modules for specific analysis needs.
Submitted 19 December, 2014;
originally announced December 2014.