Search | arXiv e-print repository

Bayesian design for mathematical models of fruit growth based on misspecified prior information

Authors: Nushrath Najimuddin, David J. Warne, Helen Thompson, James M. McGree

Abstract: Bayesian design can be used for efficient data collection over time when the process can be described by the solution to an ordinary differential equation (ODE). Typically, Bayesian designs in such settings are obtained by maximising the expected value of a utility function that is derived from the joint probability distribution of the parameters and the response, given prior information about an appropriate ODE. However, in practice, appropriately defining such information a priori can be difficult due to incomplete knowledge about the mechanisms that govern how the process evolves over time. In this paper, we propose a method for finding Bayesian designs based on a flexible class of ODEs. Specifically, we consider the inclusion of spline terms into ODEs to provide flexibility in modelling how the process changes over time. We then propose to leverage this flexibility to form designs that are efficient even when the prior information is misspecified. Our approach is motivated by a sampling problem in agriculture where the goal is to provide a better understanding of fruit growth where prior information is based on studies conducted overseas, and therefore is potentially misspecified.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 24 pages, 6 Figures

arXiv:2310.17440 [pdf, other]

Gibbs optimal design of experiments

Authors: Antony M. Overstall, Jacinta Holloway-Brown, James M. McGree

Abstract: Bayesian optimal design of experiments is a well-established approach to planning experiments. Briefly, a probability distribution, known as a statistical model, for the responses is assumed which is dependent on a vector of unknown parameters. A utility function is then specified which gives the gain in information for estimating the true value of the parameters using the Bayesian posterior distribution. A Bayesian optimal design is given by maximising the expectation of the utility with respect to the joint distribution given by the statistical model and prior distribution for the true parameter values. The approach takes account of the experimental aim via specification of the utility and of all assumed sources of uncertainty via the expected utility. However, it is predicated on the specification of the statistical model. Recently, a new type of statistical inference, known as Gibbs (or General Bayesian) inference, has been advanced. This is Bayesian-like, in that uncertainty on unknown quantities is represented by a posterior distribution, but does not necessarily rely on specification of a statistical model. Thus the resulting inference should be less sensitive to misspecification of the statistical model. The purpose of this paper is to propose Gibbs optimal design: a framework for optimal design of experiments for Gibbs inference. The concept behind the framework is introduced along with a computational approach to find Gibbs optimal designs in practice. The framework is demonstrated on exemplars including linear models, and experiments with count and time-to-event responses.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2307.16357 [pdf]

Communicating uncertainty in Indigenous sea Country monitoring with Bayesian statistics: towards more informed decision-making

Authors: Katherine Cure, Diego R Barneche, Martial Depczynski, Rebecca Fisher, David J Warne, James M McGree, Jim Underwood, Frank Weisenberger, Elizabeth Evans-Illidge, Daniel Oades, Azton Howard, Phillip McCarthy, Damon Pyke, Zac Edgar, Rodney Maher, Trevor Sampi, Bardi Jawi Traditional Owners

Abstract: First Nations Australians have a cultural obligation to look after land and sea Country, and Indigenous-partnered science is beginning to drive socially inclusive initiatives in conservation. The Australian Institute of Marine Science has partnered with Indigenous communities in systematically collecting monitoring data to understand the natural variability of ecological communities and better inform sea Country management. Monitoring partnerships are centred around the 2-way sharing of Traditional Ecological Knowledge, training in science and technology, and developing communication products that can be accessed across the broader community. We present a case study with the Bardi Jawi Rangers in northwest Australia focusing on a 3-year co-developed and co-delivered monitoring dataset for culturally important fish in coral reef ecosystems. We show how uncertainty estimated by Bayesian statistics can be incorporated into monitoring indicators and facilitate fuller communication between scientists and First Nations partners about the limitations of monitoring to identify change.

Submitted 30 July, 2023; originally announced July 2023.

arXiv:2303.00901 [pdf, other]

A general Bayesian approach to design adaptive clinical trials with time-to-event outcomes

Authors: James M. McGree, Antony M. Overstall, Mark Jones, Robert K. Mahar

Abstract: Clinical trials are an integral component of medical research. Trials require careful design to, for example, maintain the safety of participants, use resources efficiently and allow clinically meaningful conclusions to be drawn. Adaptive clinical trials (i.e. trials that can be altered based on evidence that has accrued) are often more efficient, informative and ethical than standard or non-adaptive trials because they require fewer participants, target more promising treatments, and can stop early with sufficient evidence of effectiveness or harm. The design of adaptive trials requires the pre-specification of adaptions that are permissible throughout the conduct of the trial. Proposed adaptive designs are then usually evaluated through simulation which provides indicative metrics of performance (e.g. statistical power and type-1 error) under different scenarios. Trial simulation requires assumptions about the data generating process to be specified but correctly specifying these in practice can be difficult, particularly for new and emerging diseases. To address this, we propose an approach to design adaptive clinical trials without needing to specify the complete data generating process. To facilitate this, we consider a general Bayesian framework where inference about the treatment effect on a time-to-event outcome can be performed via the partial likelihood. As a consequence, the proposed approach to evaluate trial designs is robust to the specific form of the baseline hazard function. The benefits of this approach are demonstrated through the redesign of a recent clinical trial to evaluate whether a third dose of a vaccine provides improved protection against gastroenteritis in Australian Indigenous infants.

Submitted 1 March, 2023; originally announced March 2023.

arXiv:2212.09999 [pdf, other]

doi 10.1109/WSC57314.2022.10015326

Robust simulation design for generalized linear models in conditions of heteroscedasticity or correlation

Authors: Andrew Gill, David J. Warne, Antony M. Overstall, Clare McGrory, James M. McGree

Abstract: A meta-model of the input-output data of a computationally expensive simulation is often employed for prediction, optimization, or sensitivity analysis purposes. Fitting is enabled by a designed experiment, and for computationally expensive simulations, the design efficiency is of importance. Heteroscedasticity in simulation output is common, and it is potentially beneficial to induce dependence through the reuse of pseudo-random number streams to reduce the variance of the meta-model parameter estimators. In this paper, we develop a computational approach to robust design for computer experiments without the need to assume independence or identical distribution of errors. Through explicit inclusion of the variance or correlation structures into the meta-model distribution, either maximum likelihood estimation or generalized estimating equations can be employed to obtain an appropriate Fisher information matrix. Robust designs can then be computationally sought which maximize some relevant summary measure of this matrix, averaged across a prior distribution of any unknown parameters.

Submitted 20 December, 2022; originally announced December 2022.

MSC Class: 62K05

arXiv:2211.10029 [pdf, other]

Being Bayesian in the 2020s: opportunities and challenges in the practice of modern applied Bayesian statistics

Authors: Joshua J. Bon, Adam Bretherton, Katie Buchhorn, Susanna Cramb, Christopher Drovandi, Conor Hassan, Adrianne L. Jenner, Helen J. Mayfield, James M. McGree, Kerrie Mengersen, Aiden Price, Robert Salomone, Edgar Santos-Fernandez, Julie Vercelloni, Xiaoyu Wang

Abstract: Building on a strong foundation of philosophy, theory, methods and computation over the past three decades, Bayesian approaches are now an integral part of the toolkit for most statisticians and data scientists. Whether they are dedicated Bayesians or opportunistic users, applied professionals can now reap many of the benefits afforded by the Bayesian paradigm. In this paper, we touch on six modern opportunities and challenges in applied Bayesian statistics: intelligent data collection, new data sources, federated analysis, inference for implicit models, model transfer and purposeful software products.

Submitted 17 January, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: 27 pages, 8 figures

arXiv:2207.14440 [pdf, other]

A model robust sub-sampling approach for Generalised Linear Models in Big data settings

Authors: Amalan Mahendran, Helen Thompson, James M. McGree

Abstract: In today's modern era of Big data, computationally efficient and scalable methods are needed to support timely insights and informed decision making. One such method is sub-sampling, where a subset of the Big data is analysed and used as the basis for inference rather than considering the whole data set. A key question when applying sub-sampling approaches is how to select an informative subset based on the questions being asked of the data. A recent approach for this has been proposed based on determining sub-sampling probabilities for each data point, but a limitation of this approach is that appropriate sub-sampling probabilities rely on an assumed model for the Big data. In this article, to overcome this limitation, we propose a model robust approach where a set of models is considered, and the sub-sampling probabilities are evaluated based on the weighted average of probabilities that would be obtained if each model was considered singularly. Theoretical support for such an approach is provided. Our model robust sub-sampling approach is applied in a simulation study and in two real world applications where performance is compared to current sub-sampling practices. The results show that our model robust approach outperforms alternative approaches.

Submitted 6 September, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: Removed unnecessary space before algorithms

arXiv:2206.05369 [pdf, other]

Bayesian Design with Sampling Windows for Complex Spatial Processes

Authors: Katie Buchhorn, Kerrie Mengersen, Edgar Santos-Fernandez, Erin E. Peterson, James M. McGree

Abstract: Optimal design facilitates intelligent data collection. In this paper, we introduce a fully Bayesian design approach for spatial processes with complex covariance structures, like those typically exhibited in natural ecosystems. Coordinate Exchange algorithms are commonly used to find optimal design points. However, collecting data at specific points is often infeasible in practice. Currently, there is no provision to allow for flexibility in the choice of design. We also propose an approach to find Bayesian sampling windows, rather than points, via Gaussian process emulation to identify regions of high design efficiency across a multi-dimensional space. These developments are motivated by two ecological case studies: monitoring water temperature in a river network system in the northwestern United States and monitoring submerged coral reefs off the north-west coast of Australia.

Submitted 10 June, 2022; originally announced June 2022.

arXiv:2202.07166 [pdf, other]

SSNbayes: An R package for Bayesian spatio-temporal modelling on stream networks

Authors: Edgar Santos-Fernandez, Jay M. Ver Hoef, James M. McGree, Daniel J. Isaak, Kerrie Mengersen, Erin E. Peterson

Abstract: Spatio-temporal models are widely used in many research areas from ecology to epidemiology. However, most covariance functions describe spatial relationships based on Euclidean distance only. In this paper, we introduce the R package SSNbayes for fitting Bayesian spatio-temporal models and making predictions on branching stream networks. SSNbayes provides a linear regression framework with multiple options for incorporating spatial and temporal autocorrelation. Spatial dependence is captured using stream distance and flow connectivity while temporal autocorrelation is modelled using vector autoregression approaches. SSNbayes provides the functionality to make predictions across the whole network, compute exceedance probabilities and other probabilistic estimates such as the proportion of suitable habitat. We illustrate the functionality of the package using a stream temperature dataset collected in Idaho, USA.

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2103.01132 [pdf, other]

General Bayesian L2 calibration of mathematical models

Authors: Antony M. Overstall, James M. McGree

Abstract: A mathematical model is a representation of a physical system depending on unknown parameters. Calibration refers to attributing values to these parameters, using observations of the physical system, acknowledging that the mathematical model is an inexact representation of the physical system. General Bayesian inference generalizes traditional Bayesian inference by replacing the log-likelihood in Bayes' theorem by a (negative) loss function. Methodology is proposed for the general Bayesian calibration of mathematical models where the resulting posterior distributions estimate the values of the parameters that minimize the L2 norm of the difference between the mathematical model and true physical system.

Submitted 25 August, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

arXiv:2001.08308 [pdf, other]

Bayesian design for minimising prediction uncertainty in bivariate spatial responses with applications to air quality monitoring

Authors: S. G. Jagath Senarathne, Werner G. Müller, James M. McGree

Abstract: Model-based geostatistical design involves the selection of locations to collect data to minimise an expected loss function over a set of all possible locations. The loss function is specified to reflect the aim of data collection, which, for geostatistical studies, could be to minimise the prediction uncertainty at unobserved locations. In this paper, we propose a new approach to design such studies via a loss function derived through considering the entropy about the model predictions and the parameters of the model. The approach also includes a multivariate extension to generalised linear spatial models, and thus can be used to design experiments with more than one response. Unfortunately, evaluating our proposed loss function is computationally expensive so we provide an approximation such that our approach can be adopted to design realistically sized geostatistical studies. This is demonstrated through a simulated study and through designing an air quality monitoring program in Queensland, Australia. The results show that our designs remain highly efficient in achieving each experimental objective individually, providing an ideal compromise between the two objectives. Accordingly, we advocate that our approach could be adopted more generally in model-based geostatistical design.

Submitted 2 December, 2021; v1 submitted 22 January, 2020; originally announced January 2020.

arXiv:1912.00540 [pdf, other]

SSNdesign -- an R package for pseudo-Bayesian optimal and adaptive sampling designs on stream networks

Authors: Alan R. Pearse, James M. McGree, Nicholas A. Som, Catherine Leigh, Jay M. Ver Hoef, Paul Maxwell, Erin E. Peterson

Abstract: Streams and rivers are biodiverse and provide valuable ecosystem services. Maintaining these ecosystems is an important task, so organisations often monitor the status and trends in stream condition and biodiversity using field sampling and, more recently, autonomous in-situ sensors. However, data collection is often costly and so effective and efficient survey designs are crucial to maximise information while minimising costs. Geostatistics and optimal and adaptive design theory can be used to optimise the placement of sampling sites in freshwater studies and aquatic monitoring programs. Geostatistical modelling and experimental design on stream networks pose statistical challenges due to the branching structure of the network, flow connectivity and directionality, and differences in flow volume. Thus, the unique challenges of geostatistics and experimental design on stream networks necessitate the development of new open-source software for implementing the theory. We present SSNdesign, an R package for solving optimal and adaptive design problems on stream networks that integrates with existing open-source software. We demonstrate the mathematical foundations of our approach, and illustrate the functionality of SSNdesign using two case studies involving real data from Queensland, Australia. In both case studies we demonstrate that the optimal or adaptive designs outperform random and spatially balanced survey designs. The SSNdesign package has the potential to boost the efficiency of freshwater monitoring efforts and provide much-needed information for freshwater conservation and management.

Submitted 1 December, 2019; originally announced December 2019.

Comments: Main document: 18 pages, 7 figures Supp Info A: 11 pages, 0 figures Supp Info B: 24 pages, 6 figures Supp Info C: 3 pages, 0 figures

arXiv:1911.00878 [pdf, other]

Bayesian adaptive N-of-1 trials for estimating population and individual treatment effects

Authors: S. G. Jagath Senarathne, Antony M. Overstall, James M. McGree

Abstract: This article proposes a novel adaptive design algorithm that can be used to find optimal treatment allocations in N-of-1 clinical trials. This new methodology uses two Laplace approximations to provide a computationally efficient estimate of population and individual random effects within a repeated measures, adaptive design framework. Given the efficiency of this approach, it is also adopted for treatment selection to target the collection of data for the precise estimation of treatment effects. To evaluate this approach, we consider both a simulated and motivating N-of-1 clinical trial from the literature. For each trial, our methods were compared to the multi-armed bandit approach and a randomised N-of-1 trial design in terms of identifying the best treatment for each patient and the information gained about the model parameters. The results show that our new approach selects designs that are highly efficient in achieving each of these objectives. As such, we propose our Laplace-based algorithm as an efficient approach for designing adaptive N-of-1 trials.

Submitted 28 July, 2020; v1 submitted 3 November, 2019; originally announced November 2019.

arXiv:1909.12570 [pdf, other]

Bayesian decision-theoretic design of experiments under an alternative model

Authors: Antony M. Overstall, James M. McGree

Abstract: Traditionally Bayesian decision-theoretic design of experiments proceeds by choosing a design to minimise expectation of a given loss function over the space of all designs. The loss function encapsulates the aim of the experiment, and the expectation is taken with respect to the joint distribution of all unknown quantities implied by the statistical model that will be fitted to observed responses. In this paper, an extended framework is proposed whereby the expectation of the loss is taken with respect to a joint distribution implied by an alternative statistical model. Motivation for this includes promoting robustness, ensuring computational feasibility and for allowing realistic prior specification when deriving a design. To aid in exploring the new framework, an asymptotic approximation to the expected loss under an alternative model is derived, and the properties of different loss functions are established. The framework is then demonstrated on a linear regression versus full-treatment model scenario, on estimating parameters of a non-linear model under model discrepancy and a cubic spline model under an unknown number of basis functions.

Submitted 9 August, 2021; v1 submitted 27 September, 2019; originally announced September 2019.

Comments: Supplementary material appears as an appendix

arXiv:1903.04168 [pdf, ps, other]

A synthetic likelihood-based Laplace approximation for efficient design of biological processes

Authors: Mahasen Dehideniya, Antony M. Overstall, Chris C. Drovandi, James M. McGree

Abstract: Complex models used to describe biological processes in epidemiology and ecology often have computationally intractable or expensive likelihoods. This poses significant challenges in terms of Bayesian inference but more significantly in the design of experiments. Bayesian designs are found by maximising the expectation of a utility function over a design space, and typically this requires sampling from or approximating a large number of posterior distributions. This renders approaches adopted in inference computationally infeasible to implement in design. Consequently, optimal design in such fields has been limited to a small number of dimensions or a restricted range of utility functions. To overcome such limitations, we propose a synthetic likelihood-based Laplace approximation for approximating utility functions for models with intractable likelihoods. As will be seen, the proposed approximation is flexible in that a wide range of utility functions can be considered, and remains computationally efficient in high dimensions. To explore the validity of this approximation, an illustrative example from epidemiology is considered. Then, our approach is used to design experiments with a relatively large number of observations in two motivating applications from epidemiology and ecology.

Submitted 11 March, 2019; originally announced March 2019.

arXiv:1810.13076 [pdf]

A framework for automated anomaly detection in high frequency water-quality data from in situ sensors

Authors: Catherine Leigh, Omar Alsibai, Rob J. Hyndman, Sevvandi Kandanaarachchi, Olivia C. King, James M. McGree, Catherine Neelamraju, Jennifer Strauss, Priyanga Dilini Talagala, Ryan S. Turner, Kerrie Mengersen, Erin E. Peterson

Abstract: River water-quality monitoring is increasingly conducted using automated in situ sensors, enabling timelier identification of unexpected values. However, anomalies caused by technical issues confound these data, while the volume and velocity of data prevent manual detection. We present a framework for automated anomaly detection in high-frequency water-quality data from in situ sensors, using turbidity, conductivity and river level data. After identifying end-user needs and defining anomalies, we ranked their importance and selected suitable detection methods. High priority anomalies included sudden isolated spikes and level shifts, most of which were classified correctly by regression-based methods such as autoregressive integrated moving average models. However, using other water-quality variables as covariates reduced performance due to complex relationships among variables. Classification of drift and periods of anomalously low or high variability improved when we replaced anomalous measurements with forecasts, but this inflated false positive rates. Feature-based methods also performed well on high priority anomalies, but were less proficient at detecting lower priority anomalies, resulting in high false negative rates. Unlike regression-based methods, all feature-based methods produced low false positive rates, and did not require training or optimization. Rule-based methods successfully detected impossible values and missing observations. Thus, we recommend using a combination of methods to improve anomaly detection performance, whilst minimizing false detection rates. Furthermore, our framework emphasizes the importance of communication between end-users and analysts for optimal outcomes with respect to both detection performance and end-user needs. Our framework is applicable to other types of high frequency time-series data and anomaly detection applications.

Submitted 7 February, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

arXiv:1810.12499 [pdf]

doi 10.1371/journal.pone.0215503

Predicting Sediment and Nutrient Concentrations in Rivers Using High Frequency Water Quality Surrogates

Authors: Catherine Leigh, Sevvandi Kandanaarachchi, James M. McGree, Rob J. Hyndman, Omar Alsibai, Kerrie Mengersen, Erin E. Peterson

Abstract: A particular focus of water-quality monitoring is the concentrations of sediments and nutrients in rivers, constituents that can smother biota and cause eutrophication. However, the physical and economic constraints of manual sampling prohibit data collection at the frequency required to capture adequately the variation in concentrations through time. Here, we developed models to predict total suspended solids (TSS) and oxidized nitrogen (NOx) concentrations based on high-frequency time series of turbidity, conductivity and river level data from low-cost in situ sensors in rivers flowing into the Great Barrier Reef lagoon. We fit generalized least squares linear mixed effects models with a continuous first-order autoregressive correlation to data collected traditionally by manual sampling for subsequent analysis in the laboratory, then used these models to predict TSS or NOx from in situ sensor water-quality surrogate data, at two freshwater sites and one estuarine site. These models accounted for both temporal autocorrelation and unevenly time-spaced observations in the data. Turbidity proved a useful surrogate of TSS, with high predictive ability at both freshwater and estuarine sites. NOx models had much poorer fits, even when additional covariates of conductivity and river level were included along with turbidity. Furthermore, the relative influence of covariates in the NOx models was not consistent across sites.
Our findings likely reflect the complexity of dissolved nutrient dynamics in rivers, which are influenced by multiple and interacting factors including physical, chemical and biological processes, and the need for greater and better incorporation of spatial and temporal components within models.

Submitted 29 October, 2018; originally announced October 2018.

arXiv:1803.07018 [pdf, other]

Bayesian design of experiments for intractable likelihood models using coupled auxiliary models and multivariate emulation

Authors: Antony M. Overstall, James M. McGree

Abstract: A Bayesian design is given by maximising an expected utility over a design space. The utility is chosen to represent the aim of the experiment and its expectation is taken with respect to all unknowns: responses, parameters and/or models. Although straightforward in principle, there are several challenges to finding Bayesian designs in practice. Firstly, the utility and expected utility are rarely available in closed form and require approximation. Secondly, the design space can be of high-dimensionality. In the case of intractable likelihood models, these problems are compounded by the fact that the likelihood function, whose evaluation is required to approximate the expected utility, is not available in closed form. A strategy is proposed to find Bayesian designs for intractable likelihood models. It relies on the development of an automatic, auxiliary modelling approach, using multivariate Gaussian process emulators, to approximate the likelihood function. This is then combined with a copula-based approach to approximate the marginal likelihood (a quantity commonly required to evaluate many utility functions). These approximations are demonstrated on examples of stochastic process models involving experimental aims of both parameter estimation and model comparison.

Submitted 15 January, 2019; v1 submitted 19 March, 2018; originally announced March 2018.

Comments: Minor & final update

arXiv:1608.05815 [pdf, other]

An approach for finding fully Bayesian optimal designs using normal-based approximations to loss functions

Authors: Antony M. Overstall, James M. McGree, Christopher C. Drovandi

Abstract: The generation of decision-theoretic Bayesian optimal designs is complicated by the significant computational challenge of minimising an analytically intractable expected loss function over a, potentially, high-dimensional design space. A new general approach for approximately finding Bayesian optimal designs is proposed which uses computationally efficient normal-based approximations to posterior summaries to aid in approximating the expected loss. This new approach is demonstrated on illustrative, yet challenging, examples including hierarchical models for blocked experiments, and experimental aims of parameter estimation and model discrimination. Where possible, the results of the proposed methodology are compared, both in terms of performance and computing time, to results from using computationally more expensive, but potentially more accurate, Monte Carlo approximations. Moreover the methodology is also applied to problems where the use of Monte Carlo approximations is computationally infeasible.

Submitted 6 February, 2017; v1 submitted 20 August, 2016; originally announced August 2016.

Comments: 21 pages, 6 figures

arXiv:1601.08088 [pdf, other]

Model selection via Bayesian information capacity designs for generalised linear models

Authors: David C. Woods, James M. McGree, Susan M. Lewis

Abstract: The first investigation is made of designs for screening experiments where the response variable is approximated by a generalised linear model. A Bayesian information capacity criterion is defined for the selection of designs that are robust to the form of the linear predictor. For binomial data and logistic regression, the effectiveness of these designs for screening is assessed through simulation studies using all-subsets regression and model selection via maximum penalised likelihood and a generalised information criterion. For Poisson data and log-linear regression, similar assessments are made using maximum likelihood and the Akaike information criterion for minimally-supported designs that are constructed analytically. The results show that effective screening, that is, high power with moderate type I error rate and false discovery rate, can be achieved through suitable choices for the number of design support points and experiment size. Logistic regression is shown to present a more challenging problem than log-linear regression. Some areas for future work are also indicated.

Submitted 26 October, 2016; v1 submitted 29 January, 2016; originally announced January 2016.

MSC Class: 62K05; 62K20; 62J12

Showing 1–20 of 20 results for author: McGree, J M