Showing 1–15 of 15 results for author: Preston, S

  1. arXiv:2403.09014  [pdf, other]

    stat.AP

    Urban mapping in Dar es Salaam using AJIVE

    Authors: Rachel J. Carrington, Ian L. Dryden, Madeleine Ellis, James O. Goulding, Simon P. Preston, David J. Sirl

    Abstract: Mapping deprivation in urban areas is important, for example for identifying areas of greatest need and planning interventions. Traditional ways of obtaining deprivation estimates are based on either census or household survey data, which in many areas is unavailable or difficult to collect. However, there has been a huge rise in the amount of new, non-traditional forms of data, such as satellite…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 34 pages, 25 figures

  2. arXiv:2306.16564  [pdf, other]

    cs.CL stat.ML

    Pareto Optimal Learning for Estimating Large Language Model Errors

    Authors: Theodore Zhao, Mu Wei, J. Samuel Preston, Hoifung Poon

    Abstract: Large Language Models (LLMs) have shown impressive abilities in many applications. When a concrete and precise answer is desired, it is important to have a quantitative estimation of the potential error rate. However, this can be challenging due to the text-in-text-out nature of generative models. We present a method based on Pareto optimization that generates a risk score to estimate the probabil…

    Submitted 22 May, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

  3. arXiv:2302.02942  [pdf, other]

    stat.CO math.DS math.OC q-bio.QM

    Empirical quantification of predictive uncertainty due to model discrepancy by training with an ensemble of experimental designs: an application to ion channel kinetics

    Authors: Joseph G. Shuttleworth, Chon Lok Lei, Dominic G. Whittaker, Monique J. Windley, Adam P. Hill, Simon P. Preston, Gary R. Mirams

    Abstract: When mathematical biology models are used to make quantitative predictions for clinical or industrial use, it is important that these predictions come with a reliable estimate of their accuracy (uncertainty quantification). Because models of complex biological systems are always large simplifications, model discrepancy arises - where a mathematical model fails to recapitulate the true data generat…

    Submitted 19 February, 2024; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Final published version with a typographical error in Table 1 (the value of q_6) corrected

    MSC Class: 92B05; 92C30; 62M05

    Journal ref: Bulletin of Mathematical Biology, 86(1), 2 (2024)

  4. arXiv:2010.14128  [pdf, other]

    stat.AP stat.CO stat.ME

    The Bayesian Spatial Bradley--Terry Model: Urban Deprivation Modeling in Tanzania

    Authors: R. G. Seymour, D. Sirl, S. Preston, I. L. Dryden, M. J. A. Ellis, B. Perrat, J. Goulding

    Abstract: Identifying the most deprived regions of any country or city is key if policy makers are to design successful interventions. However, locating areas with the greatest need is often surprisingly challenging in developing countries. Due to the logistical challenges of traditional household surveying, official statistics can be slow to be updated; estimates that exist can be coarse, a consequence of…

    Submitted 28 October, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: 23 pages, 7 figures, to be published in the Journal of the Royal Statistical Society: Series C

    MSC Class: 62G05 (Primary) 62P25; 62-11 (Secondary)

  5. arXiv:2010.00050  [pdf, ps, other]

    stat.ME stat.AP

    Non-parametric regression for networks

    Authors: Katie E. Severn, Ian L. Dryden, Simon P. Preston

    Abstract: Network data are becoming increasingly available, and so there is a need to develop suitable methodology for statistical analysis. Networks can be represented as graph Laplacian matrices, which are a type of manifold-valued data. Our main objective is to estimate a regression curve from a sample of graph Laplacian matrices conditional on a set of Euclidean covariates, for example in dynamic networ…

    Submitted 30 September, 2020; originally announced October 2020.

    Comments: 15 pages, 4 figures

    MSC Class: Primary 62H99; 62H15; secondary 62P99

  6. arXiv:1911.02656  [pdf, other]

    stat.ML cs.CL cs.LG stat.CO

    Invariance and identifiability issues for word embeddings

    Authors: Rachel Carrington, Karthik Bharath, Simon Preston

    Abstract: Word embeddings are commonly obtained as optimizers of a criterion function $f$ of a text corpus, but assessed on word-task performance using a different evaluation function $g$ of the test data. We contend that a possible source of disparity in performance on tasks is the incompatibility between classes of transformations that leave $f$ and $g$ invariant. In particular, word embeddings defined by…

    Submitted 6 November, 2019; originally announced November 2019.

    Comments: NIPS 2019

  7. arXiv:1902.08290  [pdf, other]

    stat.ME

    Manifold valued data analysis of samples of networks, with applications in corpus linguistics

    Authors: Katie E. Severn, Ian L. Dryden, Simon P. Preston

    Abstract: Networks arise in many applications, such as in the analysis of text documents, social interactions and brain activity. We develop a general framework for extrinsic statistical analysis of samples of networks, motivated by networks representing text documents in corpus linguistics. We identify networks with their graph Laplacian matrices, for which we define metrics, embeddings, tangent spaces, an…

    Submitted 16 September, 2020; v1 submitted 21 February, 2019; originally announced February 2019.

    Comments: 29 pages, 10 figures

    MSC Class: Primary 62H99; 62H15; secondary 62P99

  8. Quantifying Age and Model Uncertainties in Paleoclimate Data and Dynamical Climate Models with a Joint Inferential Analysis

    Authors: Jake Carson, Michel Crucifix, Simon P. Preston, Richard D. Wilkinson

    Abstract: A major goal in paleoclimate science is to reconstruct historical climates using proxies for climate variables such as those observed in sediment cores, and in the process learn about climate dynamics. This is hampered by uncertainties in how sediment core depths relate to ages, how proxy quantities relate to climate variables, how climate models are specified, and the values of parameters in clim…

    Submitted 17 April, 2019; v1 submitted 22 March, 2018; originally announced March 2018.

    Journal ref: Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 475, 2019

  9. arXiv:1711.02774  [pdf, ps, other]

    stat.ME

    The extended power distribution: A new distribution on $(0, 1)$

    Authors: Chibueze E. Ogbonnaya, Simon P. Preston, Andrew T. A. Wood

    Abstract: We propose a two-parameter bounded probability distribution called the extended power distribution. This distribution on $(0, 1)$ is similar to the beta distribution, however there are some advantages which we explore. We define the moments and quantiles of this distribution and show that it is possible to give an $r$-parameter extension of this distribution ($r>2$). We also consider its complemen…

    Submitted 7 November, 2017; originally announced November 2017.

    Comments: 22 pages, 19 figures, 5 tables

    MSC Class: 62E15; 60E05

  10. arXiv:1703.02111  [pdf, other]

    cs.LG stat.ML

    Classification and clustering for observations of event time data using non-homogeneous Poisson process models

    Authors: Duncan Barrack, Simon Preston

    Abstract: Data of the form of event times arise in various applications. A simple model for such data is a non-homogeneous Poisson process (NHPP) which is specified by a rate function that depends on time. We consider the problem of having access to multiple independent observations of event time data, observed on a common interval, from which we wish to classify or cluster the observations according to the…

    Submitted 20 June, 2018; v1 submitted 6 March, 2017; originally announced March 2017.

    Comments: cleaned up figures and text

  11. arXiv:1607.07974  [pdf, ps, other]

    stat.ME

    Nonparametric hypothesis testing for equality of means on the simplex

    Authors: Michail Tsagris, Simon Preston, Andrew T. A. Wood

    Abstract: In the context of data that lie on the simplex, we investigate use of empirical and exponential empirical likelihood, and Hotelling and James statistics, to test the null hypothesis of equal population means based on two independent samples. We perform an extensive numerical study using data simulated from various distributions on the simplex. The results, taken together with practical considerati…

    Submitted 4 August, 2016; v1 submitted 27 July, 2016; originally announced July 2016.

    Comments: This is a preprint of the article to be published by Taylor & Francis Group in Journal of Statistical Computation and Simulation

  12. Bayesian model selection for the glacial-interglacial cycle

    Authors: Jake Carson, Michel Crucifix, Simon Preston, Richard D. Wilkinson

    Abstract: A prevailing viewpoint in palaeoclimate science is that a single palaeoclimate record contains insufficient information to discriminate between most competing explanatory models. Results we present here suggest the contrary. Using SMC^2 combined with novel Brownian bridge type proposals for the state trajectories, we show that even with relatively short time series it is possible to estimate Bayes…

    Submitted 11 November, 2015; originally announced November 2015.

    Journal ref: Journal of the Royal Statistical Society: Series C (Applied Statistics), 67(1):25-54, 2018

  13. arXiv:1506.04976  [pdf, ps, other]

    stat.ME

    Improved classification for compositional data using the $α$-transformation

    Authors: Michail Tsagris, Simon Preston, Andrew T. A. Wood

    Abstract: In compositional data analysis an observation is a vector containing non-negative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas including geology, archaeology, biology, economics and political science. In this paper we inv…

    Submitted 17 June, 2015; v1 submitted 16 June, 2015; originally announced June 2015.

    Comments: This is a 17-page preprint and has been accepted for publication at the Journal of Classification

    MSC Class: 62H30

  14. arXiv:1301.2975  [pdf, ps, other]

    stat.CO stat.AP stat.ME

    Fast Approximate Bayesian Computation for discretely observed Markov models using a factorised posterior distribution

    Authors: Simon R. White, Theodore Kypraios, Simon P. Preston

    Abstract: Many modern statistical applications involve inference for complicated stochastic models for which the likelihood function is difficult or even impossible to calculate, and hence conventional likelihood-based inferential techniques cannot be used. In such settings, Bayesian inference can be performed using Approximate Bayesian Computation (ABC). However, in spite of many recent developments to ABC…

    Submitted 28 May, 2013; v1 submitted 14 January, 2013; originally announced January 2013.

  15. arXiv:1106.1451  [pdf, ps, other]

    stat.ME

    A data-based power transformation for compositional data

    Authors: Michail T. Tsagris, Simon Preston, Andrew T. A. Wood

    Abstract: Compositional data analysis is carried out either by neglecting the compositional constraint and applying standard multivariate data analysis, or by transforming the data using the logs of the ratios of the components. In this work we examine a more general transformation which includes both approaches as special cases. It is a power transformation and involves a single parameter, α. The transform…

    Submitted 16 June, 2011; v1 submitted 7 June, 2011; originally announced June 2011.

    Comments: Published in the proceedings of the 4th international workshop on Compositional Data Analysis. http://congress.cimne.com/codawork11/frontal/default.asp

    Journal ref: Proceedings of CoDaWork'11: 4th international workshop on Compositional Data Analysis, Egozcue, J.J., Tolosana-Delgado, R. and Ortego, M.I. (eds.) 2011. ISBN: 978-84-87867-76-7