-
Urban mapping in Dar es Salaam using AJIVE
Authors:
Rachel J. Carrington,
Ian L. Dryden,
Madeleine Ellis,
James O. Goulding,
Simon P. Preston,
David J. Sirl
Abstract:
Mapping deprivation in urban areas is important, for example for identifying areas of greatest need and planning interventions. Traditional ways of obtaining deprivation estimates are based on either census or household survey data, which in many areas is unavailable or difficult to collect. However, there has been a huge rise in the amount of new, non-traditional forms of data, such as satellite imagery and cell-phone call-record data, which may contain information useful for identifying deprivation. We use Angle-Based Joint and Individual Variation Explained (AJIVE) to jointly model satellite imagery data, cell-phone data, and survey data for the city of Dar es Salaam, Tanzania. We first identify interpretable low-dimensional structure from the imagery and cell-phone data, and find that we can use these to identify deprivation. We then consider what is gained from further incorporating the more traditional and costly survey data. We also introduce a scalar measure of deprivation as a response variable to be predicted, and consider various approaches to multiview regression, including using AJIVE scores as predictors.
Submitted 13 March, 2024;
originally announced March 2024.
-
Object oriented data analysis of surface motion time series in peatland landscapes
Authors:
Emily G. Mitchell,
Ian L. Dryden,
Christopher J. Fallaize,
Roxane Andersen,
Andrew V. Bradley,
David J. Large,
Andrew Sowter
Abstract:
Peatlands account for 10% of UK land area, 80% of which are degraded to some degree, emitting carbon at a similar magnitude to oil refineries or landfill sites. A lack of tools for rapid and reliable assessment of peatland condition has limited monitoring of vast areas of peatland and prevented targeting areas urgently needing action to halt further degradation. Measured using interferometric synthetic aperture radar (InSAR), peatland surface motion is highly indicative of peatland condition, largely driven by the eco-hydrological change in the peatland causing swelling and shrinking of the peat substrate. The computational intensity of recent methods using InSAR time series to capture the annual functional structure of peatland surface motion becomes increasingly challenging as the sample size increases. Instead, we utilize the behavior of the entire peatland surface motion time series using object oriented data analysis to assess peatland condition. In a Gibbs sampling scheme, our cluster analysis based on the functional behavior of the surface motion time series finds that features representative of soft/wet peatlands, drier/shrubby peatlands and thin/modified peatlands align with the clusters. The posterior distribution of the assigned peatland types enables the scale of peatland degradation to be assessed, which will guide future cost-effective decisions for peatland restoration.
Submitted 28 September, 2022;
originally announced September 2022.
-
The Bayesian Spatial Bradley--Terry Model: Urban Deprivation Modeling in Tanzania
Authors:
R. G. Seymour,
D. Sirl,
S. Preston,
I. L. Dryden,
M. J. A. Ellis,
B. Perrat,
J. Goulding
Abstract:
Identifying the most deprived regions of any country or city is key if policy makers are to design successful interventions. However, locating areas with the greatest need is often surprisingly challenging in developing countries. Due to the logistical challenges of traditional household surveying, official statistics can be slow to be updated; estimates that exist can be coarse, a consequence of prohibitive costs and poor infrastructures; and mass urbanisation can render manually surveyed figures rapidly out-of-date. Comparative judgement models, such as the Bradley--Terry model, offer a promising solution. Leveraging local knowledge, elicited via comparisons of different areas' affluence, such models can both simplify logistics and circumvent biases inherent to household surveys. Yet widespread adoption remains limited, due to the large amount of data existing approaches still require. We address this via development of a novel Bayesian Spatial Bradley--Terry model, which substantially decreases the amount of data comparisons required for effective inference. This model integrates a network representation of the city or country, along with assumptions of spatial smoothness that allow deprivation in one area to be informed by neighbouring areas. We demonstrate the practical effectiveness of this method, through a novel comparative judgement data set collected in Dar es Salaam, Tanzania.
Submitted 28 October, 2021; v1 submitted 27 October, 2020;
originally announced October 2020.
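For readers unfamiliar with comparative judgement models, the following is a minimal sketch of the standard (non-spatial) Bradley--Terry likelihood only. It is not the paper's Bayesian spatial model: the spatial prior over the city network is omitted, and the function names, the score dictionary and the area labels are all hypothetical.

```python
import math

def bt_win_probability(score_i, score_j):
    """Probability that area i is judged more affluent than area j
    under a standard Bradley-Terry model with real-valued scores."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))

def bt_log_likelihood(scores, comparisons):
    """Log-likelihood of observed pairwise judgements.

    scores: dict mapping area label -> latent score (hypothetical labels)
    comparisons: list of (winner, loser) pairs
    """
    return sum(
        math.log(bt_win_probability(scores[winner], scores[loser]))
        for winner, loser in comparisons
    )

# Illustrative data only: three made-up areas and four made-up judgements.
scores = {"A": 1.0, "B": 0.0, "C": -1.0}
comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B")]
print(bt_log_likelihood(scores, comparisons))
```

Only score differences enter the likelihood, so scores are identifiable only up to an additive constant; a prior or a sum-to-zero constraint is one common way to fix this.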
-
Non-parametric regression for networks
Authors:
Katie E. Severn,
Ian L. Dryden,
Simon P. Preston
Abstract:
Network data are becoming increasingly available, and so there is a need to develop suitable methodology for statistical analysis. Networks can be represented as graph Laplacian matrices, which are a type of manifold-valued data. Our main objective is to estimate a regression curve from a sample of graph Laplacian matrices conditional on a set of Euclidean covariates, for example in dynamic networks where the covariate is time. We develop an adapted Nadaraya-Watson estimator which has uniform weak consistency for estimation using Euclidean and power Euclidean metrics. We apply the methodology to the Enron email corpus to model smooth trends in monthly networks and highlight anomalous networks. Another motivating application is given in corpus linguistics, which explores trends in an author's writing style over time based on word co-occurrence networks.
Submitted 30 September, 2020;
originally announced October 2020.
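As a rough illustration of kernel regression for networks, the sketch below computes a Nadaraya-Watson estimate under the plain Euclidean metric, where the kernel-weighted mean reduces to an entrywise weighted average of the graph Laplacians. The Gaussian kernel, the bandwidth parameter and all names are assumptions made for this example; the paper's adapted estimator and the power Euclidean case are not reproduced here.

```python
import math

def gaussian_kernel(u):
    """Unnormalised Gaussian kernel (normalisation cancels in the ratio)."""
    return math.exp(-0.5 * u * u)

def nadaraya_watson_laplacian(t0, times, laplacians, bandwidth):
    """Kernel-weighted average of graph Laplacians at covariate value t0.

    times: list of covariate values (e.g. time points)
    laplacians: list of square matrices (lists of lists), one per time
    bandwidth: kernel bandwidth (assumed tuning parameter)
    """
    weights = [gaussian_kernel((t0 - t) / bandwidth) for t in times]
    total = sum(weights)
    n = len(laplacians[0])
    return [
        [
            sum(w * lap[r][c] for w, lap in zip(weights, laplacians)) / total
            for c in range(n)
        ]
        for r in range(n)
    ]

# Two made-up two-node networks with edge weights 1 and 3.
laps = [[[1.0, -1.0], [-1.0, 1.0]], [[3.0, -3.0], [-3.0, 3.0]]]
estimate = nadaraya_watson_laplacian(1.0, [0.0, 2.0], laps, bandwidth=1.0)
```

Because the weights are non-negative, the average of valid graph Laplacians keeps symmetric entries, zero row sums and non-positive off-diagonals, so no projection step is needed in this Euclidean special case.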
-
Principal nested shape space analysis of molecular dynamics data
Authors:
Ian L. Dryden,
Kwang-Rae Kim,
Charles A. Laughton,
Huiling Le
Abstract:
Molecular dynamics simulations produce huge datasets of temporal sequences of molecules. It is of interest to summarize the shape evolution of the molecules in a succinct, low-dimensional representation. However, Euclidean techniques such as principal components analysis (PCA) can be problematic as the data may lie far from a flat manifold. Principal nested spheres gives a fundamentally different decomposition of data from the usual Euclidean sub-space based PCA (Jung, Dryden and Marron, 2012, Biometrika). Sub-spaces of successively lower dimension are fitted to the data in a backwards manner, with the aim of retaining signal and dispensing with noise at each stage. We adapt the methodology to 3D sub-shape spaces and provide some practical fitting algorithms. The methodology is applied to cluster analysis of peptides, where different states of the molecules can be identified. Also, the temporal transitions between cluster states are explored.
Submitted 22 March, 2019;
originally announced March 2019.
-
Manifold valued data analysis of samples of networks, with applications in corpus linguistics
Authors:
Katie E. Severn,
Ian L. Dryden,
Simon P. Preston
Abstract:
Networks arise in many applications, such as in the analysis of text documents, social interactions and brain activity. We develop a general framework for extrinsic statistical analysis of samples of networks, motivated by networks representing text documents in corpus linguistics. We identify networks with their graph Laplacian matrices, for which we define metrics, embeddings, tangent spaces, and a projection from Euclidean space to the space of graph Laplacians. This framework provides a way of computing means, performing principal component analysis and regression, and carrying out hypothesis tests, such as for testing for equality of means between two samples of networks. We apply the methodology to the set of novels by Jane Austen and Charles Dickens.
Submitted 16 September, 2020; v1 submitted 21 February, 2019;
originally announced February 2019.
-
Smoothing splines on Riemannian manifolds, with applications to 3D shape space
Authors:
Kwang-Rae Kim,
Ian L. Dryden,
Huiling Le,
Katie E. Severn
Abstract:
There has been increasing interest in statistical analysis of data lying in manifolds. This paper generalizes a smoothing spline fitting method to Riemannian manifold data based on the technique of unrolling and unwrapping originally proposed in Jupp and Kent (1987) for spherical data. In particular we develop such a fitting procedure for shapes of configurations in general $m$-dimensional Euclidean space, extending our previous work for two dimensional shapes. We show that parallel transport along a geodesic on Kendall shape space is linked to the solution of a homogeneous first-order differential equation, some of whose coefficients are implicitly defined functions. This finding enables us to approximate the procedure of unrolling and unwrapping by simultaneously solving such equations numerically, and so to find numerical solutions for smoothing splines fitted to higher dimensional shape data. This fitting method is applied to the analysis of some dynamic 3D peptide data.
Submitted 16 September, 2020; v1 submitted 15 January, 2018;
originally announced January 2018.
-
Mixed Effect Modelling of Single Trial Variability in Ultra-High Field fMRI
Authors:
Christopher J. Brignell,
William J. Browne,
Ian L. Dryden,
Susan T. Francis
Abstract:
Neuronal brain activity in response to repeated stimuli can be perceived using functional magnetic resonance imaging (fMRI). In this paper, we develop a statistical model for fMRI data that estimates both the associated haemodynamic response function and the within and between trial variability through maximum likelihood estimation. We discuss our results in the context of other model-driven approaches, extending models already popular in the literature, while removing the need for some of their assumptions. We consider an application to the motor cortex activity caused by a subject pressing a button and observe that the response changes significantly with task and through time.
Submitted 23 January, 2015;
originally announced January 2015.
-
Penalized Euclidean Distance Regression
Authors:
D. Vasiliu,
T. Dey,
I. L. Dryden
Abstract:
A new method is proposed for variable screening, variable selection and prediction in linear regression problems where the number of predictors can be much larger than the number of observations. The method involves minimizing a penalized Euclidean distance, where the penalty is the geometric mean of the $\ell_1$ and $\ell_2$ norms of the regression coefficients. This particular formulation exhibits a grouping effect, which is useful for screening out predictors in higher or ultra-high dimensional problems. Also, an important result is a signal recovery theorem, which does not require an estimate of the noise standard deviation. Practical performances of variable selection and prediction are evaluated through simulation studies and the analysis of a dataset of mass spectrometry scans from melanoma patients, where excellent predictive performance is obtained.
Submitted 12 September, 2017; v1 submitted 18 May, 2014;
originally announced May 2014.
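The objective described in this abstract can be sketched as follows. This is one plausible reading, assuming an unsquared Euclidean residual norm and a single tuning parameter multiplying the penalty; the name `ped_objective`, the argument names and the scaling of `lam` are assumptions for illustration, not the authors' code.

```python
import math

def ped_objective(y, X, beta, lam):
    """Penalized Euclidean distance objective (illustrative reading).

    Returns ||y - X beta||_2 + lam * sqrt(||beta||_1 * ||beta||_2),
    i.e. a Euclidean distance plus the geometric mean of the l1 and l2
    norms of the coefficients, scaled by the assumed parameter lam.
    """
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(row, beta))
        for yi, row in zip(y, X)
    ]
    distance = math.sqrt(sum(r * r for r in residuals))
    l1_norm = sum(abs(b) for b in beta)
    l2_norm = math.sqrt(sum(b * b for b in beta))
    return distance + lam * math.sqrt(l1_norm * l2_norm)

# Made-up toy problem with an identity design matrix.
y = [3.0, 4.0]
X = [[1.0, 0.0], [0.0, 1.0]]
print(ped_objective(y, X, [0.0, 0.0], lam=1.0))
print(ped_objective(y, X, [3.0, 4.0], lam=1.0))
```

Under this reading both terms scale linearly with the response, which is consistent with the abstract's remark that no estimate of the noise standard deviation is required, although the sketch itself proves nothing about that result.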
-
Bayesian registration of functions and curves
Authors:
Wen Cheng,
Ian L. Dryden,
Xianzheng Huang
Abstract:
Bayesian analysis of functions and curves is considered, where warping and other geometrical transformations are often required for meaningful comparisons. We focus on two applications involving the classification of mouse vertebrae shape outlines and the alignment of mass spectrometry data in proteomics. The functions and curves of interest are represented using the recently introduced square root velocity function, which enables a warping invariant elastic distance to be calculated in a straightforward manner. We distinguish between various spaces of interest: the original space, the ambient space after standardizing, and the quotient space after removing a group of transformations. Using Gaussian process models in the ambient space and Dirichlet priors for the warping functions, we explore Bayesian inference for curves and functions. Markov chain Monte Carlo algorithms are introduced for simulating from the posterior, including simulated tempering for multimodal posteriors. We also compare ambient and quotient space estimators for mean shape, and explain their frequent similarity in many practical problems using a Laplace approximation. A simulation study is carried out, as well as shape classification of the mouse vertebra outlines and practical alignment of the mass spectrometry functions.
Submitted 8 November, 2013;
originally announced November 2013.
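The square root velocity function mentioned in this abstract maps a scalar function f to q(t) = sign(f'(t)) sqrt(|f'(t)|). A minimal discrete sketch follows, assuming a uniform grid and forward differences; the function name `srvf` and the sample values are hypothetical, and the warping and Bayesian steps of the paper are not shown.

```python
import math

def srvf(values, dt):
    """Discrete square root velocity function of a sampled scalar function.

    values: function values on a uniform grid
    dt: grid spacing
    Returns one value per grid interval, using forward differences.
    """
    q = []
    for left, right in zip(values, values[1:]):
        slope = (right - left) / dt
        q.append(math.copysign(math.sqrt(abs(slope)), slope))
    return q

# A made-up triangular bump: slopes are +4 then -4.
print(srvf([0.0, 4.0, 0.0], dt=1.0))
```

Since q squared equals the absolute slope, the sum of q*q*dt over the grid equals the total variation of the sampled function, which is a quick sanity check on any implementation.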
-
Discussion of "Geodesic Monte Carlo on Embedded Manifolds"
Authors:
Simon Byrne,
Mark Girolami,
Persi Diaconis,
Christof Seiler,
Susan Holmes,
Ian L. Dryden,
John T. Kent,
Marcelo Pereyra,
Babak Shahbaba,
Shiwei Lan,
Jeffrey Streets,
Daniel Simpson
Abstract:
Contributed discussion and rejoinder to "Geodesic Monte Carlo on Embedded Manifolds" (arXiv:1301.6064)
Submitted 5 November, 2013;
originally announced November 2013.
-
Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment
Authors:
Irina Czogiel,
Ian L. Dryden,
Christopher J. Brignell
Abstract:
Statistical methodology is proposed for comparing unlabeled marked point sets, with an application to aligning steroid molecules in chemoinformatics. Methods from statistical shape analysis are combined with techniques for predicting random fields in spatial statistics in order to define a suitable measure of similarity between two marked point sets. Bayesian modeling of the predicted field overlap between pairs of point sets is proposed, and posterior inference of the alignment is carried out using Markov chain Monte Carlo simulation. By representing the fields in reproducing kernel Hilbert spaces, the degree of overlap can be computed without expensive numerical integration. Superimposing entire fields rather than the configuration matrices of point coordinates thereby avoids the problem that there is usually no clear one-to-one correspondence between the points. In addition, mask parameters are introduced in the model, so that partial matching of the marked point sets can be carried out. We also propose an adaptation of the generalized Procrustes analysis algorithm for the simultaneous alignment of multiple point sets. The methodology is illustrated with a simulation study and then applied to a data set of 31 steroid molecules, where the relationship between shape and binding activity to the corticosteroid binding globulin receptor is explored.
Submitted 1 March, 2012;
originally announced March 2012.
-
Non-Euclidean statistical analysis of covariance matrices and diffusion tensors
Authors:
Ian L. Dryden,
Alexey Koloydenko,
Diwei Zhou,
Bai Li
Abstract:
The statistical analysis of covariance matrices occurs in many important applications, e.g. in diffusion tensor imaging and longitudinal data analysis. We consider the situation where it is of interest to estimate an average covariance matrix, describe its anisotropy, to carry out principal geodesic analysis and to interpolate between covariance matrices. There are many choices of metric available, each with its advantages. The particular choice of what is best will depend on the particular application. The use of the Procrustes size-and-shape metric is particularly appropriate when the covariance matrices are close to being deficient in rank. We discuss the use of different metrics for diffusion tensor analysis, and we also introduce certain types of regularization for tensors.
Submitted 10 October, 2010;
originally announced October 2010.
-
Bayesian matching of unlabelled point sets using Procrustes and configuration models
Authors:
Kim Kenobi,
Ian L. Dryden
Abstract:
The problem of matching unlabelled point sets using Bayesian inference is considered. Two recently proposed models for the likelihood are compared, based on the Procrustes size-and-shape and the full configuration. Bayesian inference is carried out for matching point sets using Markov chain Monte Carlo simulation. An improvement to the existing Procrustes algorithm is proposed which improves convergence rates, using occasional large jumps in the burn-in period. The Procrustes and configuration methods are compared in a simulation study and using real data, where it is of interest to estimate the strengths of matches between protein binding sites. The performance of both methods is generally quite similar, and a connection between the two models is made using a Laplace approximation.
Submitted 15 September, 2010;
originally announced September 2010.
-
Power Euclidean metrics for covariance matrices with application to diffusion tensor imaging
Authors:
Ian L. Dryden,
Xavier Pennec,
Jean-Marc Peyrat
Abstract:
Various metrics for comparing diffusion tensors have been recently proposed in the literature. We consider a broad family of metrics which is indexed by a single power parameter. A likelihood-based procedure is developed for choosing the most appropriate metric from the family for a given dataset at hand. The approach is analogous to using the Box-Cox transformation that is frequently investigated in regression analysis. The methodology is illustrated with a simulation study and an application to a real dataset of diffusion tensor images of canine hearts.
Submitted 15 September, 2010;
originally announced September 2010.
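For orientation, a family of metrics indexed by a single power parameter is commonly written in the form below. The abstract itself does not state the formula, so the notation here ($S_1$, $S_2$, $\alpha$, $U$, $\Lambda$) is assumed for illustration and should be checked against the paper.

$$
d_\alpha(S_1, S_2) = \frac{1}{\alpha}\,\bigl\lVert S_1^{\alpha} - S_2^{\alpha} \bigr\rVert,
\qquad S^{\alpha} = U \Lambda^{\alpha} U^{\top},
$$

where $S = U \Lambda U^{\top}$ is the spectral decomposition of a covariance matrix and $\lVert \cdot \rVert$ is the Frobenius norm. Under this form, $\alpha = 1$ gives the ordinary Euclidean distance, $\alpha = 1/2$ compares matrix square roots, and the limit $\alpha \to 0$ compares matrix logarithms, which mirrors the role of the power parameter in the Box-Cox transformation.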
-
Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging
Authors:
Ian L. Dryden,
Alexey Koloydenko,
Diwei Zhou
Abstract:
The statistical analysis of covariance matrix data is considered and, in particular, methodology is discussed which takes into account the non-Euclidean nature of the space of positive semi-definite symmetric matrices. The main motivation for the work is the analysis of diffusion tensors in medical image analysis. The primary focus is on estimation of a mean covariance matrix and, in particular, on the use of Procrustes size-and-shape space. Comparisons are made with other estimation techniques, including using the matrix logarithm, matrix square root and Cholesky decomposition. Applications to diffusion tensor imaging are considered and, in particular, a new measure of fractional anisotropy called Procrustes Anisotropy is discussed.
Submitted 9 October, 2009;
originally announced October 2009.