-
Modelling Populations of Interaction Networks via Distance Metrics
Authors:
George Bolt,
Simón Lunagómez,
Christopher Nemeth
Abstract:
Network data arises through observation of relational information between a collection of entities. Recent work in the literature has independently considered when (i) one observes a sample of networks, connectome data in neuroscience being a ubiquitous example, and (ii) the units of observation within a network are edges or paths, such as emails between people or a series of page visits to a website by a user, often referred to as interaction network data. The intersection of these two cases, however, is yet to be considered. In this paper, we propose a new Bayesian modelling framework to analyse such data. Given a practitioner-specified distance metric between observations, we define families of models through location and scale parameters, akin to a Gaussian distribution, with subsequent inference of model parameters providing reasoned statistical summaries for this non-standard data structure. To facilitate inference, we propose specialised Markov chain Monte Carlo (MCMC) schemes capable of sampling from doubly-intractable posterior distributions over discrete and multi-dimensional parameter spaces. Through simulation studies we confirm the efficacy of our methodology and inference scheme, and we illustrate its application via an example analysis of a location-based social network (LBSN) data set.
Submitted 20 June, 2022;
originally announced June 2022.
-
Distances for Comparing Multisets and Sequences
Authors:
George Bolt,
Simón Lunagómez,
Christopher Nemeth
Abstract:
Measuring the distance between data points is fundamental to many statistical techniques, such as dimension reduction or clustering algorithms. However, improvements in data collection technologies have led to a growing variety of structured data for which standard distance measures are inapplicable. In this paper, we consider the problem of measuring the distance between sequences and multisets of points lying within a metric space, motivated by the analysis of an in-play football data set. Drawing on the wider literature, including that of time series analysis and optimal transport, we discuss various distances which are available in such an instance. For each distance, we state and prove theoretical properties, proposing possible extensions where they fail. Finally, via an example analysis of the in-play football data, we illustrate the usefulness of these distances in practice.
Submitted 17 June, 2022;
originally announced June 2022.
-
Wrapped Distributions on homogeneous Riemannian manifolds
Authors:
Fernando Galaz-Garcia,
Marios Papamichalis,
Kathryn Turnbull,
Simon Lunagomez,
Edoardo Airoldi
Abstract:
We provide a general framework for constructing probability distributions on Riemannian manifolds, taking advantage of area-preserving maps and isometries. Control over distributions' properties, such as parameters, symmetry and modality, yields a family of flexible distributions that are straightforward to sample from, suitable for use within Monte Carlo algorithms and latent variable models, such as autoencoders. As an illustration, we empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model. Finally, we take advantage of the generalized description of this framework to posit questions for future work.
Submitted 20 April, 2022;
originally announced April 2022.
-
Bayesian modelling and computation utilising cycles in multiple network data
Authors:
Anastasia Mantziou,
Robin Mitra,
Simon Lunagomez
Abstract:
Modelling multiple network data is crucial for addressing a wide range of applied research questions. However, there are many challenges, both theoretical and computational, to address. Network cycles are often of particular interest in many applications, such as ecological studies, and an unexplored area has been how to incorporate networks' cycles within the inferential framework in an explicit way. The recently developed Spherical Network Family of models (SNF) offers a flexible formulation for modelling multiple network data that permits any type of metric. This has opened up the possibility to formulate network models that focus on network properties hitherto not possible or practical to consider. In this article we propose a novel network distance metric that measures similarities between networks with respect to their cycles, and incorporate this within the SNF model to allow inferences that explicitly capture information on cycles. These network motifs are of particular interest in ecological studies. We further propose a novel computational framework to allow posterior inferences from the intractable SNF model for moderate sized networks. Lastly, we apply the resulting methodology to a set of ecological network data studying aggressive interactions between species of fish. We show our model is able to make cogent inferences concerning the cycle behaviour amongst the species, and beyond those possible from a model that does not consider this network motif.
Submitted 15 November, 2021;
originally announced November 2021.
-
Latent Space Network Modelling with Hyperbolic and Spherical Geometries
Authors:
Marios Papamichalis,
Kathryn Turnbull,
Simon Lunagomez,
Edoardo Airoldi
Abstract:
A rich class of network models associates each node with a low-dimensional latent coordinate that controls the propensity for connections to form. Models of this type are well established in the network analysis literature, where it is typical to assume that the underlying geometry is Euclidean. Recent work has explored the consequences of this choice and has motivated the study of models which rely on non-Euclidean latent geometries, with a primary focus on spherical and hyperbolic geometry. In this paper, we examine to what extent latent features can be inferred from the observable links in the network, considering network models which rely on spherical and hyperbolic geometries. For each geometry, we describe a latent space network model, detail constraints on the latent coordinates which remove the well-known identifiability issues, and present Bayesian estimation schemes. Thus, we develop computational procedures to perform inference for network models in which the properties of the underlying geometry play a vital role. Finally, we assess the validity of these models on real data.
Submitted 10 February, 2022; v1 submitted 7 September, 2021;
originally announced September 2021.
-
Bayesian model-based clustering for populations of network data
Authors:
Anastasia Mantziou,
Simon Lunagomez,
Robin Mitra
Abstract:
There is increasing appetite for analysing populations of network data due to the fast-growing body of applications demanding such methods. While methods exist to provide readily interpretable summaries of heterogeneous network populations, these are often descriptive or ad hoc, lacking any formal justification. In contrast, principled analysis methods often provide results difficult to relate back to the applied problem of interest. Motivated by two complementary applied examples, we develop a Bayesian framework to appropriately model complex heterogeneous network populations, whilst also allowing analysts to gain insights from the data, and make inferences most relevant to their needs. The first application involves a study in Computer Science measuring human movements across a University. The second analyses data from Neuroscience investigating relationships between different regions of the brain. While both applications entail analysis of a heterogeneous population of networks, network sizes vary considerably. We focus on the problem of clustering the elements of a network population, where each cluster is characterised by a network representative. We take advantage of the Bayesian machinery to simultaneously infer the cluster membership, the representatives, and the community structure of the representatives, thus allowing intuitive inferences to be made. The implementation of our method on the human movement study reveals interesting movement patterns of individuals in clusters, readily characterised by their network representative. For the brain networks application, our model reveals a cluster of individuals with different network properties of particular interest in Neuroscience. The performance of our method is additionally validated in extensive simulation studies.
Submitted 20 June, 2023; v1 submitted 7 July, 2021;
originally announced July 2021.
-
Robustness on Networks
Authors:
Marios Papamichalis,
Simon Lunagomez,
Patrick J. Wolfe
Abstract:
We adopt the statistical framework on robustness proposed by Watson and Holmes in 2016 and then tackle the practical challenges that hinder its applicability to network models. The goal is to evaluate how the quality of an inference for a network feature degrades when the assumed model is misspecified. Decision theory methods aimed at identifying model misspecification are applied in the context of network data with the goal of investigating the stability of optimal actions to perturbations to the assumed model. Here the modified versions of the model are contained within a well-defined neighborhood of model space. Our main challenge is to combine stochastic optimization and graph limits tools to explore the model space. As a result, a method for robustness on exchangeable random networks is developed. Our approach is inspired by recent developments in robustness and by recent work in the robust control, macroeconomics and financial mathematics literature; more specifically, it is based on approximating a graphon through its empirical graphon.
Submitted 4 December, 2020;
originally announced December 2020.
-
Lasso for hierarchical polynomial models
Authors:
Hugo Maruri-Aguilar,
Simon Lunagomez
Abstract:
In a polynomial regression model, the divisibility conditions implicit in polynomial hierarchy give rise to a natural construction of constraints for the model parameters. We use this principle to derive versions of strong and weak hierarchy and to extend existing work in the literature, which at the moment is only concerned with models of degree two. We discuss how to estimate parameters in lasso using standard quadratic programming techniques and apply our proposal to both simulated data and examples from the literature. The proposed methodology compares favorably with existing techniques in terms of low validation error and model size.
Submitted 21 January, 2020;
originally announced January 2020.
-
Latent Space Modelling of Hypergraph Data
Authors:
Kathryn Turnbull,
Simón Lunagómez,
Christopher Nemeth,
Edoardo Airoldi
Abstract:
The increasing prevalence of relational data describing interactions among a target population has motivated a wide literature on statistical network analysis. In many applications, interactions may involve more than two members of the population and this data is more appropriately represented by a hypergraph. In this paper, we present a model for hypergraph data which extends the well established latent space approach for graphs and, by drawing a connection to constructs from computational topology, we develop a model whose likelihood is inexpensive to compute. A delayed-acceptance MCMC scheme is proposed to obtain posterior samples and we rely on Bookstein coordinates to remove the identifiability issues associated with the latent representation. We theoretically examine the degree distribution of hypergraphs generated under our framework and, through simulation, we investigate the flexibility of our model and consider estimation of predictive distributions. Finally, we explore the application of our model to two real-world datasets.
Submitted 2 November, 2021; v1 submitted 1 September, 2019;
originally announced September 2019.
-
Modeling Network Populations via Graph Distances
Authors:
Simón Lunagómez,
Sofia C. Olhede,
Patrick J. Wolfe
Abstract:
This article introduces a new class of models for multiple networks. The core idea is to parametrize a distribution on labelled graphs in terms of a Fréchet mean graph (which depends on a user-specified choice of metric or graph distance) and a parameter that controls the concentration of this distribution about its mean. Entropy is the natural parameter for such control, varying from a point mass concentrated on the Fréchet mean itself to a uniform distribution over all graphs on a given vertex set. We provide a hierarchical Bayesian approach for exploiting this construction, along with straightforward strategies for sampling from the resultant posterior distribution. We conclude by demonstrating the efficacy of our approach via simulation studies and two multiple-network data analysis examples: one drawn from systems biology and the other from neuroscience.
Submitted 6 March, 2020; v1 submitted 15 April, 2019;
originally announced April 2019.
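As a rough illustration of the Fréchet mean idea in the abstract above, the following minimal Python sketch selects, from a small sample of labelled graphs, the one that minimises the sum of squared distances to the sample. It assumes the Hamming distance on edge sets as the user-specified metric and restricts the search to the observed graphs; neither choice is taken from the paper, and all function names and the toy data are hypothetical.

```python
def hamming_distance(g, h):
    """Number of edges present in exactly one of the two graphs."""
    return len(g ^ h)  # symmetric difference of the two edge sets


def restricted_frechet_mean(graphs, distance):
    """Return the observed graph minimising the sum of squared distances.

    Assumption: the search is restricted to the sample itself, which
    avoids enumerating every graph on the vertex set.
    """
    def cost(candidate):
        return sum(distance(candidate, g) ** 2 for g in graphs)

    return min(graphs, key=cost)


# Toy sample: labelled graphs on vertices {0, 1, 2, 3}, each stored as a
# frozenset of undirected edges written as (smaller, larger) pairs.
sample = [
    frozenset({(0, 1), (1, 2), (2, 3)}),
    frozenset({(0, 1), (1, 2)}),
    frozenset({(0, 1), (1, 2), (0, 3)}),
    frozenset({(0, 2)}),
]

mean_graph = restricted_frechet_mean(sample, hamming_distance)
print(sorted(mean_graph))
```

Swapping `hamming_distance` for another metric changes which graph is selected, which is the sense in which the mean depends on the chosen distance.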
-
Evaluating and Optimizing Network Sampling Designs: Decision Theory and Information Theory Perspectives
Authors:
Simón Lunagómez,
Marios Papamichalis,
Patrick J. Wolfe,
Edoardo M. Airoldi
Abstract:
Some of the most used sampling mechanisms that implicitly leverage a social network depend on tuning parameters; for instance, Respondent-Driven Sampling (RDS) is specified by the number of seeds and maximum number of referrals. We are interested in the problem of choosing the tuning parameters of these sampling mechanisms so as to optimize the inference on a population quantity, where such quantity is a function of the network and measurements taken at the nodes. This is done by formulating the problem in terms of decision theory and information theory, in turn. We discuss how the approaches in this paper relate, via theoretical results, to other formalisms for comparing sampling designs, namely sufficiency and the Goel-DeGroot Criterion. The optimization procedure for different network sampling mechanisms is illustrated via simulations in the fashion of the ones used for Bayesian clinical trials.
Submitted 5 December, 2019; v1 submitted 19 November, 2018;
originally announced November 2018.
-
Bayesian Inference from Non-Ignorable Network Sampling Designs
Authors:
Simon Lunagomez,
Edoardo Airoldi
Abstract:
Consider a population of individuals and a network that encodes social connections among them. We are interested in making inference on finite population and super-population estimands that are a function of both individuals' responses and of the network, from a sample. Neither the sampling frame nor the network is available. However, the sampling mechanism implicitly leverages the network to recruit individuals, thus partially revealing social interactions among the individuals in the sample, as well as their responses. This is a common setting that arises, for instance, in epidemiology and healthcare, where samples from hard-to-reach populations are collected using link-tracing mechanisms, including respondent-driven sampling. In this paper, we study statistical properties of popular network sampling mechanisms. We formulate the estimation problem in terms of Rubin's inferential framework to explicitly account for social network structure. We then identify key modeling elements that lead to inferences with good frequentist properties when dealing with data collected through non-ignorable network sampling mechanisms. We demonstrate these methods on a study of the incidence of HIV in Brazil.
Submitted 9 December, 2016; v1 submitted 19 January, 2014;
originally announced January 2014.
-
Geometric Representations of Random Hypergraphs
Authors:
Simón Lunagómez,
Sayan Mukherjee,
Robert L. Wolpert,
Edoardo M. Airoldi
Abstract:
A parametrization of hypergraphs based on the geometry of points in $\mathbf{R}^d$ is developed. Informative prior distributions on hypergraphs are induced through this parametrization by priors on point configurations via spatial processes. This prior specification is used to infer conditional independence models or Markov structure of multivariate distributions. Specifically, we can recover both the junction tree factorization and the hyper Markov law. This approach offers greater control on the distribution of graph features than Erdős-Rényi random graphs, supports inference of factorizations that cannot be retrieved by a graph alone, and leads to new Metropolis-Hastings Markov chain Monte Carlo algorithms with both local and global moves in graph space. We illustrate the utility of this parametrization and prior specification using simulations.
Submitted 12 April, 2015; v1 submitted 18 December, 2009;
originally announced December 2009.