-
Message-Passing on Hypergraphs: Detectability, Phase Transitions and Higher-Order Information
Authors:
Nicolò Ruggeri,
Alessandro Lonardi,
Caterina De Bacco
Abstract:
Hypergraphs are widely adopted tools to examine systems with higher-order interactions. Despite recent advancements in methods for community detection in these systems, we still lack a theoretical analysis of their detectability limits. Here, we derive closed-form bounds for community detection in hypergraphs. Using a Message-Passing formulation, we demonstrate that detectability depends on hypergraphs' structural properties, such as the distribution of hyperedge sizes or their assortativity. Our formulation enables a characterization of the entropy of a hypergraph in relation to that of its clique expansion, showing that community detection is enhanced when hyperedges highly overlap on pairs of nodes. We develop an efficient Message-Passing algorithm to learn communities and model parameters on large systems. Additionally, we devise an exact sampling routine to generate synthetic data from our probabilistic model. With these methods, we numerically investigate the boundaries of community detection in synthetic datasets, and extract communities from real systems. Our results extend the understanding of the limits of community detection in hypergraphs and introduce flexible mathematical tools to study systems with higher-order interactions.
Submitted 24 April, 2024; v1 submitted 1 December, 2023;
originally announced December 2023.
-
Hypergraphs with node attributes: structure and inference
Authors:
Anna Badalyan,
Nicolò Ruggeri,
Caterina De Bacco
Abstract:
Many networked datasets with units interacting in groups of two or more, encoded with hypergraphs, are accompanied by extra information about nodes, such as the role of an individual in a workplace. Here we show how these node attributes can be used to improve our understanding of the structure resulting from higher-order interactions. We consider the problem of community detection in hypergraphs and develop a principled model that combines higher-order interactions and node attributes to better represent the observed interactions and to detect communities more accurately than using either of these types of information alone. The method automatically learns from the input data the extent to which structure and attributes contribute to explaining the data, down-weighting or discarding attributes if they are not informative. Our algorithmic implementation is efficient and scales to large hypergraphs and interactions of large numbers of units. We apply our method to a variety of systems, showing strong performance in hyperedge prediction tasks and in selecting community divisions that correlate with attributes when these are informative, but discarding them otherwise. Our approach illustrates the advantage of using informative node attributes when available with higher-order data.
Submitted 7 November, 2023;
originally announced November 2023.
-
Hypergraphx: a library for higher-order network analysis
Authors:
Quintino Francesco Lotito,
Martina Contisciani,
Caterina De Bacco,
Leonardo Di Gaetano,
Luca Gallo,
Alberto Montresor,
Federico Musciotto,
Nicolò Ruggeri,
Federico Battiston
Abstract:
From social to biological systems, many real-world systems are characterized by higher-order, non-dyadic interactions. Such systems are conveniently described by hypergraphs, where hyperedges encode interactions among an arbitrary number of units. Here, we present an open-source Python library, hypergraphx (HGX), providing a comprehensive collection of algorithms and functions for the analysis of higher-order networks. These include different ways to convert data across distinct higher-order representations, a large variety of measures of higher-order organization at the local and the mesoscale, statistical filters to sparsify higher-order data, a wide array of static and dynamic generative models, and an implementation of different dynamical processes with higher-order interactions. Our computational framework is general, and allows the analysis of hypergraphs with weighted, directed, signed, temporal and multiplex group interactions. We provide visual insights on higher-order data through a variety of visualization tools. We accompany our code with an extended higher-order data repository, and demonstrate the ability of HGX to analyse real-world systems through a systematic analysis of a social network with higher-order interactions. The library is conceived as an evolving, community-based effort, which will further extend its functionalities over the years. Our software is available at https://github.com/HGX-Team/hypergraphx
Submitted 27 March, 2023;
originally announced March 2023.
-
Community Detection in Large Hypergraphs
Authors:
Nicolò Ruggeri,
Martina Contisciani,
Federico Battiston,
Caterina De Bacco
Abstract:
Hypergraphs, describing networks where interactions take place among any number of units, are a natural tool to model many real-world social and biological systems. In this work we propose a principled framework to model the organization of higher-order data. Our approach recovers community structure with accuracy exceeding that of currently available state-of-the-art algorithms, as tested in synthetic benchmarks with both hard and overlapping ground-truth partitions. Our model is flexible and allows capturing both assortative and disassortative community structures. Moreover, our method scales orders of magnitude faster than competing algorithms, making it suitable for the analysis of very large hypergraphs, containing millions of nodes and interactions among thousands of nodes. Our work constitutes a practical and general tool for hypergraph analysis, broadening our understanding of the organization of real-world higher-order systems.
Submitted 3 July, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
A framework to generate hypergraphs with community structure
Authors:
Nicolò Ruggeri,
Federico Battiston,
Caterina De Bacco
Abstract:
In recent years hypergraphs have emerged as a powerful tool to study systems with multi-body interactions which cannot be trivially reduced to pairs. While highly structured methods to generate synthetic data have proved fundamental for the standardized evaluation of algorithms and the statistical study of real-world networked data, these are scarcely available in the context of hypergraphs. Here we propose a flexible and efficient framework for the generation of hypergraphs with many nodes and large hyperedges, which allows specifying general community structures and tuning different local statistics. We illustrate how to use our model to sample synthetic data with desired features (assortative or disassortative communities, mixed or hard community assignments, etc.), analyze community detection algorithms, and generate hypergraphs structurally similar to real-world data. Overcoming previous limitations on the generation of synthetic hypergraphs, our work constitutes a substantial advancement in the statistical modeling of higher-order systems.
Submitted 22 June, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Sampling on networks: estimating spectral centrality measures and their impact in evaluating other relevant network measures
Authors:
Nicolò Ruggeri,
Caterina De Bacco
Abstract:
We perform an extensive analysis of how sampling impacts the estimate of several relevant network measures.
In particular, we focus on how a sampling strategy optimized to recover a particular spectral centrality measure impacts other topological quantities. Our goal is twofold. On one hand, we extend the analysis of the behavior of TCEC [Ruggeri2019], a theoretically grounded sampling method for eigenvector centrality estimation.
On the other hand, we demonstrate more broadly how sampling can impact the estimation of relevant network properties, such as centrality measures other than the one being optimized, community structure and node attribute distribution.
Finally, we adapt the theoretical framework behind TCEC for the case of PageRank centrality and propose a sampling algorithm aimed at optimizing its estimation. We show that, while the theoretical derivation can be suitably adapted to cover this case, the resulting algorithm suffers from a high computational complexity that requires further approximations compared to the eigenvector centrality case.
Submitted 10 March, 2020;
originally announced March 2020.
-
Sampling on networks: estimating eigenvector centrality on incomplete graphs
Authors:
Nicolò Ruggeri,
Caterina De Bacco
Abstract:
We develop a new sampling method to estimate eigenvector centrality on incomplete networks. Our goal is to estimate this global centrality measure when only a limited amount of data is available. This is the case in many real-world scenarios where data collection is expensive, the network is too large for the available storage capacity, or only partial information is available. The sampling algorithm is theoretically grounded by results derived from spectral approximation theory. We studied the problem on both synthetic and real data and compared its performance with that of traditional methods, such as random walk and uniform sampling. We show that approximations obtained from such methods are not always reliable and that our algorithm, while preserving computational scalability, improves performance under different error measures.
Submitted 1 August, 2019;
originally announced August 2019.