-
BBK: a simpler, faster algorithm for enumerating maximal bicliques in large sparse bipartite graphs
Authors:
Alexis Baudin,
Clémence Magnien,
Lionel Tabourier
Abstract:
Bipartite graphs are a prevalent modeling tool for real-world networks, capturing interactions between vertices of two different types. Within this framework, bicliques emerge as crucial structures when studying dense subgraphs: they are sets of vertices such that all vertices of the first type interact with all vertices of the second type. Therefore, they allow identifying groups of closely related vertices of the network, such as individuals with similar interests or webpages with similar contents. This article introduces a new algorithm designed for the exhaustive enumeration of maximal bicliques within a bipartite graph. This algorithm, called BBK for Bipartite Bron-Kerbosch, is a new extension to the bipartite case of the Bron-Kerbosch algorithm, which enumerates the maximal cliques in standard (non-bipartite) graphs. It is faster than the state-of-the-art algorithms and allows the enumeration on massive bipartite graphs that are not manageable with existing implementations. We analyze it theoretically to establish two complexity formulas: one as a function of the input and one as a function of the output characteristics of the algorithm. We also provide an open-access implementation of BBK in C++, which we use to experiment and validate its efficiency on massive real-world datasets and show that its execution time is shorter in practice than state-of-the-art algorithms. These experiments also show that the order in which the vertices are processed, as well as the choice of one of the two types of vertices on which to initiate the enumeration, have an impact on the computation time.
Submitted 24 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
LSCPM: communities in massive real-world Link Streams by Clique Percolation Method
Authors:
Alexis Baudin,
Lionel Tabourier,
Clémence Magnien
Abstract:
Community detection is a popular approach to understand the organization of interactions in static networks. For that purpose, the Clique Percolation Method (CPM), which involves the percolation of k-cliques, is a well-studied technique that offers several advantages. Besides, studying interactions that occur over time is useful in various contexts, which can be modeled by the link stream formalism. The Dynamic Clique Percolation Method (DCPM) has been proposed for extending CPM to temporal networks.
However, existing implementations are unable to handle massive datasets. We present a novel algorithm that adapts CPM to link streams, which has the advantage that it allows us to speed up the computation time with respect to the existing DCPM method. We evaluate it experimentally on real datasets and show that it scales to massive link streams. For example, it yields a complete set of communities in under twenty-five minutes for a dataset with thirty million links, which the state of the art fails to achieve even after a week of computation. We further show that our method provides communities similar to DCPM, but slightly more aggregated. We exhibit the relevance of the obtained communities in real-world cases, and show that they provide information on the importance of vertices in the link streams.
Submitted 21 August, 2023;
originally announced August 2023.
-
Faster maximal clique enumeration in large real-world link streams
Authors:
Alexis Baudin,
Clémence Magnien,
Lionel Tabourier
Abstract:
Link streams offer a good model for representing interactions over time. They consist of links $(b,e,u,v)$, where $u$ and $v$ are vertices interacting during the whole time interval $[b,e]$. In this paper, we deal with the problem of enumerating maximal cliques in link streams. A clique is a pair $(C,[t_0,t_1])$, where $C$ is a set of vertices that all interact pairwise during the full interval $[t_0,t_1]$. It is maximal when neither its set of vertices nor its time interval can be increased. Some of the main works solving this problem are based on the famous Bron-Kerbosch algorithm for enumerating maximal cliques in graphs. We take this idea as a starting point to propose a new algorithm which matches the cliques of the instantaneous graphs formed by links existing at a given time $t$ to the maximal cliques of the link stream. We prove its validity and compute its complexity, which is better than the state-of-the-art ones in many cases of interest. We also study the output-sensitive complexity, which is close to the output size, thereby showing that our algorithm is efficient. To confirm this, we perform experiments on link streams used in the state of the art, and on massive link streams, up to 100 million links. In all cases our algorithm is faster, mostly by a factor of at least 10 and up to a factor of $10^4$. Moreover, it scales to massive link streams for which the existing algorithms are not able to provide the solution.
Submitted 24 May, 2024; v1 submitted 1 February, 2023;
originally announced February 2023.
-
Compressing bipartite graphs with a dual reordering scheme
Authors:
Maximilien Danisch,
Ioannis Panagiotas,
Lionel Tabourier
Abstract:
In order to manage massive graphs in practice, it is often necessary to resort to graph compression, which aims at reducing the memory used when storing and processing the graph. Efficient compression methods have been proposed in the literature, especially for web graphs. In most cases, they are combined with a vertex reordering pre-processing step which significantly improves the compression rate. However, these techniques are not as efficient when considering other kinds of graphs. In this paper, we focus on the class of bipartite graphs and adapt the vertex reordering phase to their specific structure by proposing a dual reordering scheme. By reordering each group of vertices with the aim of minimizing a specific score, we show that we can reach better compression rates. We also suggest that this approach can be further refined to make the node orderings better suited to the compression phase that follows the ordering phase.
Submitted 11 January, 2023; v1 submitted 24 September, 2022;
originally announced September 2022.
-
Tailored vertex ordering for faster triangle listing in large graphs
Authors:
Fabrice Lécuyer,
Louis Jachiet,
Clémence Magnien,
Lionel Tabourier
Abstract:
Listing triangles is a fundamental graph problem with many applications, and large graphs require fast algorithms. Vertex ordering allows the orientation of edges from lower to higher vertex indices, and state-of-the-art triangle listing algorithms use this to accelerate their execution and to bound their time complexity. Yet, only basic orderings have been tested. In this paper, we show that studying the precise cost of algorithms instead of their bounded complexity leads to faster solutions. We introduce cost functions that link ordering properties with the running time of a given algorithm. We prove that their minimization is NP-hard and propose heuristics to obtain new orderings with different trade-offs between cost reduction and ordering time. Using datasets with up to two billion edges, we show that our heuristics accelerate the listing of triangles by an average of 38% when the ordering is already given as an input, and 16% when the ordering time is included.
Submitted 2 November, 2022; v1 submitted 9 March, 2022;
originally announced March 2022.
-
Testing the Impact of Semantics and Structure on Recommendation Accuracy and Diversity
Authors:
Pedro Ramaciotti Morales,
Lionel Tabourier,
Raphaël Fournier-S'niehotta
Abstract:
The Heterogeneous Information Network (HIN) formalism is very flexible and enables complex recommendation models. We evaluate the effect of different parts of a HIN on the accuracy and the diversity of recommendations, then investigate whether these effects are only due to the semantic content encoded in the network. We use recently proposed diversity measures which are based on the network structure and better suited to the HIN formalism. Finally, we randomly shuffle the edges of some parts of the HIN, to strip the network of its semantic content while leaving its structure relatively unaffected. We show that the semantic content encoded in the network data has limited importance for the performance of a recommender system and that structure is crucial.
Submitted 10 November, 2020; v1 submitted 7 November, 2020;
originally announced November 2020.
-
Measuring Diversity in Heterogeneous Information Networks
Authors:
Pedro Ramaciotti Morales,
Robin Lamarche-Perrin,
Raphael Fournier-S'niehotta,
Remy Poulain,
Lionel Tabourier,
Fabien Tarissan
Abstract:
Diversity is a concept relevant to numerous domains of research, ranging from ecology to information theory and economics, to cite a few. It is a notion that is steadily gaining attention in the information retrieval, network analysis, and artificial neural networks communities. While the use of diversity measures in network-structured data counts a growing number of applications, no clear and comprehensive description is available for the different ways in which diversities can be measured. In this article, we develop a formal framework for the application of a large family of diversity measures to heterogeneous information networks (HINs), a flexible, widely-used network data formalism. This extends the application of diversity measures, from systems of classifications and apportionments, to more complex relations that can be better modeled by networks. In doing so, we not only provide an effective organization of multiple practices from different domains, but also unearth new observables in systems modeled by heterogeneous information networks. We illustrate the pertinence of our approach by developing different applications related to various domains concerned by both diversity and networks. In particular, we illustrate the usefulness of these new proposed observables in the domains of recommender systems and social media studies, among other fields.
Submitted 16 December, 2020; v1 submitted 5 January, 2020;
originally announced January 2020.
-
Predicting interactions between individuals with structural and dynamical information
Authors:
Thibaud Arnoux,
Lionel Tabourier,
Matthieu Latapy
Abstract:
Capturing both the structural and temporal aspects of interactions is crucial for many real-world datasets, such as contacts between individuals. Using the link stream formalism to capture the dynamics of the systems, we tackle the issue of activity prediction in link streams, that is to say predicting the number of links occurring during a given period of time, and we present a protocol that takes advantage of the temporal and structural information contained in the link stream. Using a supervised learning method, we are able to model the dynamics of our system to improve the prediction. We investigate the behavior of our algorithm and crucial elements affecting the prediction. By introducing different categories of pairs of nodes, we are able to improve the quality as well as increase the diversity of our prediction.
Submitted 12 April, 2018; v1 submitted 27 March, 2018;
originally announced April 2018.
-
Predicting links in ego-networks using temporal information
Authors:
Lionel Tabourier,
Anne-Sophie Libert,
Renaud Lambiotte
Abstract:
Link prediction appears as a central problem of network science, as it calls for unfolding the mechanisms that govern the micro-dynamics of the network. In this work, we are interested in ego-networks, that is the mere information of interactions of a node to its neighbors, in the context of social relationships. As the structural information is very poor, we rely on another source of information to predict links among egos' neighbors: the timing of interactions. We define several features to capture different kinds of temporal information and apply machine learning methods to combine these various features and improve the quality of the prediction. We demonstrate the efficiency of this temporal approach on a cellphone interaction dataset, pointing out features which prove themselves to perform well in this context, in particular the temporal profile of interactions and elapsed time between contacts.
Submitted 15 December, 2015;
originally announced December 2015.
-
RankMerging: A supervised learning-to-rank framework to predict links in large social network
Authors:
Lionel Tabourier,
Daniel Faria Bernardes,
Anne-Sophie Libert,
Renaud Lambiotte
Abstract:
Uncovering unknown or missing links in social networks is a difficult task because of their sparsity and because links may represent different types of relationships, characterized by different structural patterns. In this paper, we define a simple yet efficient supervised learning-to-rank framework, called RankMerging, which aims at combining information provided by various unsupervised rankings. We illustrate our method on three different kinds of social networks and show that it substantially improves the performance of unsupervised ranking metrics. We also compare it to other combination strategies based on standard methods. Finally, we explore various aspects of RankMerging, such as feature selection and parameter estimation, and discuss its area of relevance: the prediction of an adjustable number of links on large networks.
Submitted 11 April, 2019; v1 submitted 9 July, 2014;
originally announced July 2014.
-
A data-driven analysis to question epidemic models for citation cascades on the blogosphere
Authors:
Abdelhamid Salah Brahim,
Lionel Tabourier,
Bénédicte Le Grand
Abstract:
Citation cascades in blog networks are often considered as traces of information spreading on this social medium. In this work, we question this point of view using both a structural and semantic analysis of five months of activity of the most representative blogs of the French-speaking community. Statistical measures reveal that our dataset shares many features with those that can be found in the literature, suggesting the existence of an identical underlying process. However, a closer analysis of the post content indicates that the popular epidemic-like descriptions of cascades are misleading in this context. A basic model, taking into account only the behavior of bloggers and their restricted social network, accounts for several important statistical features of the data. These arguments support the idea that the primary goal of citations may not be information spreading on the blogosphere.
Submitted 3 June, 2013;
originally announced June 2013.
-
Burstiness and spreading on temporal networks
Authors:
Renaud Lambiotte,
Lionel Tabourier,
Jean-Charles Delvenne
Abstract:
We discuss how spreading processes on temporal networks are impacted by the shape of their inter-event time distributions. Through simple mathematical arguments and toy examples, we find that the key factor is the ordering in which events take place, a property that tends to be affected by the bulk of the distributions and not only by their tail, as usually considered in the literature. We show that a detailed modeling of the temporal patterns observed in complex networks can change dramatically the properties of a spreading process, such as the ergodicity of a random walk process or the persistence of an epidemic.
Submitted 2 May, 2013;
originally announced May 2013.
-
Directedness of information flow in mobile phone communication networks
Authors:
Fernando Peruani,
Lionel Tabourier
Abstract:
Without having direct access to the information that is being exchanged, traces of information flow can be obtained by looking at temporal sequences of user interactions. These sequences can be represented as causality trees whose statistics result from a complex interplay between the topology of the underlying (social) network and the time correlations among the communications. Here, we study causality trees in mobile-phone data, which can be represented as a dynamical directed network. This representation of the data reveals the existence of super-spreaders and super-receivers. We show that the tree statistics, and hence the information spreading process, are extremely sensitive to the in-out degree correlation exhibited by the users. We also learn that a given piece of information, e.g., a rumor, would require users to retransmit it for more than 30 hours in order to cover a macroscopic fraction of the system. Our analysis indicates that topological node-node correlations of the underlying social network, while allowing the existence of information loops, also promote information spreading. Temporal correlations, and therefore causality effects, are only visible as local phenomena and during short time scales. These results are obtained through a combination of theory and data analysis techniques.
Submitted 1 February, 2013;
originally announced February 2013.
-
Intrinsically Dynamic Network Communities
Authors:
Bivas Mitra,
Lionel Tabourier,
Camille Roth
Abstract:
Community finding algorithms for networks have recently been extended to dynamic data. Most of these recent methods aim at exhibiting community partitions from successive graph snapshots and thereafter connecting or smoothing these partitions using clever time-dependent features and sampling techniques. These approaches are nonetheless achieving longitudinal rather than dynamic community detection. We assume that communities are fundamentally defined by the repetition of interactions among a set of nodes over time. According to this definition, analyzing the data by considering successive snapshots induces a significant loss of information: we suggest that it blurs essentially dynamic phenomena - such as communities based on repeated inter-temporal interactions, nodes switching from a community to another across time, or the possibility that a community survives while its members are being integrally replaced over a longer time period. We propose a formalism which aims at tackling this issue in the context of time-directed datasets (such as citation networks), and present several illustrations on both empirical and synthetic dynamic networks. We eventually introduce intrinsically dynamic metrics to qualify temporal community structure and emphasize their possible role as an estimator of the quality of the community detection - taking into account the fact that various empirical contexts may call for distinct `community' definitions and detection criteria.
Submitted 8 November, 2011;
originally announced November 2011.
-
Internal links and pairs as a new tool for the analysis of bipartite complex networks
Authors:
Oussama Allali,
Lionel Tabourier,
Clémence Magnien,
Matthieu Latapy
Abstract:
Many real-world complex networks are best modeled as bipartite (or 2-mode) graphs, where nodes are divided into two sets with links connecting one side to the other. However, there is currently a lack of methods to analyze such graphs properly, as most existing measures and methods are suited to classical graphs. A usual but limited approach consists in deriving 1-mode graphs (called projections) from the underlying bipartite structure, though it causes significant loss of information and data storage issues. We introduce here internal links and pairs as a new notion useful for such analysis: it gives insights into the information lost by projecting the bipartite graph. We illustrate the relevance of these concepts on several real-world instances, showing how they make it possible to discriminate behaviors among various cases when we compare them to a benchmark of random networks. Then, we show that we can benefit from this concept for both modeling complex networks and storing them in a compact format.
Submitted 27 October, 2011; v1 submitted 22 April, 2011;
originally announced April 2011.
-
Generating constrained random graphs using multiple edge switches
Authors:
Lionel Tabourier,
Camille Roth,
Jean-Philippe Cointet
Abstract:
The generation of random graphs using edge swaps provides a reliable method to draw uniformly random samples of sets of graphs respecting some simple constraints, e.g. degree distributions. However, in general, it is not necessarily possible to access all graphs obeying some given constraints through a classical switching procedure calling on pairs of edges. We therefore propose to get round this issue by generalizing this classical approach through the use of higher-order edge switches. This method, which we denote by "k-edge switching", makes it possible to progressively improve the covered portion of a set of constrained graphs, thereby providing an increasing, asymptotically certain confidence on the statistical representativeness of the obtained sample.
Submitted 3 February, 2012; v1 submitted 14 December, 2010;
originally announced December 2010.