-
Measuring Diversity in Heterogeneous Information Networks
Authors:
Pedro Ramaciotti Morales,
Robin Lamarche-Perrin,
Raphael Fournier-S'niehotta,
Remy Poulain,
Lionel Tabourier,
Fabien Tarissan
Abstract:
Diversity is a concept relevant to numerous domains of research, ranging from ecology to information theory to economics, to cite a few. It is a notion that is steadily gaining attention in the information retrieval, network analysis, and artificial neural networks communities. While the use of diversity measures in network-structured data has a growing number of applications, no clear and comprehensive description is available for the different ways in which diversities can be measured. In this article, we develop a formal framework for the application of a large family of diversity measures to heterogeneous information networks (HINs), a flexible, widely-used network data formalism. This extends the application of diversity measures, from systems of classifications and apportionments, to more complex relations that can be better modeled by networks. In doing so, we not only provide an effective organization of multiple practices from different domains, but also unearth new observables in systems modeled by heterogeneous information networks. We illustrate the pertinence of our approach by developing different applications related to various domains concerned by both diversity and networks. In particular, we illustrate the usefulness of these new proposed observables in the domains of recommender systems and social media studies, among other fields.
Submitted 16 December, 2020; v1 submitted 5 January, 2020;
originally announced January 2020.
-
Link weights recovery in heterogeneous information networks
Authors:
Hong-Lan Botterman,
Robin Lamarche-Perrin
Abstract:
Socio-technical systems usually consist of many intertwined networks, each connecting different types of objects (or actors) through a variety of means. As these networks are co-dependent, one can take advantage of this entangled structure to study interaction patterns in a particular network from the information provided by other related networks. A method is hence proposed and tested to recover the weights of missing or unobserved links in heterogeneous information networks (HINs), abstract representations of systems composed of multiple types of entities and their relations. Given a pair of nodes in a HIN, this work aims at recovering the exact weight of the link incident to these two nodes, knowing some other links present in the HIN. To do so, probability distributions resulting from path-constrained random walks are combined linearly in order to approximate the desired result. In such a walk, the walker is forced to follow only a specific sequence of node types and edge types, commonly called a meta-path, which is capable of capturing specific semantics. This method is general enough to compute the link weight between any types of nodes. Experiments on Twitter and bibliographic data show the applicability of the method.
Submitted 27 June, 2019;
originally announced June 2019.
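
The path-constrained random walk mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy network, the node names, and the functions `step` and `meta_path_distribution` are all hypothetical, and the linear combination of several such distributions (and the fitting of its coefficients) is not shown.

```python
from collections import defaultdict


def step(dist, edges):
    """Advance a probability distribution one hop along one edge type.

    dist  : {node: probability}
    edges : {source: {target: weight}} for a single edge type
    """
    out = defaultdict(float)
    for node, p in dist.items():
        neighbours = edges.get(node, {})
        total = sum(neighbours.values())
        if total == 0:
            continue  # walker is stuck here: this probability mass is dropped
        for target, w in neighbours.items():
            out[target] += p * w / total
    return dict(out)


def meta_path_distribution(start, meta_path):
    """Distribution over end nodes of a walk forced to follow `meta_path`,
    given as an ordered list of edge-type dictionaries."""
    dist = {start: 1.0}
    for edges in meta_path:
        dist = step(dist, edges)
    return dist


# Hypothetical toy HIN with meta-path user -> tweet -> hashtag
user_tweet = {"u1": {"t1": 1.0, "t2": 1.0}, "u2": {"t2": 2.0}}
tweet_tag = {"t1": {"#a": 1.0}, "t2": {"#a": 1.0, "#b": 3.0}}

dist = meta_path_distribution("u1", [user_tweet, tweet_tag])
print(dist)
```

Each hop normalises the outgoing weights of the current node, so the result is the probability that a walker starting at `u1` and constrained to the meta-path ends at each hashtag.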
-
Multidimensional Outlier Detection in Temporal Interaction Networks: An Application to Political Communication on Twitter
Authors:
Audrey Wilmet,
Robin Lamarche-Perrin
Abstract:
On the social network Twitter, users can interact with each other and spread information via retweets. These millions of interactions may result in media events whose influence goes beyond the Twitter framework. In this paper, we thoroughly explore interactions to provide a better understanding of the emergence of certain trends. First, we consider an interaction on Twitter to be a triplet $(s,a,t)$ meaning that user $s$, called the spreader, has retweeted a tweet of user $a$, called the author, at time $t$. We model this set of interactions as a data cube with three dimensions: spreaders, authors and time. Then, we provide a method which builds different contexts, where a context is a set of features characterizing the circumstances of an event. Finally, these contexts allow us to find relevant unexpected behaviors, according to several dimensions and various perspectives: a user whose activity during a given hour is abnormal compared to their usual behavior, a relationship between two users which is abnormal compared to all other relationships, etc. We apply our method to a set of retweets related to the 2017 French presidential election and show that one can build interesting insights regarding political organization on Twitter.
Submitted 6 June, 2019;
originally announced June 2019.
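
As a rough illustration of the three-dimensional data cube described in the abstract above, the sketch below counts retweet triplets $(s,a,t)$ per cell and aggregates them along chosen dimensions. The sample data, the hourly binning, and the helper names `build_cube` and `project` are assumptions made for this example; the paper's context construction and outlier detection are not reproduced here.

```python
from collections import Counter

# Hypothetical retweets: (spreader, author, timestamp in seconds)
retweets = [
    ("s1", "a1", 0),
    ("s1", "a1", 1800),
    ("s2", "a1", 2000),
    ("s1", "a2", 4000),
]


def build_cube(interactions, bin_seconds=3600):
    """Count interactions per (spreader, author, time bin) cell."""
    return Counter((s, a, t // bin_seconds) for s, a, t in interactions)


def project(cube, dims):
    """Aggregate the cube onto the given dimensions
    (0 = spreader, 1 = author, 2 = time bin)."""
    out = Counter()
    for cell, n in cube.items():
        out[tuple(cell[d] for d in dims)] += n
    return out


cube = build_cube(retweets)
per_spreader_hour = project(cube, (0, 2))  # activity of each spreader per hour
per_author = project(cube, (1,))           # total retweets received per author
print(per_spreader_hour)
print(per_author)
```

Projections like these give the aggregated views (a user during a given hour, a pair of users over all time) against which an individual cell could be compared.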
-
Degree-based Outlier Detection within IP Traffic Modelled as a Link Stream
Authors:
Audrey Wilmet,
Tiphaine Viard,
Matthieu Latapy,
Robin Lamarche-Perrin
Abstract:
This paper aims at precisely detecting and identifying anomalous events in IP traffic. To this end, we adopt the link stream formalism, which properly captures temporal and structural features of the data. Within this framework, we focus on finding anomalous behaviours with respect to the degree of IP addresses over time. Due to diversity in IP profiles, this feature is typically distributed heterogeneously, preventing us from finding anomalies directly. To deal with this challenge, we design a method to detect outliers as well as precisely identify their cause in a sequence of similar heterogeneous distributions. We apply it to several MAWI captures of IP traffic and we show that it succeeds in detecting relevant patterns in terms of anomalous network activity.
Submitted 6 June, 2019;
originally announced June 2019.
-
An Information-theoretic Framework for the Lossy Compression of Link Streams
Authors:
Robin Lamarche-Perrin
Abstract:
Graph compression is a data analysis technique that consists in the replacement of parts of a graph by more general structural patterns in order to reduce its description length. It notably provides interesting exploration tools for the study of real, large-scale, and complex graphs which cannot be grasped at first glance. This article proposes a framework for the compression of temporal graphs, that is, graphs that evolve with time. This framework first builds on a simple and limited scheme, exploiting structural equivalence for the lossless compression of static graphs, then generalises it to the lossy compression of link streams, a recent formalism for the study of temporal graphs. Such generalisation relies on the natural extension of (bidimensional) relational data by the addition of a third temporal dimension. Moreover, we introduce an information-theoretic measure to quantify and to control the information that is lost during compression, as well as an algebraic characterisation of the space of possible compression patterns to enhance the expressiveness of the initial compression scheme. These contributions lead to the definition of a combinatorial optimisation problem, namely the Lossy Multistream Compression Problem, for which we provide an exact algorithm.
Submitted 18 July, 2018;
originally announced July 2018.