Search | arXiv e-print repository

Multilayer Correlation Clustering

Authors: Atsushi Miyauchi, Florian Adriaens, Francesco Bonchi, Nikolaj Tatti

Abstract: In this paper, we establish Multilayer Correlation Clustering, a novel generalization of Correlation Clustering (Bansal et al., FOCS '02) to the multilayer setting. In this model, we are given a series of inputs of Correlation Clustering (called layers) over the common set $V$. The goal is then to find a clustering of $V$ that minimizes the $\ell_p$-norm ($p\geq 1$) of the disagreements vector, which is defined as the vector (with dimension equal to the number of layers), each element of which represents the disagreements of the clustering on the corresponding layer. For this generalization, we first design an $O(L\log n)$-approximation algorithm, where $L$ is the number of layers, based on the well-known region growing technique. We then study an important special case of our problem, namely the problem with the probability constraint. For this case, we first give an $(α+2)$-approximation algorithm, where $α$ is any possible approximation ratio for the single-layer counterpart. For instance, we can take $α=2.5$ in general (Ailon et al., JACM '08) and $α=1.73+ε$ for the unweighted case (Cohen-Addad et al., FOCS '23). Furthermore, we design a $4$-approximation algorithm, which improves the above approximation ratio of $α+2=4.5$ for the general probability-constraint case. Computational experiments using real-world datasets demonstrate the effectiveness of our proposed algorithms.
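
As a rough illustration of the objective described in this abstract, the sketch below evaluates the $\ell_p$-norm of the disagreements vector for a fixed clustering. It assumes the simplest unweighted setting, where every layer labels each pair as '+' or '-'. The data encoding and the function names `disagreements` and `multilayer_objective` are hypothetical and not taken from the paper, and the code only evaluates a clustering; it does not implement any of the approximation algorithms.

```python
def disagreements(layer, clustering):
    """Count the pairs whose label in one layer conflicts with the clustering.

    layer: dict mapping frozenset({u, v}) -> '+' or '-' (assumed encoding).
    clustering: dict mapping each node to a cluster id.
    """
    count = 0
    for pair, sign in layer.items():
        u, v = tuple(pair)
        together = clustering[u] == clustering[v]
        # A '+' pair that is split, or a '-' pair that is merged, disagrees.
        if (sign == '+' and not together) or (sign == '-' and together):
            count += 1
    return count


def multilayer_objective(layers, clustering, p=1.0):
    """l_p-norm (finite p >= 1) of the per-layer disagreements vector."""
    vector = [disagreements(layer, clustering) for layer in layers]
    return sum(d ** p for d in vector) ** (1.0 / p)


# Two layers over V = {a, b, c} with conflicting opinions.
layer_1 = {frozenset('ab'): '+', frozenset('ac'): '-', frozenset('bc'): '-'}
layer_2 = {frozenset('ab'): '-', frozenset('ac'): '+', frozenset('bc'): '+'}
one_cluster = {'a': 0, 'b': 0, 'c': 0}
value_p1 = multilayer_objective([layer_1, layer_2], one_cluster, p=1)
value_p2 = multilayer_objective([layer_1, layer_2], one_cluster, p=2)
```

Here the single-cluster solution disagrees with two pairs of the first layer and one pair of the second, so larger values of `p` put more weight on the worst-served layer.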

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2306.03571 [pdf, other]

Minimizing Hitting Time between Disparate Groups with Shortcut Edges

Authors: Florian Adriaens, Honglian Wang, Aristides Gionis

Abstract: Structural bias or segregation of networks refers to situations where two or more disparate groups are present in the network, so that the groups are highly connected internally, but loosely connected to each other. In many cases it is of interest to increase the connectivity of disparate groups so as to, e.g., minimize social friction, or expose individuals to diverse viewpoints. A commonly-used mechanism for increasing the network connectivity is to add edge shortcuts between pairs of nodes. In many applications of interest, edge shortcuts typically translate to recommendations, e.g., what video to watch, or what news article to read next. The problem of reducing structural bias or segregation via edge shortcuts has recently been studied in the literature, and random walks have been an essential tool for modeling navigation and connectivity in the underlying networks. Existing methods, however, either do not offer approximation guarantees, or engineer the objective so that it satisfies certain desirable properties that simplify the optimization task. In this paper we address the problem of adding a given number of shortcut edges in the network so as to directly minimize the average hitting time and the maximum hitting time between two disparate groups. Our algorithm for minimizing average hitting time is a greedy bicriteria algorithm that relies on supermodularity. In contrast, maximum hitting time is not supermodular. Despite this, we develop an approximation algorithm for that objective as well, by leveraging connections with average hitting time and the asymmetric k-center problem.

Submitted 16 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: To appear in KDD 2023

arXiv:2209.00370 [pdf, other]

doi 10.1109/ICDM54844.2022.00095

Diameter Minimization by Shortcutting with Degree Constraints

Authors: Florian Adriaens, Aristides Gionis

Abstract: We consider the problem of adding a fixed number of new edges to an undirected graph in order to minimize the diameter of the augmented graph, and under the constraint that the number of edges added for each vertex is bounded by an integer. The problem is motivated by network-design applications, where we want to minimize the worst case communication in the network without excessively increasing the degree of any single vertex, so as to avoid additional overload. We present three algorithms for this task, each with their own merits. The special case of a matching augmentation, when every vertex can be incident to at most one new edge, is of particular interest, for which we show an inapproximability result, and provide bounds on the smallest achievable diameter when these edges are added to a path. Finally, we empirically evaluate and compare our algorithms on several real-life networks of varying types.
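
To make the problem statement in this abstract concrete, the sketch below checks whether a candidate set of shortcut edges respects the two budgets (total number of new edges, and new edges per vertex) and measures the diameter of the augmented graph by breadth-first search. The names `diameter` and `respects_budget` and the parameters `k` and `delta` are illustrative assumptions. The code only evaluates a given augmentation and is not one of the paper's three algorithms.

```python
from collections import deque


def diameter(n, edges):
    """Diameter of an undirected graph on nodes 0..n-1 (inf if disconnected)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    best = 0
    for source in range(n):
        # Plain BFS from every node; fine for small illustrative graphs.
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if min(dist) < 0:
            return float('inf')
        best = max(best, max(dist))
    return best


def respects_budget(n, new_edges, k, delta):
    """At most k new edges in total and at most delta new edges per vertex."""
    if len(new_edges) > k:
        return False
    added = [0] * n
    for u, v in new_edges:
        added[u] += 1
        added[v] += 1
    return all(count <= delta for count in added)


# A path on 6 nodes, augmented with a single matching edge (delta = 1).
path = [(i, i + 1) for i in range(5)]
shortcut = [(0, 5)]
before = diameter(6, path)
after = diameter(6, path + shortcut)
```

Joining the two endpoints of the path turns it into a cycle, which is the kind of matching augmentation on a path that the abstract singles out.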

Submitted 2 September, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

Comments: A shorter version of this work has been accepted at the IEEE ICDM 2022 conference

arXiv:2202.07992 [pdf, other]

Improved analysis of randomized SVD for top-eigenvector approximation

Authors: Ruo-Chun Tzeng, Po-An Wang, Florian Adriaens, Aristides Gionis, Chi-Jen Lu

Abstract: Computing the top eigenvectors of a matrix is a problem of fundamental interest to various fields. While the majority of the literature has focused on analyzing the reconstruction error of low-rank matrices associated with the retrieved eigenvectors, in many applications one is interested in finding one vector with high Rayleigh quotient. In this paper we study the problem of approximating the top-eigenvector. Given a symmetric matrix $\mathbf{A}$ with largest eigenvalue $λ_1$, our goal is to find a vector $\hat{\mathbf{u}}$ that approximates the leading eigenvector $\mathbf{u}_1$ with high accuracy, as measured by the ratio $R(\hat{\mathbf{u}})=λ_1^{-1}{\hat{\mathbf{u}}^T\mathbf{A}\hat{\mathbf{u}}}/{\hat{\mathbf{u}}^T\hat{\mathbf{u}}}$. We present a novel analysis of the randomized SVD algorithm of Halko et al. (2011) and derive tight bounds in many cases of interest. Notably, this is the first work that provides non-trivial bounds of $R(\hat{\mathbf{u}})$ for randomized SVD with any number of iterations. Our theoretical analysis is complemented with a thorough experimental study that confirms the efficiency and accuracy of the method.
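
The quality measure $R(\hat{\mathbf{u}})$ defined in this abstract is easy to compute once $λ_1$ is known. The sketch below evaluates it in plain Python and pairs it with a single-vector power iteration started from a Gaussian vector. That iteration is only a simplified stand-in chosen for illustration: the abstract does not specify the sketch dimension or the exact variant of randomized SVD analyzed, so `power_sketch`, its parameter `q`, and the helper `matvec` are assumptions rather than the paper's method.

```python
import random


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rayleigh_ratio(A, u, lambda_1):
    """R(u) = (u^T A u) / (lambda_1 * u^T u), as defined in the abstract."""
    Au = matvec(A, u)
    numerator = sum(ui * aui for ui, aui in zip(u, Au))
    denominator = sum(ui * ui for ui in u)
    return numerator / (lambda_1 * denominator)


def power_sketch(A, q, seed=0):
    """Apply A to a random Gaussian start vector q times, renormalizing.

    Illustrative single-vector stand-in, not the analyzed algorithm itself.
    """
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in A]
    for _ in range(q):
        u = matvec(A, u)
        norm = sum(ui * ui for ui in u) ** 0.5
        u = [ui / norm for ui in u]
    return u


# Diagonal example with eigenvalues 2 and 1, so lambda_1 = 2.
A = [[2.0, 0.0], [0.0, 1.0]]
exact = rayleigh_ratio(A, [1.0, 0.0], 2.0)
worst = rayleigh_ratio(A, [0.0, 1.0], 2.0)
approx = rayleigh_ratio(A, power_sketch(A, q=50), 2.0)
```

On this example the ratio ranges from 0.5 (the second eigenvector) to 1 (the leading eigenvector), and more iterations push the random start towards the upper end.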

Submitted 16 February, 2022; originally announced February 2022.

Comments: Accepted to International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

ACM Class: G.1.3

arXiv:2102.07587 [pdf, other]

Testing properties of signed graphs

Authors: Florian Adriaens, Simon Apers

Abstract: In graph property testing the task is to distinguish whether a graph satisfies a given property or is "far" from having that property, preferably with a sublinear query and time complexity. In this work we initiate the study of property testing in signed graphs, where every edge has either a positive or a negative sign. We show that there exist sublinear algorithms for testing three key properties of signed graphs: balance (or 2-clusterability), clusterability and signed triangle freeness. We consider both the dense graph model, where we can query the (signed) adjacency matrix of a signed graph, and the bounded-degree model, where we can query for the neighbors of a node and the sign of the connecting edge. Our algorithms use a variety of tools from graph property testing, as well as reductions from one setting to the other. Our main technical contribution is a sublinear algorithm for testing clusterability in the bounded-degree model. This contrasts with the property of k-clusterability which is not testable with a sublinear number of queries. The tester builds on the seminal work of Goldreich and Ron for testing bipartiteness.
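
For readers unfamiliar with the balance property named in this abstract, the sketch below decides it exactly: a signed graph is balanced when its nodes can be split into two sides with positive edges inside a side and negative edges across sides. This is a standard full-graph BFS check that reads every edge, so it is not a sublinear property tester and not the paper's algorithm. The function name `is_balanced` and the edge encoding `(u, v, sign)` with sign +1 or -1 are assumptions made for the example.

```python
from collections import deque


def is_balanced(n, signed_edges):
    """Exact balance check for a signed graph on nodes 0..n-1.

    signed_edges: iterable of (u, v, sign) with sign +1 or -1 (assumed format).
    """
    adj = [[] for _ in range(n)]
    for u, v, sign in signed_edges:
        adj[u].append((v, sign))
        adj[v].append((u, sign))
    side = [None] * n
    for start in range(n):
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adj[u]:
                # Positive edges keep the side, negative edges flip it.
                wanted = side[u] if sign > 0 else 1 - side[u]
                if side[v] is None:
                    side[v] = wanted
                    queue.append(v)
                elif side[v] != wanted:
                    return False
    return True


two_negative = [(0, 1, +1), (1, 2, -1), (0, 2, -1)]
one_negative = [(0, 1, +1), (1, 2, +1), (0, 2, -1)]
all_negative = [(0, 1, -1), (1, 2, -1), (0, 2, -1)]
```

The all-negative triangle is not balanced, although it can still be split into three singleton clusters, which shows why balance and general clusterability are distinct properties.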

Submitted 15 February, 2021; originally announced February 2021.

Comments: 21 pages

arXiv:2002.07076 [pdf, other]

doi 10.1109/DSAA49011.2020.00019

Block-Approximated Exponential Random Graphs

Authors: Florian Adriaens, Alexandru Mara, Jefrey Lijffijt, Tijl De Bie

Abstract: An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs. By utilizing fast matrix block-approximation techniques, we propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions, while being able to meaningfully model both local information of the graph (e.g., degrees) as well as global information (e.g., clustering coefficient, assortativity, etc.) if desired. This allows one to efficiently generate random networks with similar properties as an observed network, and the models can be used for several downstream tasks such as link prediction. Our methods are scalable to sparse graphs consisting of millions of nodes. Empirical evaluation demonstrates competitiveness in terms of both speed and accuracy with state-of-the-art methods -- which are typically based on embedding the graph into some low-dimensional space -- for link prediction, showcasing the potential of a more direct and interpretable probabilistic model for this task.

Submitted 26 August, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

Comments: Accepted for DSAA 2020 conference

arXiv:1909.01060 [pdf, other]

doi 10.1145/3357384.3357970

Discovering Interesting Cycles in Directed Graphs

Authors: Florian Adriaens, Cigdem Aslay, Tijl De Bie, Aristides Gionis, Jefrey Lijffijt

Abstract: Cycles in graphs often signify interesting processes. For example, cyclic trading patterns can indicate inefficiencies or economic dependencies in trade networks, cycles in food webs can identify fragile dependencies in ecosystems, and cycles in financial transaction networks can be an indication of money laundering. Identifying such interesting cycles, which can also be constrained to contain a given set of query nodes, although not extensively studied, is thus a problem of considerable importance. In this paper, we introduce the problem of discovering interesting cycles in graphs. We first address the problem of quantifying the extent to which a given cycle is interesting for a particular analyst. We then show that finding cycles according to this interestingness measure is related to the longest cycle and maximum mean-weight cycle problems (in the unconstrained setting) and to the maximum Steiner cycle and maximum mean Steiner cycle problems (in the constrained setting). A complexity analysis shows that finding interesting cycles is NP-hard, and is NP-hard to approximate within a constant factor in the unconstrained setting, and within a factor polynomial in the input size for the constrained setting. The latter inapproximability result implies a similar result for the maximum Steiner cycle and maximum mean Steiner cycle problems. Motivated by these hardness results, we propose a number of efficient heuristic algorithms. We verify the effectiveness of the proposed methods and demonstrate their practical utility on two real-world use cases: a food web and an international trade-network dataset.

Submitted 3 September, 2019; originally announced September 2019.

Comments: Accepted for CIKM'19

arXiv:1802.03549 [pdf, other]

doi 10.1007/s10618-020-00673-0

From acquaintance to best friend forever: robust and fine-grained inference of social tie strengths

Authors: Florian Adriaens, Tijl De Bie, Aristides Gionis, Jefrey Lijffijt, Polina Rozenshtein

Abstract: Social networks often provide only a binary perspective on social ties: two individuals are either connected or not. While sometimes external information can be used to infer the strength of social ties, access to such information may be restricted or impractical. Sintos and Tsaparas (KDD 2014) first suggested to infer the strength of social ties from the topology of the network alone, by leveraging the Strong Triadic Closure (STC) property. The STC property states that if person A has strong social ties with persons B and C, B and C must be connected to each other as well (whether with a weak or strong tie). Sintos and Tsaparas exploited this to formulate the inference of the strength of social ties as an NP-hard optimization problem, and proposed two approximation algorithms. We refine and improve upon this landmark paper, by developing a sequence of linear relaxations of this problem that can be solved exactly in polynomial time. Usefully, these relaxations infer more fine-grained levels of tie strength (beyond strong and weak), which also allows to avoid making arbitrary strong/weak strength assignments when the network topology provides inconclusive evidence. One of the relaxations simultaneously infers the presence of a limited number of STC violations. An extensive theoretical analysis leads to two efficient algorithmic approaches. Finally, our experimental results elucidate the strengths of the proposed approach, and shed new light on the validity of the STC property in practice.
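
The STC property stated in this abstract can be checked directly for any proposed strong/weak labeling. The sketch below lists every open wedge that violates it, that is, a node A with strong ties to both B and C while B and C are not connected. The function name `stc_violations` and the input encoding (an edge list plus a list of edges labeled strong) are assumptions for illustration. The code only verifies a labeling and does not implement the linear relaxations proposed in the paper.

```python
from itertools import combinations


def stc_violations(edges, strong_edges):
    """Return triples (a, b, c) where a-b and a-c are strong but b-c is absent."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    strong = {frozenset(edge) for edge in strong_edges}
    violations = []
    for a, neighbours in adj.items():
        # Examine every pair of neighbours of a (every wedge centred at a).
        for b, c in combinations(sorted(neighbours), 2):
            both_strong = (frozenset((a, b)) in strong
                           and frozenset((a, c)) in strong)
            if both_strong and c not in adj[b]:
                violations.append((a, b, c))
    return violations


wedge = [('a', 'b'), ('a', 'c')]
triangle = wedge + [('b', 'c')]
open_wedge_result = stc_violations(wedge, wedge)
one_weak_result = stc_violations(wedge, [('a', 'b')])
closed_result = stc_violations(triangle, wedge)
```

Labeling one of the two wedge edges as weak, or closing the wedge with any tie between B and C, removes the violation.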

Submitted 18 September, 2018; v1 submitted 10 February, 2018; originally announced February 2018.

Journal ref: Data Min. Knowl. Discov. 34(3): 611-651 (2020)

Showing 1–8 of 8 results for author: Adriaens, F