Search | arXiv e-print repository

TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs

Authors: Julia Gastinger, Shenyang Huang, Mikhail Galkin, Erfan Loghmani, Ali Parviz, Farimah Poursafaei, Jacob Danovitch, Emanuele Rossi, Ioannis Koutis, Heiner Stuckenschmidt, Reihaneh Rabbany, Guillaume Rabusseau

Abstract: Multi-relational temporal graphs are powerful tools for modeling real-world data, capturing the evolving and interconnected nature of entities over time. Recently, many novel models have been proposed for ML on such graphs, intensifying the need for robust evaluation and standardized benchmark datasets. However, the availability of such resources remains scarce and evaluation faces added complexity due to reproducibility issues in experimental protocols. To address these challenges, we introduce Temporal Graph Benchmark 2.0 (TGB 2.0), a novel benchmarking framework tailored for evaluating methods for predicting future links on Temporal Knowledge Graphs and Temporal Heterogeneous Graphs with a focus on large-scale datasets, extending the Temporal Graph Benchmark. TGB 2.0 facilitates comprehensive evaluations by presenting eight novel datasets spanning five domains with up to 53 million edges. TGB 2.0 datasets are significantly larger than existing datasets in terms of number of nodes, edges, or timestamps. In addition, TGB 2.0 provides a reproducible and realistic evaluation pipeline for multi-relational temporal graphs. Through extensive experimentation, we observe that 1) leveraging edge-type information is crucial to obtain high performance, 2) simple heuristic baselines are often competitive with more complex methods, 3) most methods fail to run on our largest datasets, highlighting the need for research on more scalable methods.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 27 pages, 8 figures

arXiv:2312.00660 [pdf, other]

Resource-constrained knowledge diffusion processes inspired by human peer learning

Authors: Ehsan Beikihassan, Amy K. Hoover, Ioannis Koutis, Ali Parviz, Niloofar Aghaieabiane

Abstract: We consider a setting where a population of artificial learners is given, and the objective is to optimize aggregate measures of performance, under constraints on training resources. The problem is motivated by the study of peer learning in human educational systems. In this context, we study natural knowledge diffusion processes in networks of interacting artificial learners. By "natural", we mean processes that reflect human peer learning where the students' internal state and learning process are mostly opaque, and the main degree of freedom lies in the formation of peer learning groups by a coordinator who can potentially evaluate the learners before assigning them to peer groups. Among other results, we empirically show that such processes indeed make effective use of the training resources, and enable the design of modular neural models that have the capacity to generalize without being prone to overfitting noisy labels.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2310.04292 [pdf, other]

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

Authors: Dominique Beaini, Shenyang Huang, Joao Alex Cunha, Zhiyi Li, Gabriela Moisescu-Pareja, Oleksandr Dymov, Samuel Maddrell-Mander, Callum McLean, Frederik Wenkel, Luis Müller, Jama Hussein Mohamud, Ali Parviz, Michael Craig, Michał Koziarski, Jiarui Lu, Zhaocheng Zhu, Cristian Gabellini, Kerstin Klaser, Josef Dean, Cas Wognum, Maciej Sypetkowski, Guillaume Rabusseau, Reihaneh Rabbany, Jian Tang, Christopher Morris , et al. (10 additional authors not shown)

Abstract: Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point of multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets shows improvement by also training on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks.

Submitted 18 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.15645 [pdf, other]

Sidestepping Barriers for Dominating Set in Parameterized Complexity

Authors: Ioannis Koutis, Michał Włodarczyk, Meirav Zehavi

Abstract: We study the classic Dominating Set problem with respect to several prominent parameters. Specifically, we present algorithmic results that sidestep time complexity barriers by the incorporation of either approximation or larger parameterization. Our results span several parameterization regimes, including: (i,ii,iii) time/ratio-tradeoff for the parameters treewidth, vertex modulator to constant treewidth and solution size; (iv,v) FPT-algorithms for the parameters vertex cover number and feedback edge set number; and (vi) compression for the parameter feedback edge set number.

Submitted 27 September, 2023; originally announced September 2023.

Comments: Accepted to IPEC'23

arXiv:2307.08982 [pdf, other]

Neural Network Pruning as Spectrum Preserving Process

Authors: Shibo Yao, Dantong Yu, Ioannis Koutis

Abstract: Neural networks have achieved remarkable performance in various application domains. Nevertheless, a large number of weights in pre-trained deep neural networks prohibit them from being deployed on smartphones and embedded systems. It is highly desirable to obtain lightweight versions of neural networks for inference in edge devices. Many cost-effective approaches were proposed to prune dense and convolutional layers that are common in deep neural networks and dominant in the parameter space. However, a unified theoretical foundation for the problem is mostly missing. In this paper, we identify the close connection between matrix spectrum learning and neural network training for dense and convolutional layers and argue that weight pruning is essentially a matrix sparsification process to preserve the spectrum. Based on the analysis, we also propose a matrix sparsification algorithm tailored for neural network pruning that yields better pruning results. We carefully design and conduct experiments to support our arguments. Hence we provide a consolidated viewpoint for neural network pruning and enhance the interpretability of deep neural networks by identifying and preserving the critical neural weights.

Submitted 18 July, 2023; originally announced July 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2304.03452

arXiv:2305.06167 [pdf, other]

K-SpecPart: Supervised embedding algorithms and cut overlay for improved hypergraph partitioning

Authors: Ismail Bustany, Andrew B. Kahng, Ioannis Koutis, Bodhisatta Pramanik, Zhiang Wang

Abstract: State-of-the-art hypergraph partitioners follow the multilevel paradigm that constructs multiple levels of progressively coarser hypergraphs that are used to drive cut refinement on each level of the hierarchy. Multilevel partitioners are subject to two limitations: (i) hypergraph coarsening processes rely on local neighborhood structure without fully considering the global structure of the hypergraph; and (ii) refinement heuristics risk entrapment in local minima. In this paper, we describe K-SpecPart, a supervised spectral framework for multi-way partitioning that directly tackles these two limitations. K-SpecPart relies on the computation of generalized eigenvectors and supervised dimensionality reduction techniques to generate vertex embeddings. These are computational primitives that are fast and capture global structural properties of the hypergraph that are not explicitly considered by existing partitioners. K-SpecPart then converts the vertex embeddings into multiple partitioning solutions. K-SpecPart introduces the idea of "ensembling" multiple solutions via a cut-overlay clustering technique that often enables the use of computationally demanding partitioning methods such as ILP (integer linear programming). Using the output of a standard partitioner as a supervision hint, K-SpecPart effectively combines the strengths of established multilevel partitioning techniques with the benefits of spectral graph theory and other combinatorial algorithms. K-SpecPart significantly extends ideas and algorithms that first appeared in our previous work on the bipartitioner SpecPart. Our experiments demonstrate the effectiveness of K-SpecPart. For bipartitioning, K-SpecPart produces solutions with up to 15% cutsize improvement over SpecPart. For multi-way partitioning, K-SpecPart produces solutions with up to 20% cutsize improvement over leading partitioners hMETIS and KaHyPar.

Submitted 3 June, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

arXiv:2209.10545 [pdf, other]

SGC: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks

Authors: Niloofar Aghaieabiane, Ioannis Koutis

Abstract: A widely used approach for extracting information from gene expression data employs the construction of a gene co-expression network and the subsequent application of algorithms that discover network structure. In particular, a common goal is the computational discovery of gene clusters, commonly called modules. When applied on a novel gene expression dataset, the quality of the computed modules can be evaluated automatically, using Gene Ontology enrichment, a method that measures the frequencies of Gene Ontology terms in the computed modules and evaluates their statistical likelihood. In this work we propose SGC, a novel pipeline for gene clustering based on relatively recent seminal work in the mathematics of spectral network theory. SGC consists of multiple novel steps that enable the computation of highly enriched modules in an unsupervised manner. But unlike all existing frameworks, it further incorporates a novel step that leverages Gene Ontology information in a semi-supervised clustering method that further improves the quality of the computed modules. Comparing with already well-known existing frameworks, we show that SGC results in higher enrichment in real data. In particular, in 12 real gene expression datasets, SGC outperforms in all except one.

Submitted 21 September, 2022; originally announced September 2022.

arXiv:1805.00181 [pdf, ps, other]

Spectrally Robust Graph Isomorphism

Authors: Alexandra Kolla, Ioannis Koutis, Vivek Madan, Ali Kemal Sinop

Abstract: We initiate the study of spectral generalizations of the graph isomorphism problem. (a) The Spectral Graph Dominance (SGD) problem: On input of two graphs $G$ and $H$ does there exist a permutation $π$ such that $G\preceq π(H)$? (b) The Spectrally Robust Graph Isomorphism (SRGI) problem: On input of two graphs $G$ and $H$, find the smallest number $κ$ over all permutations $π$ such that $π(H) \preceq G\preceq κc π(H)$ for some $c$. SRGI is a natural formulation of the network alignment problem that has various applications, most notably in computational biology. Here $G\preceq c H$ means that for all vectors $x$ we have $x^T L_G x \leq c x^T L_H x$, where $L_G$ is the Laplacian of $G$. We prove NP-hardness for SGD. We also present a $κ$-approximation algorithm for SRGI for the case when both $G$ and $H$ are bounded-degree trees. The algorithm runs in polynomial time when $κ$ is a constant.

Submitted 1 May, 2018; originally announced May 2018.

Comments: Extended version of a paper appearing in the proceedings of ICALP 2018

arXiv:1607.04002 [pdf, ps, other]

Directed Hamiltonicity and Out-Branchings via Generalized Laplacians

Authors: Andreas Björklund, Petteri Kaski, Ioannis Koutis

Abstract: We are motivated by a tantalizing open question in exact algorithms: can we detect whether an $n$-vertex directed graph $G$ has a Hamiltonian cycle in time significantly less than $2^n$? We present new randomized algorithms that improve upon several previous works: 1. We show that for any constant $0<λ<1$ and prime $p$ we can count the Hamiltonian cycles modulo $p^{\lfloor (1-λ)\frac{n}{3p}\rfloor}$ in expected time less than $c^n$ for a constant $c<2$ that depends only on $p$ and $λ$. Such an algorithm was previously known only for the case of counting modulo two [Björklund and Husfeldt, FOCS 2013]. 2. We show that we can detect a Hamiltonian cycle in $O^*(3^{n-α(G)})$ time and polynomial space, where $α(G)$ is the size of the maximum independent set in $G$. In particular, this yields an $O^*(3^{n/2})$ time algorithm for bipartite directed graphs, which is faster than the exponential-space algorithm in [Cygan et al., STOC 2013]. Our algorithms are based on the algebraic combinatorics of "incidence assignments" that we can capture through evaluation of determinants of Laplacian-like matrices, inspired by the Matrix-Tree Theorem for directed graphs. In addition to the novel algorithms for directed Hamiltonicity, we use the Matrix-Tree Theorem to derive simple algebraic algorithms for detecting out-branchings. Specifically, we give an $O^*(2^k)$-time randomized algorithm for detecting out-branchings with at least $k$ internal vertices, improving upon the algorithms of [Zehavi, ESA 2015] and [Björklund et al., ICALP 2015]. We also present an algebraic algorithm for the directed $k$-Leaf problem, based on a non-standard monomial detection problem.

Submitted 25 April, 2017; v1 submitted 14 July, 2016; originally announced July 2016.

arXiv:1604.02094 [pdf, other]

doi 10.1109/FOCS.2016.44

On Fully Dynamic Graph Sparsifiers

Authors: Ittai Abraham, David Durfee, Ioannis Koutis, Sebastian Krinninger, Richard Peng

Abstract: We initiate the study of dynamic algorithms for graph sparsification problems and obtain fully dynamic algorithms, allowing both edge insertions and edge deletions, that take polylogarithmic time after each update in the graph. Our three main results are as follows. First, we give a fully dynamic algorithm for maintaining a $(1 \pm ε)$-spectral sparsifier with amortized update time $poly(\log{n}, ε^{-1})$. Second, we give a fully dynamic algorithm for maintaining a $(1 \pm ε)$-cut sparsifier with worst-case update time $poly(\log{n}, ε^{-1})$. Both sparsifiers have size $n \cdot poly(\log{n}, ε^{-1})$. Third, we apply our dynamic sparsifier algorithm to obtain a fully dynamic algorithm for maintaining a $(1 + ε)$-approximation to the value of the maximum flow in an unweighted, undirected, bipartite graph with amortized update time $poly(\log{n}, ε^{-1})$.

Submitted 7 October, 2016; v1 submitted 7 April, 2016; originally announced April 2016.

Comments: A preliminary version of this paper appears in the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016)

arXiv:1601.05675 [pdf, other]

Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning

Authors: Daniele Calandriello, Alessandro Lazaric, Michal Valko, Ioannis Koutis

Abstract: While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples. Recent successful and scalable methods, such as the eigenfunction method, focus on efficiently approximating the whole spectrum of the graph Laplacian constructed from the data. This is in contrast to various subsampling and quantization methods proposed in the past, which may fail to preserve the graph spectra. However, the impact of the approximation of the spectrum on the final generalization error is either unknown, or requires strong assumptions on the data. In this paper, we introduce Sparse-HFS, an efficient edge-sparsification algorithm for SSL. By constructing an edge-sparse and spectrally similar graph, we are able to leverage the approximation guarantees of spectral sparsification methods to bound the generalization error of Sparse-HFS. As a result, we obtain a theoretically-grounded approximation scheme for graph-based SSL that also empirically matches the performance of known large-scale methods.

Submitted 21 January, 2016; originally announced January 2016.

arXiv:1601.04746 [pdf, other]

Scalable Constrained Clustering: A Generalized Spectral Method

Authors: Mihai Cucuringu, Ioannis Koutis, Sanjay Chawla, Gary Miller, Richard Peng

Abstract: We present a simple spectral approach to the well-studied constrained clustering problem. It captures constrained clustering as a generalized eigenvalue problem with graph Laplacians. The algorithm works in nearly-linear time and provides concrete guarantees for the quality of the clusters, at least for the case of 2-way partitioning. In practice this translates to a very fast implementation that consistently outperforms existing spectral approaches both in speed and quality.

Submitted 18 January, 2016; originally announced January 2016.

Comments: accepted to appear in AISTATS 2016. arXiv admin note: text overlap with arXiv:1504.00653

arXiv:1504.00653

Scalable Constrained Clustering: A Generalized Spectral Method

Authors: Mihai Cucuringu, Ioannis Koutis, Sanjay Chawla

Abstract: We present a principled spectral approach to the well-studied constrained clustering problem. It reduces clustering to a generalized eigenvalue problem on Laplacians. The method works in nearly-linear time and provides concrete guarantees for the quality of the clusters, at least for the case of 2-way partitioning. In practice this translates to a very fast implementation that consistently outperforms existing spectral approaches. We support this claim with experiments on various data sets: our approach recovers correct clusters in examples where previous methods fail, and handles data sets with millions of data points - two orders of magnitude larger than before.

Submitted 18 January, 2016; v1 submitted 2 April, 2015; originally announced April 2015.

Comments: this paper is superseded by the article "Scalable Constrained Clustering: A Generalized Spectral Method" authored by M. Cucuringu, I. Koutis, S. Chawla, G. Miller and R. Peng

arXiv:1412.6075 [pdf, ps, other]

A Generalized Cheeger Inequality

Authors: Ioannis Koutis, Gary Miller, Richard Peng

Abstract: The generalized conductance $φ(G,H)$ between two graphs $G$ and $H$ on the same vertex set $V$ is defined as the ratio $$ φ(G,H) = \min_{S\subseteq V} \frac{cap_G(S,\bar{S})}{ cap_H(S,\bar{S})}, $$ where $cap_G(S,\bar{S})$ is the total weight of the edges crossing from $S$ to $\bar{S}=V-S$. We show that the minimum generalized eigenvalue $λ(L_G,L_H)$ of the pair of Laplacians $L_G$ and $L_H$ satisfies $$ λ(L_G,L_H) \geq φ(G,H) φ(G)/8, $$ where $φ(G)$ is the usual conductance of $G$. A generalized cut that meets this bound can be obtained from the generalized eigenvector corresponding to $λ(L_G,L_H)$. The inequality complements a recent proof that $φ(G)$ cannot be replaced by $Θ(φ(G,H))$ in the above inequality, unless the Unique Games Conjecture is false.

Submitted 22 October, 2014; originally announced December 2014.
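
Illustrative note on the entry above: the following is a minimal sketch, not code from the paper, that brute-forces the two conductances appearing in the inequality on a tiny example. It assumes graphs are given as lists of (u, v, weight) triples, and it assumes the standard volume-based definition of the "usual conductance" $φ(G)$; the function names and the example graphs are hypothetical. Evaluating the Rayleigh quotient on the indicator vector of a set $S$ gives $cap_G(S,\bar{S})/cap_H(S,\bar{S})$, so $λ(L_G,L_H) \leq φ(G,H)$; together with the stated bound this brackets the eigenvalue without computing it.

```python
from itertools import combinations

def cut_weight(edges, S):
    """Total weight of edges with exactly one endpoint in S."""
    return sum(w for u, v, w in edges if (u in S) != (v in S))

def proper_subsets(n):
    """Yield every nonempty proper subset of {0, ..., n-1}."""
    for k in range(1, n):
        for combo in combinations(range(n), k):
            yield frozenset(combo)

def generalized_conductance(n, G, H):
    """phi(G, H): minimum over cuts of cap_G / cap_H (cuts with cap_H = 0 skipped)."""
    return min(
        cut_weight(G, S) / cut_weight(H, S)
        for S in proper_subsets(n)
        if cut_weight(H, S) > 0
    )

def conductance(n, G):
    """phi(G), assuming the definition cap_G(S) / min(vol(S), vol(complement))."""
    deg = [0.0] * n
    for u, v, w in G:
        deg[u] += w
        deg[v] += w
    total = sum(deg)
    best = float("inf")
    for S in proper_subsets(n):
        vol = sum(deg[u] for u in S)
        best = min(best, cut_weight(G, S) / min(vol, total - vol))
    return best

# Hypothetical example: G is a unit-weight path on 4 vertices, H is the complete graph K4.
n = 4
G = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
H = [(u, v, 1.0) for u, v in combinations(range(n), 2)]

phi_GH = generalized_conductance(n, G, H)
phi_G = conductance(n, G)
lower = phi_GH * phi_G / 8   # lower bound on lambda(L_G, L_H) from the abstract
upper = phi_GH               # upper bound from indicator test vectors
print(phi_GH, phi_G, lower, upper)
```

The enumeration is exponential in the number of vertices, so this is only useful for sanity checks on very small graphs.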

arXiv:1402.3851 [pdf, ps, other]

Simple parallel and distributed algorithms for spectral graph sparsification

Authors: Ioannis Koutis

Abstract: We describe a simple algorithm for spectral graph sparsification, based on iterative computations of weighted spanners and uniform sampling. Leveraging the algorithms of Baswana and Sen for computing spanners, we obtain the first distributed spectral sparsification algorithm. We also obtain a parallel algorithm with improved work and time guarantees. Combining this algorithm with the parallel framework of Peng and Spielman for solving symmetric diagonally dominant linear systems, we get a parallel solver which is much closer to being practical and significantly more efficient in terms of the total work.

Submitted 17 April, 2014; v1 submitted 16 February, 2014; originally announced February 2014.

Comments: replaces "A simple parallel and distributed algorithm for spectral sparsification". Minor changes

arXiv:1209.5821 [pdf, ps, other]

Faster spectral sparsification and numerical algorithms for SDD matrices

Authors: Ioannis Koutis, Alex Levin, Richard Peng

Abstract: We study algorithms for spectral graph sparsification. The input is a graph $G$ with $n$ vertices and $m$ edges, and the output is a sparse graph $\tilde{G}$ that approximates $G$ in an algebraic sense. Concretely, for all vectors $x$ and any $ε>0$, $\tilde{G}$ satisfies $$ (1-ε) x^T L_G x \leq x^T L_{\tilde{G}} x \leq (1+ε) x^T L_G x, $$ where $L_G$ and $L_{\tilde{G}}$ are the Laplacians of $G$ and $\tilde{G}$ respectively. We show that the fastest known algorithm for computing a sparsifier with $O(n\log n/ε^2)$ edges can actually run in $\tilde{O}(m\log^2 n)$ time, an $O(\log n)$ factor faster than before. We also present faster sparsification algorithms for slightly dense graphs. Specifically, we give an algorithm that runs in $\tilde{O}(m\log n)$ time and generates a sparsifier with $\tilde{O}(n\log^3{n}/ε^2)$ edges. This implies that a sparsifier with $O(n\log n/ε^2)$ edges can be computed in $\tilde{O}(m\log n)$ time for graphs with more than $O(n\log^4 n)$ edges. We also give an $\tilde{O}(m)$ time algorithm for graphs with more than $n\log^5 n (\log \log n)^3$ edges of polynomially bounded weights, and an $O(m)$ algorithm for unweighted graphs with more than $n\log^8 n (\log \log n)^3 $ edges and $n\log^{10} n (\log \log n)^5$ edges in the weighted case. The improved sparsification algorithms are employed to accelerate linear system solvers and algorithms for computing fundamental eigenvectors of slightly dense SDD matrices.

Submitted 16 November, 2013; v1 submitted 25 September, 2012; originally announced September 2012.

Comments: This work subsumes the results reported in our STACS 2012 paper "Improved spectral sparsification and numerical algorithms for SDD matrices". The first two algorithms are identical but the fastest O(mloglog n) time algorithm applies now for graphs of average degree log^5 n and more, improving upon the average degree n^c, c>0 of our previous work. Version 2 fixes a few typos
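
Illustrative note on the entry above: the sparsifier condition in the abstract compares the Laplacian quadratic forms of $G$ and $\tilde{G}$. The sketch below is not the paper's algorithm; it only spot-checks the condition on random vectors, using the identity $x^T L x = \sum_{(u,v)} w_{uv}(x_u - x_v)^2$. The edge-list representation, function names, and example graphs are assumptions made for illustration. In the example, $G$ is a unit-weight triangle and $\tilde{G}$ is a two-edge path with weights 1.5; since the triangle's Laplacian acts as $3I$ on vectors orthogonal to the all-ones vector and the path's nonzero eigenvalues are 1.5 and 4.5, the ratio of the two forms lies in $[0.5, 1.5]$, which corresponds to $ε = 0.5$.

```python
import random

def laplacian_form(edges, x):
    """Return x^T L x for a weighted graph given as (u, v, weight) triples."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)

def sampled_ratio_range(G, G_tilde, n, trials=2000, seed=0):
    """Smallest and largest observed ratio x^T L_Gtilde x / x^T L_G x on random x.

    Sampling can only refute the sparsifier condition, not prove it.
    """
    rng = random.Random(seed)
    lo, hi = float("inf"), float("-inf")
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        denom = laplacian_form(G, x)
        if denom < 1e-12:  # x is (numerically) constant on G; the ratio is undefined
            continue
        ratio = laplacian_form(G_tilde, x) / denom
        lo, hi = min(lo, ratio), max(hi, ratio)
    return lo, hi

# Hypothetical example: a unit-weight triangle and a reweighted path on the same vertices.
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
path = [(0, 1, 1.5), (1, 2, 1.5)]

lo, hi = sampled_ratio_range(triangle, path, n=3)
print(lo, hi)  # every sampled ratio stays within [1 - 0.5, 1 + 0.5]
```

A certified check would instead compute the extreme generalized eigenvalues of the pair $(L_{\tilde{G}}, L_G)$; the sampling version is kept here because it needs only the standard library.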

arXiv:1206.3483 [pdf, ps, other]

doi 10.1016/j.ipl.2012.08.008

Constrained multilinear detection for faster functional motif discovery

Authors: Ioannis Koutis

Abstract: The GRAPH MOTIF problem asks whether a given multiset of colors appears on a connected subgraph of a vertex-colored graph. The fastest known parameterized algorithm for this problem is based on a reduction to the $k$-Multilinear Detection (k-MLD) problem: the detection of multilinear terms of total degree k in polynomials presented as circuits. We revisit k-MLD and define k-CMLD, a constrained version of it which reflects GRAPH MOTIF more faithfully. We then give a fast algorithm for k-CMLD. As a result we obtain faster parameterized algorithms for GRAPH MOTIF and variants of it.

Submitted 22 August, 2012; v1 submitted 15 June, 2012; originally announced June 2012.

arXiv:1111.1750 [pdf, ps, other]

Near Linear-Work Parallel SDD Solvers, Low-Diameter Decomposition, and Low-Stretch Subgraphs

Authors: Guy E. Blelloch, Anupam Gupta, Ioannis Koutis, Gary L. Miller, Richard Peng, Kanat Tangwongsan

Abstract: We present the design and analysis of a near linear-work parallel algorithm for solving symmetric diagonally dominant (SDD) linear systems. On input of a SDD $n$-by-$n$ matrix $A$ with $m$ non-zero entries and a vector $b$, our algorithm computes a vector $\tilde{x}$ such that $\|\tilde{x} - A^+b\|_A \leq ε \cdot \|A^+b\|_A$ in $O(m\log^{O(1)}{n}\log{\frac1ε})$ work and $O(m^{1/3+θ}\log \frac1ε)$ depth for any fixed $θ> 0$. The algorithm relies on a parallel algorithm for generating low-stretch spanning trees or spanning subgraphs. To this end, we first develop a parallel decomposition algorithm that in polylogarithmic depth and $\tilde{O}(|E|)$ work, partitions a graph into components with polylogarithmic diameter such that only a small fraction of the original edges are between the components. This can be used to generate low-stretch spanning trees with average stretch $O(n^α)$ in $O(n^{1+α})$ work and $O(n^α)$ depth. Alternatively, it can be used to generate spanning subgraphs with polylogarithmic average stretch in $\tilde{O}(|E|)$ work and polylogarithmic depth. We apply this subgraph construction to derive a parallel linear system solver. By using this solver in known applications, our results imply improved parallel randomized algorithms for several problems, including single-source shortest paths, maximum flow, minimum-cost flow, and approximate maximum flow.

Submitted 7 November, 2011; originally announced November 2011.
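
The decomposition step described in the abstract above (components of small diameter with few edges between them) can be illustrated with a minimal sketch. This is the classic sequential ball-growing heuristic, not the parallel algorithm of the paper; the function names, the stopping rule, and the parameter `beta` are assumptions chosen for illustration. A ball keeps growing by one BFS layer while the edges leaving it exceed `beta` times (edges inside it + 1), so the count of inside edges plus one grows by a factor of more than `1 + beta` per layer and the radius stays logarithmic in the number of edges.

```python
def low_diameter_decomposition(adj, beta):
    """Sequential ball-growing sketch (illustrative, not the paper's method).

    adj  : dict mapping each vertex to a list of its neighbours (undirected)
    beta : hypothetical parameter > 0 trading cut edges against cluster radius
    Returns a list of vertex sets that partition the vertices of adj.
    """
    assigned = {}   # vertex -> index of the cluster that owns it
    clusters = []
    for source in adj:
        if source in assigned:
            continue
        ball = {source}
        while True:
            inside_twice = 0   # each inside edge is seen from both endpoints
            cut = 0            # edges from the ball to still-unassigned vertices
            next_layer = set()
            for u in ball:
                for v in adj[u]:
                    if v in assigned:
                        continue  # edge already charged to an earlier cluster
                    if v in ball:
                        inside_twice += 1
                    else:
                        cut += 1
                        next_layer.add(v)
            inside = inside_twice // 2
            # Stop once few edges leave the ball relative to its interior.
            if cut <= beta * (inside + 1):
                break
            ball |= next_layer
        for u in ball:
            assigned[u] = len(clusters)
        clusters.append(ball)
    return clusters


def count_cut_edges(adj, clusters):
    """Number of undirected edges whose endpoints lie in different clusters."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(1 for u in adj for v in adj[u] if u < v and label[u] != label[v])


if __name__ == "__main__":
    n = 12  # a cycle on 12 vertices as a toy input
    cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
    parts = low_diameter_decomposition(cycle, beta=0.5)
    print(len(parts), "clusters,", count_cut_edges(cycle, parts), "cut edges")
```

Under this stopping rule the total number of cut edges is at most `beta * (m + k)` for a graph with `m` edges split into `k` clusters, since each cut edge is charged once to the earlier of its two clusters.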

arXiv:1102.4842

A nearly-mlogn time solver for SDD linear systems

Authors: Ioannis Koutis, Gary Miller, Richard Peng

Abstract: We present an improved algorithm for solving symmetric diagonally dominant linear systems. On input of an $n\times n$ symmetric diagonally dominant matrix $A$ with $m$ non-zero entries and a vector $b$ such that $A\bar{x} = b$ for some (unknown) vector $\bar{x}$, our algorithm computes a vector $x$ such that $\|x-\bar{x}\|_A < \varepsilon\|\bar{x}\|_A$ (where $\|\cdot\|_A$ denotes the $A$-norm) in time $$\tilde{O}(m\log n \log (1/\varepsilon)).$$ The solver uses, in the standard way, a 'preconditioning' chain of progressively sparser graphs. To claim the faster running time we make a two-fold improvement in the algorithm for constructing the chain. The new chain exploits previously unknown properties of the graph sparsification algorithm given in [Koutis, Miller, Peng, FOCS 2010], allowing for stronger preconditioning properties. We also present an algorithm of independent interest that constructs nearly-tight low-stretch spanning trees in time $\tilde{O}(m\log{n})$, a factor of $O(\log{n})$ faster than the algorithm in [Abraham, Bartal, Neiman, FOCS 2008]. This speedup directly reflects on the construction time of the preconditioning chain.

Submitted 18 August, 2011; v1 submitted 23 February, 2011; originally announced February 2011.

Comments: to appear in FOCS11
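
The error guarantee stated in the abstract above, $\|x-\bar{x}\|_A < \varepsilon\|\bar{x}\|_A$, can be checked numerically on a toy instance. The sketch below is only an illustration under stated assumptions: it uses plain conjugate gradient as a stand-in solver (not the preconditioning-chain solver of the paper), and the helper names `is_sdd`, `a_norm`, and `conjugate_gradient` as well as the example matrix are hypothetical.

```python
import math


def is_sdd(A):
    """True if A is symmetric and each diagonal entry is at least the sum
    of the absolute values of the off-diagonal entries in its row."""
    n = len(A)
    for i in range(n):
        off_diagonal = sum(abs(A[i][j]) for j in range(n) if j != i)
        if A[i][i] < off_diagonal:
            return False
        if any(A[i][j] != A[j][i] for j in range(n)):
            return False
    return True


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def a_norm(A, x):
    """The A-norm sqrt(x^T A x); clamped at 0 to absorb rounding error."""
    return math.sqrt(max(dot(x, matvec(A, x)), 0.0))


def conjugate_gradient(A, b, tol=1e-12, max_iter=100):
    """Textbook conjugate gradient for a positive definite A (stand-in solver)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs) <= tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


if __name__ == "__main__":
    # Small strictly diagonally dominant (hence positive definite) example.
    A = [[3.0, -1.0, 0.0],
         [-1.0, 3.0, -1.0],
         [0.0, -1.0, 2.0]]
    x_bar = [1.0, 2.0, 3.0]      # the "unknown" solution, fixed here for checking
    b = matvec(A, x_bar)
    x = conjugate_gradient(A, b)
    error = a_norm(A, [xi - xb for xi, xb in zip(x, x_bar)])
    print("SDD:", is_sdd(A), "relative A-norm error:", error / a_norm(A, x_bar))
```

Measuring the error in the $A$-norm rather than the Euclidean norm matches the form of the guarantee in the abstract; the target solution is fixed in advance here only so that the error can be computed.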

arXiv:1003.2958

Approaching optimality for solving SDD systems

Authors: Ioannis Koutis, Gary L. Miller, Richard Peng

Abstract: We present an algorithm that, on input of an $n$-vertex, $m$-edge weighted graph $G$ and a value $k$, produces an incremental sparsifier $\hat{G}$ with $n-1 + m/k$ edges, such that the condition number of $G$ with $\hat{G}$ is bounded above by $\tilde{O}(k\log^2 n)$, with probability $1-p$. The algorithm runs in time $$\tilde{O}((m \log{n} + n\log^2{n})\log(1/p)).$$ As a result, we obtain an algorithm that, on input of an $n\times n$ symmetric diagonally dominant matrix $A$ with $m$ non-zero entries and a vector $b$, computes a vector $x$ satisfying $\|x-A^{+}b\|_A < \varepsilon\|A^{+}b\|_A$ in expected time $$\tilde{O}(m\log^2{n}\log(1/\varepsilon)).$$ The solver is based on repeated applications of the incremental sparsifier, which produce a chain of graphs that is then used as input to a recursive preconditioned Chebyshev iteration.

Submitted 3 August, 2010; v1 submitted 15 March, 2010; originally announced March 2010.

Comments: To appear in FOCS 2010
