Showing 1–27 of 27 results for author: Tsitsulin, A

  1. arXiv:2406.10727  [pdf, other]

    cs.LG

    Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights

    Authors: Zhikai Chen, Haitao Mao, Jingzhe Liu, Yu Song, Bingheng Li, Wei Jin, Bahare Fatemi, Anton Tsitsulin, Bryan Perozzi, Hui Liu, Jiliang Tang

    Abstract: Given the ubiquity of graph data and its applications in diverse domains, building a Graph Foundation Model (GFM) that can work well across different graphs and tasks with a unified backbone has recently garnered significant interests. A major obstacle to achieving this goal stems from the fact that graphs from different domains often exhibit diverse node features. Inspired by multi-modal models t…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Preliminary version: if you find any mistakes regarding the evaluation, feel free to contact the first author

  2. arXiv:2406.09170  [pdf, other]

    cs.CL

    Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

    Authors: Bahare Fatemi, Mehran Kazemi, Anton Tsitsulin, Karishma Malkan, Jinyeong Yim, John Palowitch, Sungyong Seo, Jonathan Halcrow, Bryan Perozzi

    Abstract: Large language models (LLMs) have showcased remarkable reasoning capabilities, yet they remain susceptible to errors, particularly in temporal reasoning tasks involving complex temporal logic. Existing research has explored LLM performance on temporal reasoning using diverse datasets and benchmarks. However, these studies often rely on real-world data that LLMs may have encountered during pre-trai…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2405.18512  [pdf, ps, other]

    cs.LG cs.AI

    Understanding Transformer Reasoning Capabilities via Graph Algorithms

    Authors: Clayton Sanford, Bahare Fatemi, Ethan Hall, Anton Tsitsulin, Mehran Kazemi, Jonathan Halcrow, Bryan Perozzi, Vahab Mirrokni

    Abstract: Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their algorithmic reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network's depth, width, and number of extr…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 43 pages, 8 figures

  4. arXiv:2405.18414  [pdf, other]

    cs.CL cs.AI cs.LG cs.SI

    Don't Forget to Connect! Improving RAG with Graph-based Reranking

    Authors: Jialin Dong, Bahare Fatemi, Bryan Perozzi, Lin F. Yang, Anton Tsitsulin

    Abstract: Retrieval Augmented Generation (RAG) has greatly improved the performance of Large Language Model (LLM) responses by grounding generation with context from existing documents. These systems work well when documents are clearly relevant to a question context. But what about when a document has partial information, or less obvious connections to the context? And how should we reason about connection…

    Submitted 28 May, 2024; originally announced May 2024.

  5. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  6. arXiv:2402.05862  [pdf, other]

    cs.LG cs.AI cs.SI stat.ML

    Let Your Graph Do the Talking: Encoding Structured Data for LLMs

    Authors: Bryan Perozzi, Bahare Fatemi, Dustin Zelle, Anton Tsitsulin, Mehran Kazemi, Rami Al-Rfou, Jonathan Halcrow

    Abstract: How can we best encode structured data into sequential form for use in large language models (LLMs)? In this work, we introduce a parameter-efficient method to explicitly represent structured data for LLMs. Our method, GraphToken, learns an encoding function to extend prompts with explicit structured information. Unlike other work which focuses on limited domains (e.g. knowledge graph representati…

    Submitted 8 February, 2024; originally announced February 2024.

    ACM Class: I.5.1; I.2.6; I.2.7

  7. arXiv:2312.04762  [pdf, other]

    cs.LG cs.AI cs.SI

    The Graph Lottery Ticket Hypothesis: Finding Sparse, Informative Graph Structure

    Authors: Anton Tsitsulin, Bryan Perozzi

    Abstract: Graph learning methods help utilize implicit relationships among data items, thereby reducing training label requirements and improving task performance. However, determining the optimal graph structure for a particular learning task remains a challenging research problem. In this work, we introduce the Graph Lottery Ticket (GLT) Hypothesis - that there is an extremely sparse backbone for every…

    Submitted 7 December, 2023; originally announced December 2023.

  8. arXiv:2308.10737  [pdf, other]

    cs.LG

    UGSL: A Unified Framework for Benchmarking Graph Structure Learning

    Authors: Bahare Fatemi, Sami Abu-El-Haija, Anton Tsitsulin, Mehran Kazemi, Dustin Zelle, Neslihan Bulut, Jonathan Halcrow, Bryan Perozzi

    Abstract: Graph neural networks (GNNs) demonstrate outstanding performance in a broad range of applications. While the majority of GNN applications assume that a graph structure is given, some recent methods substantially expanded the applicability of GNNs by showing that they may be effective even when no graph structure is explicitly provided. The GNN parameters and a graph structure are jointly learned.…

    Submitted 21 August, 2023; originally announced August 2023.

  9. arXiv:2307.14490  [pdf, other]

    cs.LG cs.DC cs.SI

    HUGE: Huge Unsupervised Graph Embeddings with TPUs

    Authors: Brandon Mayer, Anton Tsitsulin, Hendrik Fichtenberger, Jonathan Halcrow, Bryan Perozzi

    Abstract: Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is increasing industrial and academic need to quickly analyze graphs with billions of nodes and trillions of edges. A common first step for network understanding is Graph Embedding, the process of creating a continuous representation of nodes in…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: As appeared at KDD 2023

  10. arXiv:2307.08881  [pdf, other]

    cs.SI cs.LG

    Examining the Effects of Degree Distribution and Homophily in Graph Learning Models

    Authors: Mustafa Yasir, John Palowitch, Anton Tsitsulin, Long Tran-Thanh, Bryan Perozzi

    Abstract: Despite a surge in interest in GNN development, homogeneity in benchmarking datasets still presents a fundamental issue to GNN research. GraphWorld is a recent solution which uses the Stochastic Block Model (SBM) to generate diverse populations of synthetic graphs for benchmarking any GNN task. Despite its success, the SBM imposed fundamental limitations on the kinds of graph structure GraphWorld…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted to Workshop on Graph Learning Benchmarks at KDD 2023

  11. arXiv:2305.16562  [pdf, other]

    cs.LG stat.ML

    Unsupervised Embedding Quality Evaluation

    Authors: Anton Tsitsulin, Marina Munkhoeva, Bryan Perozzi

    Abstract: Unsupervised learning has recently significantly gained in popularity, especially with deep learning-based approaches. Despite numerous successes and approaching supervised-level performance on a variety of academic benchmarks, it is still hard to train and evaluate SSL models in practice due to the unsupervised nature of the problem. Even with networks trained in a supervised fashion, it is often…

    Submitted 17 July, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: As appeared at the 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML) at the 40th International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA. 2023

  12. Spectral Graph Complexity

    Authors: Anton Tsitsulin, Davide Mottin, Panagiotis Karras, Alex Bronstein, Emmanuel Müller

    Abstract: We introduce a spectral notion of graph complexity derived from Weyl's law. We experimentally demonstrate its correlation to how well the graph can be embedded in a low-dimensional Euclidean space.

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: BigNet workshop at The Web Conference 2019

  13. arXiv:2210.10014  [pdf, other]

    cs.LG cs.AI

    On Classification Thresholds for Graph Attention with Edge Features

    Authors: Kimon Fountoulakis, Dake He, Silvio Lattanzi, Bryan Perozzi, Anton Tsitsulin, Shenghao Yang

    Abstract: The recent years we have seen the rise of graph neural networks for prediction tasks on graphs. One of the dominant architectures is graph attention due to its ability to make predictions using weighted edge features and not only node features. In this paper we analyze, theoretically and empirically, graph attention networks and their ability of correctly labelling nodes in a classic classificatio…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: 37 pages, 5 figures, 5 Tables

  14. arXiv:2207.06944  [pdf, ps, other]

    cs.CR cs.LG cs.SI stat.ML

    Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank

    Authors: Alessandro Epasto, Vahab Mirrokni, Bryan Perozzi, Anton Tsitsulin, Peilin Zhong

    Abstract: Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding. However, while data privacy is one of the most important recent concerns, existing PPR algorithms are not designed to protect user privacy. PPR is highly sensitive to the input graph edges: the difference of only one edge may cause a big change in…

    Submitted 14 February, 2024; v1 submitted 14 July, 2022; originally announced July 2022.

  15. arXiv:2207.03522  [pdf, other]

    cs.LG cs.NE cs.SI physics.soc-ph stat.ML

    TF-GNN: Graph Neural Networks in TensorFlow

    Authors: Oleksandr Ferludin, Arno Eigenwillig, Martin Blais, Dustin Zelle, Jan Pfeifer, Alvaro Sanchez-Gonzalez, Wai Lok Sibon Li, Sami Abu-El-Haija, Peter Battaglia, Neslihan Bulut, Jonathan Halcrow, Filipe Miguel Gonçalves de Almeida, Pedro Gonnet, Liangze Jiang, Parth Kothari, Silvio Lattanzi, André Linhares, Brandon Mayer, Vahab Mirrokni, John Palowitch, Mihir Paradkar, Jennifer She, Anton Tsitsulin, Kevin Villela, Lisa Wang, et al. (2 additional authors not shown)

    Abstract: TensorFlow-GNN (TF-GNN) is a scalable library for Graph Neural Networks in TensorFlow. It is designed from the bottom up to support the kinds of rich heterogeneous graph data that occurs in today's information ecosystems. In addition to enabling machine learning researchers and advanced developers, TF-GNN offers low-code solutions to empower the broader developer community in graph learning. Many…

    Submitted 23 July, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

  16. arXiv:2205.10403  [pdf, other]

    cs.LG cs.CC

    Tackling Provably Hard Representative Selection via Graph Neural Networks

    Authors: Mehran Kazemi, Anton Tsitsulin, Hossein Esfandiari, MohammadHossein Bateni, Deepak Ramachandran, Bryan Perozzi, Vahab Mirrokni

    Abstract: Representative Selection (RS) is the problem of finding a small subset of exemplars from a dataset that is representative of the dataset. In this paper, we study RS for attributed graphs, and focus on finding representative nodes that optimize the accuracy of a model trained on the selected representatives. Theoretically, we establish a new hardness result for RS (in the absence of a graph structur…

    Submitted 19 July, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: Accepted at the Transactions of Machine Learning Research (TMLR) Journal

  17. arXiv:2204.01376  [pdf, other]

    cs.LG cs.SI

    Synthetic Graph Generation to Benchmark Graph Learning

    Authors: Anton Tsitsulin, Benedek Rozemberczki, John Palowitch, Bryan Perozzi

    Abstract: Graph learning algorithms have attained state-of-the-art performance on many graph analysis tasks such as node classification, link prediction, and clustering. It has, however, become hard to track the field's burgeoning progress. One reason is due to the very small number of datasets used in practice to benchmark the performance of graph learning algorithms. This shockingly small sample size (~10…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: 4 pages. Appeared at the GLB'21 workshop

  18. GraphWorld: Fake Graphs Bring Real Insights for GNNs

    Authors: John Palowitch, Anton Tsitsulin, Brandon Mayer, Bryan Perozzi

    Abstract: Despite advances in the field of Graph Neural Networks (GNNs), only a small number (~5) of datasets are currently used to evaluate new models. This continued reliance on a handful of datasets provides minimal insight into the performance differences between models, and is especially challenging for industrial practitioners who are likely to have datasets which look very different from those used a…

    Submitted 7 July, 2022; v1 submitted 28 February, 2022; originally announced March 2022.

    Comments: Uploading KDD camera-ready version

  19. arXiv:2106.05729  [pdf, ps, other]

    cs.IR

    GRASP: Graph Alignment through Spectral Signatures

    Authors: Judith Hermanns, Anton Tsitsulin, Marina Munkhoeva, Alex Bronstein, Davide Mottin, Panagiotis Karras

    Abstract: What is the best way to match the nodes of two graphs? This graph alignment problem generalizes graph isomorphism and arises in applications from social network analysis to bioinformatics. Some solutions assume that auxiliary information on known matches or node or edge attributes is available, or utilize arbitrary graph features. Such methods fare poorly in the pure form of the problem, in which…

    Submitted 11 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted to APWeb-WAIM

  20. arXiv:2010.06992  [pdf, other]

    cs.LG cs.AI cs.SI stat.ML

    InstantEmbedding: Efficient Local Node Representations

    Authors: Ştefan Postăvaru, Anton Tsitsulin, Filipe Miguel Gonçalves de Almeida, Yingtao Tian, Silvio Lattanzi, Bryan Perozzi

    Abstract: In this paper, we introduce InstantEmbedding, an efficient method for generating single-node representations using local PageRank computations. We theoretically prove that our approach produces globally consistent representations in sublinear time. We demonstrate this empirically by conducting extensive experiments on real-world datasets with over a billion edges. Our experiments confirm that Inst…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: 23 pages, 9 figures

  21. arXiv:2006.16904  [pdf, other]

    cs.LG cs.SI stat.ML

    Graph Clustering with Graph Neural Networks

    Authors: Anton Tsitsulin, John Palowitch, Bryan Perozzi, Emmanuel Müller

    Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art results on many graph analysis tasks such as node classification and link prediction. However, important unsupervised problems on graphs, such as graph clustering, have proved more resistant to advances in GNNs. Graph clustering has the same overall goal as node pooling in GNNs - does this mean that GNN pooling methods do a good job at cl…

    Submitted 31 May, 2023; v1 submitted 30 June, 2020; originally announced June 2020.

    Comments: JMLR 24(127) 1-21 2023

  22. arXiv:2006.04746  [pdf, other]

    cs.LG cs.SI stat.ML

    FREDE: Anytime Graph Embeddings

    Authors: Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Ivan Oseledets, Emmanuel Müller

    Abstract: Low-dimensional representations, or embeddings, of a graph's nodes facilitate several practical data science and data engineering tasks. As such embeddings rely, explicitly or implicitly, on a similarity measure among nodes, they require the computation of a quadratic similarity matrix, inducing a tradeoff between space complexity and embedding quality. To date, no graph embedding work combines (i…

    Submitted 5 January, 2023; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: As appeared in VLDB 14

  23. Just SLaQ When You Approximate: Accurate Spectral Distances for Web-Scale Graphs

    Authors: Anton Tsitsulin, Marina Munkhoeva, Bryan Perozzi

    Abstract: Graph comparison is a fundamental operation in data mining and information retrieval. Due to the combinatorial nature of graphs, it is hard to balance the expressiveness of the similarity measure and its scalability. Spectral analysis provides quintessential tools for studying the multi-scale structure of graphs and is a well-suited foundation for reasoning about differences between graphs. Howeve…

    Submitted 2 March, 2020; originally announced March 2020.

    Comments: To appear at TheWebConf (WWW) 2020

  24. arXiv:1905.11141  [pdf, other]

    stat.ML cs.LG

    The Shape of Data: Intrinsic Distance for Data Distributions

    Authors: Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Alex Bronstein, Ivan Oseledets, Emmanuel Müller

    Abstract: The ability to represent and compare machine learning models is crucial in order to quantify subtle model changes, evaluate generative models, and gather insights on neural network architectures. Existing techniques for comparing data distributions focus on global data properties such as mean and covariance; in that sense, they are extrinsic and uni-scale. We develop a first-of-its-kind intrinsic…

    Submitted 15 February, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Published in ICLR'2020

  25. arXiv:1811.06237  [pdf, ps, other]

    cs.SI cs.AI cs.LG

    SGR: Self-Supervised Spectral Graph Representation Learning

    Authors: Anton Tsitsulin, Davide Mottin, Panagiotis Karras, Alex Bronstein, Emmanuel Müller

    Abstract: Representing a graph as a vector is a challenging task; ideally, the representation should be easily computable and conducive to efficient comparisons among graphs, tailored to the particular data and analytical task at hand. Unfortunately, a "one-size-fits-all" solution is unattainable, as different analytical tasks may require different attention to global or local graph features. We develop SGR…

    Submitted 15 November, 2018; originally announced November 2018.

    Comments: As appeared in KDD Deep Learning Day workshop

  26. NetLSD: Hearing the Shape of a Graph

    Authors: Anton Tsitsulin, Davide Mottin, Panagiotis Karras, Alex Bronstein, Emmanuel Müller

    Abstract: Comparison among graphs is ubiquitous in graph analytics. However, it is a hard task in terms of the expressiveness of the employed similarity measure and the efficiency of its computation. Ideally, graph comparison should be invariant to the order of nodes and the sizes of compared graphs, adaptive to the scale of graph patterns, and scalable. Unfortunately, these properties have not been address…

    Submitted 29 May, 2018; v1 submitted 27 May, 2018; originally announced May 2018.

    Comments: KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 19--23, 2018, London, United Kingdom

  27. VERSE: Versatile Graph Embeddings from Similarity Measures

    Authors: Anton Tsitsulin, Davide Mottin, Panagiotis Karras, Emmanuel Müller

    Abstract: Embedding a web-scale information network into a low-dimensional vector space facilitates tasks such as link prediction, classification, and visualization. Past research has addressed the problem of extracting such embeddings by adopting methods from words to graphs, without defining a clearly comprehensible graph-related objective. Yet, as we show, the objectives used in past works implicitly uti…

    Submitted 13 March, 2018; originally announced March 2018.

    Comments: In WWW 2018: The Web Conference. 10 pages, 5 figures