Showing 1–50 of 55 results for author: Perozzi, B

  1. arXiv:2406.10727  [pdf, other]

    cs.LG

    Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights

    Authors: Zhikai Chen, Haitao Mao, Jingzhe Liu, Yu Song, Bingheng Li, Wei Jin, Bahare Fatemi, Anton Tsitsulin, Bryan Perozzi, Hui Liu, Jiliang Tang

    Abstract: Given the ubiquity of graph data and its applications in diverse domains, building a Graph Foundation Model (GFM) that can work well across different graphs and tasks with a unified backbone has recently garnered significant interest. A major obstacle to achieving this goal stems from the fact that graphs from different domains often exhibit diverse node features. Inspired by multi-modal models t…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Preliminary version: if you find any mistakes regarding the evaluation, feel free to contact the first author

  2. arXiv:2406.09170  [pdf, other]

    cs.CL

    Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

    Authors: Bahare Fatemi, Mehran Kazemi, Anton Tsitsulin, Karishma Malkan, Jinyeong Yim, John Palowitch, Sungyong Seo, Jonathan Halcrow, Bryan Perozzi

    Abstract: Large language models (LLMs) have showcased remarkable reasoning capabilities, yet they remain susceptible to errors, particularly in temporal reasoning tasks involving complex temporal logic. Existing research has explored LLM performance on temporal reasoning using diverse datasets and benchmarks. However, these studies often rely on real-world data that LLMs may have encountered during pre-trai…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2405.18512  [pdf, ps, other]

    cs.LG cs.AI

    Understanding Transformer Reasoning Capabilities via Graph Algorithms

    Authors: Clayton Sanford, Bahare Fatemi, Ethan Hall, Anton Tsitsulin, Mehran Kazemi, Jonathan Halcrow, Bryan Perozzi, Vahab Mirrokni

    Abstract: Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their algorithmic reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network's depth, width, and number of extr…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 43 pages, 8 figures

  4. arXiv:2405.18414  [pdf, other]

    cs.CL cs.AI cs.LG cs.SI

    Don't Forget to Connect! Improving RAG with Graph-based Reranking

    Authors: Jialin Dong, Bahare Fatemi, Bryan Perozzi, Lin F. Yang, Anton Tsitsulin

    Abstract: Retrieval Augmented Generation (RAG) has greatly improved the performance of Large Language Model (LLM) responses by grounding generation with context from existing documents. These systems work well when documents are clearly relevant to a question context. But what about when a document has partial information, or less obvious connections to the context? And how should we reason about connection…

    Submitted 28 May, 2024; originally announced May 2024.

  5. arXiv:2402.05862  [pdf, other]

    cs.LG cs.AI cs.SI stat.ML

    Let Your Graph Do the Talking: Encoding Structured Data for LLMs

    Authors: Bryan Perozzi, Bahare Fatemi, Dustin Zelle, Anton Tsitsulin, Mehran Kazemi, Rami Al-Rfou, Jonathan Halcrow

    Abstract: How can we best encode structured data into sequential form for use in large language models (LLMs)? In this work, we introduce a parameter-efficient method to explicitly represent structured data for LLMs. Our method, GraphToken, learns an encoding function to extend prompts with explicit structured information. Unlike other work which focuses on limited domains (e.g. knowledge graph representati…

    Submitted 8 February, 2024; originally announced February 2024.

    ACM Class: I.5.1; I.2.6; I.2.7

  6. arXiv:2312.04762  [pdf, other]

    cs.LG cs.AI cs.SI

    The Graph Lottery Ticket Hypothesis: Finding Sparse, Informative Graph Structure

    Authors: Anton Tsitsulin, Bryan Perozzi

    Abstract: Graph learning methods help utilize implicit relationships among data items, thereby reducing training label requirements and improving task performance. However, determining the optimal graph structure for a particular learning task remains a challenging research problem. In this work, we introduce the Graph Lottery Ticket (GLT) Hypothesis - that there is an extremely sparse backbone for every…

    Submitted 7 December, 2023; originally announced December 2023.

  7. arXiv:2310.04560  [pdf, other]

    cs.LG

    Talk like a Graph: Encoding Graphs for Large Language Models

    Authors: Bahare Fatemi, Jonathan Halcrow, Bryan Perozzi

    Abstract: Graphs are a powerful tool for representing and analyzing complex relationships in real-world applications such as social networks, recommender systems, and computational finance. Reasoning on graphs is essential for drawing inferences about the relationships between entities in a complex system, and to identify hidden patterns and trends. Despite the remarkable progress in automated reasoning wit…

    Submitted 6 October, 2023; originally announced October 2023.

  8. arXiv:2308.13490  [pdf, other]

    cs.LG cs.AR cs.SI

    TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs

    Authors: Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Kaidi Cao, Bahare Fatemi, Mike Burrows, Charith Mendis, Bryan Perozzi

    Abstract: Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the autotuner for XLA, a machine learning compiler, discovered 10-20% speedup on state-of-the-art models serving substantial production traffic at Google. Although there ex…

    Submitted 5 December, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

  9. arXiv:2308.10737  [pdf, other]

    cs.LG

    UGSL: A Unified Framework for Benchmarking Graph Structure Learning

    Authors: Bahare Fatemi, Sami Abu-El-Haija, Anton Tsitsulin, Mehran Kazemi, Dustin Zelle, Neslihan Bulut, Jonathan Halcrow, Bryan Perozzi

    Abstract: Graph neural networks (GNNs) demonstrate outstanding performance in a broad range of applications. While the majority of GNN applications assume that a graph structure is given, some recent methods substantially expanded the applicability of GNNs by showing that they may be effective even when no graph structure is explicitly provided. The GNN parameters and a graph structure are jointly learned.…

    Submitted 21 August, 2023; originally announced August 2023.

  10. arXiv:2307.14490  [pdf, other]

    cs.LG cs.DC cs.SI

    HUGE: Huge Unsupervised Graph Embeddings with TPUs

    Authors: Brandon Mayer, Anton Tsitsulin, Hendrik Fichtenberger, Jonathan Halcrow, Bryan Perozzi

    Abstract: Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is increasing industrial and academic need to quickly analyze graphs with billions of nodes and trillions of edges. A common first step for network understanding is Graph Embedding, the process of creating a continuous representation of nodes in…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: As appeared at KDD 2023

  11. arXiv:2307.08881  [pdf, other]

    cs.SI cs.LG

    Examining the Effects of Degree Distribution and Homophily in Graph Learning Models

    Authors: Mustafa Yasir, John Palowitch, Anton Tsitsulin, Long Tran-Thanh, Bryan Perozzi

    Abstract: Despite a surge in interest in GNN development, homogeneity in benchmarking datasets still presents a fundamental issue to GNN research. GraphWorld is a recent solution which uses the Stochastic Block Model (SBM) to generate diverse populations of synthetic graphs for benchmarking any GNN task. Despite its success, the SBM imposed fundamental limitations on the kinds of graph structure GraphWorld…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted to Workshop on Graph Learning Benchmarks at KDD 2023

  12. arXiv:2306.03256  [pdf, other]

    cs.LG stat.ML

    Explaining and Adapting Graph Conditional Shift

    Authors: Qi Zhu, Yizhu Jiao, Natalia Ponomareva, Jiawei Han, Bryan Perozzi

    Abstract: Graph Neural Networks (GNNs) have shown remarkable performance on graph-structured data. However, recent empirical studies suggest that GNNs are very susceptible to distribution shift. There is still significant ambiguity about why graph-based models seem more vulnerable to these shifts. In this work we provide a thorough theoretical analysis on it by quantifying the magnitude of conditional shift…

    Submitted 5 June, 2023; originally announced June 2023.

  13. arXiv:2305.16562  [pdf, other]

    cs.LG stat.ML

    Unsupervised Embedding Quality Evaluation

    Authors: Anton Tsitsulin, Marina Munkhoeva, Bryan Perozzi

    Abstract: Unsupervised learning has recently significantly gained in popularity, especially with deep learning-based approaches. Despite numerous successes and approaching supervised-level performance on a variety of academic benchmarks, it is still hard to train and evaluate SSL models in practice due to the unsupervised nature of the problem. Even with networks trained in a supervised fashion, it is often…

    Submitted 17 July, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: As appeared at the 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML) at the 40th International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA. 2023

  14. arXiv:2305.12322  [pdf, other]

    cs.LG cs.SI

    Learning Large Graph Property Prediction via Graph Segment Training

    Authors: Kaidi Cao, Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Dustin Zelle, Yanqi Zhou, Charith Mendis, Jure Leskovec, Bryan Perozzi

    Abstract: Learning to predict properties of large graphs is challenging because each prediction requires the knowledge of an entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST), a general framework that utilizes a divide-and-conquer approach to allow learning large graph property prediction with a constant memory footprint. GST first di…

    Submitted 5 November, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

  15. arXiv:2210.10014  [pdf, other]

    cs.LG cs.AI

    On Classification Thresholds for Graph Attention with Edge Features

    Authors: Kimon Fountoulakis, Dake He, Silvio Lattanzi, Bryan Perozzi, Anton Tsitsulin, Shenghao Yang

    Abstract: In recent years we have seen the rise of graph neural networks for prediction tasks on graphs. One of the dominant architectures is graph attention due to its ability to make predictions using weighted edge features and not only node features. In this paper we analyze, theoretically and empirically, graph attention networks and their ability to correctly label nodes in a classic classificatio…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: 37 pages, 5 figures, 5 tables

  16. arXiv:2207.06944  [pdf, ps, other]

    cs.CR cs.LG cs.SI stat.ML

    Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank

    Authors: Alessandro Epasto, Vahab Mirrokni, Bryan Perozzi, Anton Tsitsulin, Peilin Zhong

    Abstract: Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding. However, while data privacy is one of the most important recent concerns, existing PPR algorithms are not designed to protect user privacy. PPR is highly sensitive to the input graph edges: the difference of only one edge may cause a big change in…

    Submitted 14 February, 2024; v1 submitted 14 July, 2022; originally announced July 2022.

  17. arXiv:2207.04396  [pdf, other]

    cs.LG cs.AI cs.CR

    Graph Generative Model for Benchmarking Graph Neural Networks

    Authors: Minji Yoon, Yue Wu, John Palowitch, Bryan Perozzi, Ruslan Salakhutdinov

    Abstract: As the field of Graph Neural Networks (GNN) continues to grow, it experiences a corresponding increase in the need for large, real-world datasets to train and test new GNN models on challenging, realistic problems. Unfortunately, such graph datasets are often generated from online, highly privacy-restricted ecosystems, which makes research and development on these datasets hard, if not impossible.…

    Submitted 9 June, 2023; v1 submitted 10 July, 2022; originally announced July 2022.

  18. arXiv:2207.03522  [pdf, other]

    cs.LG cs.NE cs.SI physics.soc-ph stat.ML

    TF-GNN: Graph Neural Networks in TensorFlow

    Authors: Oleksandr Ferludin, Arno Eigenwillig, Martin Blais, Dustin Zelle, Jan Pfeifer, Alvaro Sanchez-Gonzalez, Wai Lok Sibon Li, Sami Abu-El-Haija, Peter Battaglia, Neslihan Bulut, Jonathan Halcrow, Filipe Miguel Gonçalves de Almeida, Pedro Gonnet, Liangze Jiang, Parth Kothari, Silvio Lattanzi, André Linhares, Brandon Mayer, Vahab Mirrokni, John Palowitch, Mihir Paradkar, Jennifer She, Anton Tsitsulin, Kevin Villela, Lisa Wang, et al. (2 additional authors not shown)

    Abstract: TensorFlow-GNN (TF-GNN) is a scalable library for Graph Neural Networks in TensorFlow. It is designed from the bottom up to support the kinds of rich heterogeneous graph data that occur in today's information ecosystems. In addition to enabling machine learning researchers and advanced developers, TF-GNN offers low-code solutions to empower the broader developer community in graph learning. Many…

    Submitted 23 July, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

  19. arXiv:2205.10403  [pdf, other]

    cs.LG cs.CC

    Tackling Provably Hard Representative Selection via Graph Neural Networks

    Authors: Mehran Kazemi, Anton Tsitsulin, Hossein Esfandiari, MohammadHossein Bateni, Deepak Ramachandran, Bryan Perozzi, Vahab Mirrokni

    Abstract: Representative Selection (RS) is the problem of finding a small subset of exemplars from a dataset that is representative of the dataset. In this paper, we study RS for attributed graphs, and focus on finding representative nodes that optimize the accuracy of a model trained on the selected representatives. Theoretically, we establish a new hardness result for RS (in the absence of a graph structur…

    Submitted 19 July, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: Accepted at the Transactions of Machine Learning Research (TMLR) Journal

  20. arXiv:2204.01376  [pdf, other]

    cs.LG cs.SI

    Synthetic Graph Generation to Benchmark Graph Learning

    Authors: Anton Tsitsulin, Benedek Rozemberczki, John Palowitch, Bryan Perozzi

    Abstract: Graph learning algorithms have attained state-of-the-art performance on many graph analysis tasks such as node classification, link prediction, and clustering. It has, however, become hard to track the field's burgeoning progress. One reason is due to the very small number of datasets used in practice to benchmark the performance of graph learning algorithms. This shockingly small sample size (~10…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: 4 pages. Appeared at the GLB'21 workshop

  21. arXiv:2203.02018  [pdf, other]

    cs.LG

    Zero-shot Transfer Learning within a Heterogeneous Graph via Knowledge Transfer Networks

    Authors: Minji Yoon, John Palowitch, Dustin Zelle, Ziniu Hu, Ruslan Salakhutdinov, Bryan Perozzi

    Abstract: Data continuously emitted from industrial ecosystems such as social or e-commerce platforms are commonly represented as heterogeneous graphs (HG) composed of multiple node/edge types. State-of-the-art graph learning methods for HGs known as heterogeneous graph neural networks (HGNNs) are applied to learn deep context-informed node representations. However, many HG datasets from industrial applicat…

    Submitted 12 October, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

  22. GraphWorld: Fake Graphs Bring Real Insights for GNNs

    Authors: John Palowitch, Anton Tsitsulin, Brandon Mayer, Bryan Perozzi

    Abstract: Despite advances in the field of Graph Neural Networks (GNNs), only a small number (~5) of datasets are currently used to evaluate new models. This continued reliance on a handful of datasets provides minimal insight into the performance differences between models, and is especially challenging for industrial practitioners who are likely to have datasets which look very different from those used a…

    Submitted 7 July, 2022; v1 submitted 28 February, 2022; originally announced March 2022.

    Comments: Uploading KDD camera-ready version

  23. arXiv:2108.01099  [pdf, other]

    cs.LG

    Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data

    Authors: Qi Zhu, Natalia Ponomareva, Jiawei Han, Bryan Perozzi

    Abstract: There has been a recent surge of interest in designing Graph Neural Networks (GNNs) for semi-supervised learning tasks. Unfortunately this work has assumed that the nodes labeled for use in training were selected uniformly at random (i.e. are an IID sample). However in many real world scenarios gathering labels for graph nodes is both expensive and inherently biased -- so this assumption can not b…

    Submitted 26 October, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

    Comments: NeurIPS 2021

  24. arXiv:2102.04350  [pdf, other]

    cs.LG

    Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning

    Authors: Elan Markowitz, Keshav Balasubramanian, Mehrnoosh Mirtaheri, Sami Abu-El-Haija, Bryan Perozzi, Greg Ver Steeg, Aram Galstyan

    Abstract: Graph Representation Learning (GRL) methods have impacted fields from chemistry to social science. However, their algorithmic implementations are specialized to specific use-cases, e.g. message passing methods are run differently from node embedding ones. Despite their apparent differences, all these methods utilize the graph structure, and therefore, their learning can be approximated with stochast…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: To appear in ICLR 2021

  25. arXiv:2010.12878  [pdf, other]

    cs.LG cs.AI cs.SI

    Pathfinder Discovery Networks for Neural Message Passing

    Authors: Benedek Rozemberczki, Peter Englert, Amol Kapoor, Martin Blais, Bryan Perozzi

    Abstract: In this work we propose Pathfinder Discovery Networks (PDNs), a method for jointly learning a message passing graph over a multiplex network with a downstream semi-supervised model. PDNs inductively learn an aggregated weight for each edge, optimized to produce the best outcome for the downstream learning task. PDNs are a generalization of attention mechanisms on graphs which allow flexible constr…

    Submitted 16 February, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: Code is available here: https://github.com/benedekrozemberczki/PDN/

  26. arXiv:2010.06992  [pdf, other]

    cs.LG cs.AI cs.SI stat.ML

    InstantEmbedding: Efficient Local Node Representations

    Authors: Ştefan Postăvaru, Anton Tsitsulin, Filipe Miguel Gonçalves de Almeida, Yingtao Tian, Silvio Lattanzi, Bryan Perozzi

    Abstract: In this paper, we introduce InstantEmbedding, an efficient method for generating single-node representations using local PageRank computations. We theoretically prove that our approach produces globally consistent representations in sublinear time. We demonstrate this empirically by conducting extensive experiments on real-world datasets with over a billion edges. Our experiments confirm that Inst…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: 23 pages, 9 figures

  27. arXiv:2007.12002  [pdf, other]

    cs.LG cs.SI stat.ML

    Grale: Designing Networks for Graph Learning

    Authors: Jonathan Halcrow, Alexandru Moşoi, Sam Ruth, Bryan Perozzi

    Abstract: How can we find the right graph for semi-supervised learning? In real world applications, the choice of which edges to use for computation is the first step in any graph learning process. Interestingly, there are often many types of similarity available to choose as the edges between nodes, and the choice of edges can drastically affect the performance of downstream semi-supervised learning system…

    Submitted 23 July, 2020; originally announced July 2020.

    Comments: 10 pages, 6 figures, to be published in KDD'20

  28. arXiv:2007.03113  [pdf, other]

    cs.LG cs.SI

    Examining COVID-19 Forecasting using Spatio-Temporal Graph Neural Networks

    Authors: Amol Kapoor, Xue Ben, Luyang Liu, Bryan Perozzi, Matt Barnes, Martin Blais, Shawn O'Banion

    Abstract: In this work, we examine a novel forecasting approach for COVID-19 case prediction that uses Graph Neural Networks and mobility data. In contrast to existing time series forecasting models, the proposed approach learns from a single large-scale spatio-temporal graph, where nodes represent the region-level human mobility, spatial edges represent the human mobility based inter-region connectivity, a…

    Submitted 6 July, 2020; originally announced July 2020.

  29. arXiv:2007.01570  [pdf, other]

    cs.LG cs.SI stat.ML

    Scaling Graph Neural Networks with Approximate PageRank

    Authors: Aleksandar Bojchevski, Johannes Gasteiger, Bryan Perozzi, Amol Kapoor, Martin Blais, Benedek Rózemberczki, Michal Lukasik, Stephan Günnemann

    Abstract: Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, learning on large graphs remains a challenge - many recently proposed scalable GNN approaches rely on an expensive message-passing procedure to propagate information through the graph. We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs…

    Submitted 5 April, 2022; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: Published as a Conference Paper at ACM SIGKDD 2020. Author name changed from Johannes Klicpera to Johannes Gasteiger

  30. arXiv:2006.16904  [pdf, other]

    cs.LG cs.SI stat.ML

    Graph Clustering with Graph Neural Networks

    Authors: Anton Tsitsulin, John Palowitch, Bryan Perozzi, Emmanuel Müller

    Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art results on many graph analysis tasks such as node classification and link prediction. However, important unsupervised problems on graphs, such as graph clustering, have proved more resistant to advances in GNNs. Graph clustering has the same overall goal as node pooling in GNNs - does this mean that GNN pooling methods do a good job at cl…

    Submitted 31 May, 2023; v1 submitted 30 June, 2020; originally announced June 2020.

    Comments: JMLR 24(127) 1-21 2023

  31. arXiv:2005.03675  [pdf, other]

    cs.LG cs.NE cs.SI stat.ML

    Machine Learning on Graphs: A Model and Comprehensive Taxonomy

    Authors: Ines Chami, Sami Abu-El-Haija, Bryan Perozzi, Christopher Ré, Kevin Murphy

    Abstract: There has been a surge of recent interest in learning representations for graph-structured data. Graph representation learning methods have generally fallen into three main categories, based on the availability of labeled data. The first, network embedding (such as shallow graph embedding or graph auto-encoders), focuses on learning unsupervised representations of relational structure. The second,…

    Submitted 11 April, 2022; v1 submitted 7 May, 2020; originally announced May 2020.

  32. Just SLaQ When You Approximate: Accurate Spectral Distances for Web-Scale Graphs

    Authors: Anton Tsitsulin, Marina Munkhoeva, Bryan Perozzi

    Abstract: Graph comparison is a fundamental operation in data mining and information retrieval. Due to the combinatorial nature of graphs, it is hard to balance the expressiveness of the similarity measure and its scalability. Spectral analysis provides quintessential tools for studying the multi-scale structure of graphs and is a well-suited foundation for reasoning about differences between graphs. Howeve…

    Submitted 2 March, 2020; originally announced March 2020.

    Comments: To appear at TheWebConf (WWW) 2020

  33. arXiv:1909.11793  [pdf, other]

    cs.LG cs.SI stat.ML

    MONET: Debiasing Graph Embeddings via the Metadata-Orthogonal Training Unit

    Authors: John Palowitch, Bryan Perozzi

    Abstract: Are Graph Neural Networks (GNNs) fair? In many real world graphs, the formation of edges is related to certain node attributes (e.g. gender, community, reputation). In this case, standard GNNs using these edges will be biased by this information, as it is encoded in the structure of the adjacency matrix itself. In this paper, we show that when metadata is correlated with the formation of node neig…

    Submitted 25 February, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

  34. arXiv:1905.02138  [pdf, other]

    cs.SI cs.LG stat.ML

    Is a Single Embedding Enough? Learning Node Representations that Capture Multiple Social Contexts

    Authors: Alessandro Epasto, Bryan Perozzi

    Abstract: Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representat…

    Submitted 6 May, 2019; originally announced May 2019.

    ACM Class: I.2.6; H.2.8; G.2.2

    Journal ref: In Proceedings of "The Web Conference" 2019, WWW, 2019

  35. arXiv:1905.00067  [pdf, other]

    cs.LG cs.SI stat.ML

    MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing

    Authors: Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Nazanin Alipourfard, Kristina Lerman, Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan

    Abstract: Existing popular methods for semi-supervised learning with Graph Neural Networks (such as the Graph Convolutional Network) provably cannot learn a general class of neighborhood mixing relationships. To address this weakness, we propose a new model, MixHop, that can learn these relationships, including difference operators, by repeatedly mixing feature representations of neighbors at various distan…

    Submitted 19 June, 2019; v1 submitted 30 April, 2019; originally announced May 2019.

  36. arXiv:1904.09671  [pdf, other]

    cs.LG cs.IR cs.SI stat.ML

    DDGK: Learning Graph Representations for Deep Divergence Graph Kernels

    Authors: Rami Al-Rfou, Dustin Zelle, Bryan Perozzi

    Abstract: Can neural networks learn to compare graphs without feature engineering? In this paper, we show that it is possible to learn representations for graph similarity with neither domain knowledge nor supervision (i.e. feature engineering or labeled graphs). We propose Deep Divergence Graph Kernels, an unsupervised method for learning representations over graphs that encodes a relaxed notion of graph…

    Submitted 21 April, 2019; originally announced April 2019.

    Comments: WWW '19

    Journal ref: Proceedings of the 2019 World Wide Web Conference (WWW '19), May 13--17, 2019, San Francisco, CA, USA

  37. arXiv:1809.05124  [pdf, other]

    cs.SI physics.soc-ph

    Enhanced Network Embeddings via Exploiting Edge Labels

    Authors: Haochen Chen, Xiaofei Sun, Yingtao Tian, Bryan Perozzi, Muhao Chen, Steven Skiena

    Abstract: Network embedding methods aim at learning low-dimensional latent representation of nodes in a network. While achieving competitive performance on a variety of network inference tasks such as node classification and link prediction, these methods treat the relations between nodes as a binary variable and ignore the rich semantics of edges. In this work, we attempt to learn network embeddings which…

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: CIKM 2018

  38. arXiv:1808.02590  [pdf, other]

    cs.SI

    A Tutorial on Network Embeddings

    Authors: Haochen Chen, Bryan Perozzi, Rami Al-Rfou, Steven Skiena

    Abstract: Network embedding methods aim at learning low-dimensional latent representation of nodes in a network. These representations can be used as features for a wide range of tasks on graphs such as classification, clustering, link prediction, and visualization. In this survey, we give an overview of network embeddings by summarizing and categorizing recent advancements in this research field. We first…

    Submitted 7 August, 2018; originally announced August 2018.

    Comments: 23 pages, 6 figures

  39. arXiv:1802.08888  [pdf, other]

    cs.LG cs.SI stat.ML

    N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification

    Authors: Sami Abu-El-Haija, Amol Kapoor, Bryan Perozzi, Joonseok Lee

    Abstract: Graph Convolutional Networks (GCNs) have shown significant improvements in semi-supervised learning on graph-structured data. Concurrently, unsupervised learning of graph embeddings has benefited from the information contained in random walks. In this paper, we propose a model: Network of GCNs (N-GCN), which marries these two lines of work. At its core, N-GCN trains multiple instances of GCNs over…

    Submitted 24 February, 2018; originally announced February 2018.

  40. arXiv:1712.09731  [pdf, other]

    cs.DC cs.SI

    ASYMP: Fault-tolerant Mining of Massive Graphs

    Authors: Eduardo Fleury, Silvio Lattanzi, Vahab Mirrokni, Bryan Perozzi

    Abstract: We present ASYMP, a distributed graph processing system developed for the timely analysis of graphs with trillions of edges. ASYMP has several distinguishing features including a robust fault tolerance mechanism, a lockless architecture which scales seamlessly to thousands of machines, and efficient data access patterns to reduce per-machine overhead. ASYMP is used to analyze the largest graphs at…

    Submitted 27 December, 2017; originally announced December 2017.

  41. arXiv:1710.09599  [pdf, other]

    cs.LG cs.SI stat.ML

    Watch Your Step: Learning Node Embeddings via Graph Attention

    Authors: Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou, Alex Alemi

    Abstract: Graph embedding methods represent nodes in a continuous vector space, preserving information from the graph (e.g. by sampling random walks). There are many hyper-parameters to these methods (such as random walk length) which have to be manually tuned for every graph. In this paper, we replace random walk hyper-parameters with trainable parameters that we automatically learn via backpropagation. In…

    Submitted 12 September, 2018; v1 submitted 26 October, 2017; originally announced October 2017.

  42. arXiv:1706.07845  [pdf, other]

    cs.SI

    HARP: Hierarchical Representation Learning for Networks

    Authors: Haochen Chen, Bryan Perozzi, Yifan Hu, Steven Skiena

    Abstract: We present HARP, a novel method for learning low dimensional embeddings of a graph's nodes which preserves higher-order structural features. Our proposed method achieves this by compressing the input graph prior to embedding it, effectively avoiding troublesome embedding configurations (i.e. local minima) which can pose problems to non-convex optimization. HARP works by finding a smaller graph whi…

    Submitted 16 November, 2017; v1 submitted 23 June, 2017; originally announced June 2017.

    Comments: To appear in AAAI 2018

  43. arXiv:1705.05615  [pdf, other]

    cs.LG cs.SI stat.ML

    Learning Edge Representations via Low-Rank Asymmetric Projections

    Authors: Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou

    Abstract: We propose a new method for embedding graphs while preserving directed edge information. Learning such continuous-space vector representations (or embeddings) of nodes in a graph is an important first step for using network information (from social networks, user-item graphs, knowledge bases, etc.) in many machine learning tasks. Unlike previous work, we (1) explicitly model an edge as a functio…

    Submitted 13 September, 2017; v1 submitted 16 May, 2017; originally announced May 2017.

    Journal ref: ACM International Conference on Information and Knowledge Management, 2017

  44. arXiv:1701.09039  [pdf, other]

    cs.SI cs.IR physics.soc-ph

    Ties That Bind - Characterizing Classes by Attributes and Social Ties

    Authors: Aria Rezaei, Bryan Perozzi, Leman Akoglu

    Abstract: Given a set of attributed subgraphs known to be from different classes, how can we discover their differences? There are many cases where collections of subgraphs may be contrasted against each other. For example, they may be assigned ground truth labels (spam/not-spam), or it may be desired to directly compare the biological networks of different species or compound networks of different chemical…

    Submitted 31 January, 2017; originally announced January 2017.

    Comments: WWW'17 Web Science, 9 pages

  45. arXiv:1605.03956  [pdf, other]

    cs.CL

    On the Convergent Properties of Word Embedding Methods

    Authors: Yingtao Tian, Vivek Kulkarni, Bryan Perozzi, Steven Skiena

    Abstract: Do word embeddings converge to learn similar things over different initializations? How repeatable are experiments with word embeddings? Are all word embedding techniques equally reliable? In this paper we propose evaluating methods for learning word representations by their consistency across initializations. We propose a measure to quantify the similarity of the learned word representations unde…

    Submitted 12 May, 2016; originally announced May 2016.

    Comments: RepEval @ ACL 2016

  46. arXiv:1605.02115  [pdf, other]

    cs.SI physics.soc-ph

    Don't Walk, Skip! Online Learning of Multi-scale Network Embeddings

    Authors: Bryan Perozzi, Vivek Kulkarni, Haochen Chen, Steven Skiena

    Abstract: We present Walklets, a novel approach for learning multiscale representations of vertices in a network. In contrast to previous works, these representations explicitly encode multiscale vertex relationships in a way that is analytically derivable. Walklets generates these multiscale relationships by subsampling short random walks on the vertices of a graph. By `skipping' over steps in each rando…

    Submitted 24 June, 2017; v1 submitted 6 May, 2016; originally announced May 2016.

    Comments: 8 pages, ASONAM'17

  47. arXiv:1601.06711  [pdf, other]

    cs.SI physics.soc-ph

    Scalable Anomaly Ranking of Attributed Neighborhoods

    Authors: Bryan Perozzi, Leman Akoglu

    Abstract: Given a graph with node attributes, what neighborhoods are anomalous? To answer this question, one needs a quality score that utilizes both structure and attributes. Popular existing measures either quantify the structure only and ignore the attributes (e.g., conductance), or only consider the connectedness of the nodes inside the neighborhood and ignore the cross-edges at the boundary (e.g., dens…

    Submitted 25 January, 2016; originally announced January 2016.

    Comments: SDM16, 12 Pages

  48. arXiv:1510.06786  [pdf, other]

    cs.CL cs.IR cs.LG

    Freshman or Fresher? Quantifying the Geographic Variation of Internet Language

    Authors: Vivek Kulkarni, Bryan Perozzi, Steven Skiena

    Abstract: We present a new computational technique to detect and analyze statistically significant geographic variation in language. Our meta-analysis approach captures statistical properties of word usage across geographical regions and uses statistical methods to identify significant changes specific to regions. While previous approaches have primarily focused on lexical variation between regions, our met…

    Submitted 7 March, 2016; v1 submitted 22 October, 2015; originally announced October 2015.

    Comments: 11 pages (updated submission)

  49. arXiv:1411.3315  [pdf, other]

    cs.CL cs.IR cs.LG

    Statistically Significant Detection of Linguistic Change

    Authors: Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, Steven Skiena

    Abstract: We propose a new computational approach for tracking and detecting statistically significant linguistic shifts in the meaning and usage of words. Such linguistic shifts are especially prevalent on the Internet, where the rapid exchange of ideas can quickly change a word's meaning. Our meta-analysis approach constructs property time series of word usage, and then uses statistically sound change poi…

    Submitted 12 November, 2014; originally announced November 2014.

    Comments: 11 pages, 7 figures, 4 tables

    ACM Class: H.3.3; I.2.6

  50. arXiv:1410.3791  [pdf, other]

    cs.CL cs.LG

    POLYGLOT-NER: Massive Multilingual Named Entity Recognition

    Authors: Rami Al-Rfou, Vivek Kulkarni, Bryan Perozzi, Steven Skiena

    Abstract: The increasing diversity of languages used on the web introduces a new level of complexity to Information Retrieval (IR) systems. We can no longer assume that textual content is written in one language or even the same language family. In this paper, we demonstrate how to build massive multilingual annotators with minimal human expertise and intervention. We describe a system that builds Named Ent…

    Submitted 14 October, 2014; originally announced October 2014.

    Comments: 9 pages, 4 figures, 5 tables

    ACM Class: I.2.7; I.2.6