Search | arXiv e-print repository

Language-Agnostic Representation Learning of Source Code from Structure and Context

Authors: Daniel Zügner, Tobias Kirschstein, Michele Catasta, Jure Leskovec, Stephan Günnemann

Abstract: Source code (Context) and its parsed abstract syntax tree (AST; Structure) are two complementary representations of the same computer program. Traditionally, designers of machine learning models have relied predominantly either on Structure or Context. We propose a new model, which jointly learns on Context and Structure of source code. In contrast to previous approaches, our model uses only language-agnostic features, i.e., source code and features that can be computed directly from the AST. Besides obtaining state-of-the-art on monolingual code summarization on all five programming languages considered in this work, we propose the first multilingual code summarization model. We show that jointly training on non-parallel data from multiple programming languages improves results on all individual languages, where the strongest gains are on low-resource languages. Remarkably, multilingual training only from Context does not lead to the same improvements, highlighting the benefits of combining Structure and Context for representation learning on code.

Submitted 21 March, 2021; originally announced March 2021.

Comments: ICLR 2021

arXiv:2011.14115 [pdf, other]

Fast and Uncertainty-Aware Directional Message Passing for Non-Equilibrium Molecules

Authors: Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, Stephan Günnemann

Abstract: Many important tasks in chemistry revolve around molecules during reactions. This requires predictions far from the equilibrium, while most recent work in machine learning for molecules has been focused on equilibrium or near-equilibrium states. In this paper we aim to extend this scope in three ways. First, we propose the DimeNet++ model, which is 8x faster and 10% more accurate than the original DimeNet on the QM9 benchmark of equilibrium molecules. Second, we validate DimeNet++ on highly reactive molecules by developing the challenging COLL dataset, which contains distorted configurations of small molecules during collisions. Finally, we investigate ensembling and mean-variance estimation for uncertainty quantification with the goal of accelerating the exploration of the vast space of non-equilibrium structures. Our DimeNet++ implementation as well as the COLL dataset are available online.

Submitted 5 April, 2022; v1 submitted 28 November, 2020; originally announced November 2020.

Comments: Published at the Machine Learning for Molecules Workshop at NeurIPS 2020. Author name changed from Johannes Klicpera to Johannes Gasteiger

arXiv:2010.15651 [pdf, ps, other]

Reliable Graph Neural Networks via Robust Aggregation

Authors: Simon Geisler, Daniel Zügner, Stephan Günnemann

Abstract: Perturbations targeting the graph structure have proven to be extremely effective in reducing the performance of Graph Neural Networks (GNNs), and traditional defenses such as adversarial training do not seem to be able to improve robustness. This work is motivated by the observation that adversarially injected edges effectively can be viewed as additional samples to a node's neighborhood aggregation function, which results in distorted aggregations accumulating over the layers. Conventional GNN aggregation functions, such as a sum or mean, can be distorted arbitrarily by a single outlier. We propose a robust aggregation function motivated by the field of robust statistics. Our approach exhibits the largest possible breakdown point of 0.5, which means that the bias of the aggregation is bounded as long as the fraction of adversarial edges of a node is less than 50%. Our novel aggregation function, Soft Medoid, is a fully differentiable generalization of the Medoid and therefore lends itself well for end-to-end deep learning. Equipping a GNN with our aggregation improves the robustness with respect to structure perturbations on Cora ML by a factor of 3 (and 5.5 on Citeseer) and by a factor of 8 for low-degree nodes.

Submitted 29 October, 2020; originally announced October 2020.

Comments: 23 pages, 9 figures, 6 Tables, Neural Information Processing Systems, NeurIPS, 2020
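The contrast drawn in this abstract, a mean that a single outlier can drag arbitrarily far versus a medoid-style aggregation that stays bounded, can be illustrated with a small sketch. The weighting below (a softmax over negative summed distances, with a hypothetical `temperature` parameter) is one plausible way to soften a medoid and is an assumption made for illustration; the listing does not give the paper's exact definition.

```python
import math

def mean_aggregate(points):
    """Plain mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def soft_medoid(points, temperature=1.0):
    """Illustrative soft medoid (assumed form, not taken from the paper).

    Each point is weighted by a softmax over its negative summed distance
    to all points, so points far from the bulk receive almost no weight.
    """
    dist_sums = [sum(math.dist(p, q) for q in points) for p in points]
    best = min(dist_sums)  # shift for numerical stability
    raw = [math.exp(-(d - best) / temperature) for d in dist_sums]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(points[0])
    return [sum(w * p[k] for w, p in zip(weights, points)) for k in range(dim)]

# Four clean neighbor embeddings plus one adversarially injected outlier.
clean = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
poisoned = clean + [[1000.0, 1000.0]]

mean_result = mean_aggregate(poisoned)   # pulled far toward the outlier
robust_result = soft_medoid(poisoned)    # stays inside the clean cluster
```

As the temperature shrinks, this weighting approaches the ordinary medoid; larger temperatures move it toward the mean.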

arXiv:2010.14986 [pdf, other]

Evaluating Robustness of Predictive Uncertainty Estimation: Are Dirichlet-based Models Reliable?

Authors: Anna-Kathrin Kopetzki, Bertrand Charpentier, Daniel Zügner, Sandhya Giri, Stephan Günnemann

Abstract: Dirichlet-based uncertainty (DBU) models are a recent and promising class of uncertainty-aware models. DBU models predict the parameters of a Dirichlet distribution to provide fast, high-quality uncertainty estimates alongside with class predictions. In this work, we present the first large-scale, in-depth study of the robustness of DBU models under adversarial attacks. Our results suggest that uncertainty estimates of DBU models are not robust w.r.t. three important tasks: (1) indicating correctly and wrongly classified samples; (2) detecting adversarial examples; and (3) distinguishing between in-distribution (ID) and out-of-distribution (OOD) data. Additionally, we explore the first approaches to make DBU models more robust. While adversarial training has a minor effect, our median smoothing based approach significantly increases robustness of DBU models.

Submitted 11 June, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

Comments: Published at ICML 2021
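To make concrete what "predicting the parameters of a Dirichlet distribution" yields, here is a minimal sketch that turns a concentration vector into a class prediction and a confidence score. The function name `dirichlet_summary` and the use of the total concentration as the confidence proxy are assumptions for illustration; the listing does not say which uncertainty measures the study evaluates.

```python
def dirichlet_summary(alpha):
    """Summarize a Dirichlet concentration vector (all entries > 0).

    The expected class probabilities are alpha_k / alpha_0, where alpha_0
    is the sum of all concentrations. A larger alpha_0 means a more peaked
    distribution, which is used here as a simple confidence proxy.
    """
    alpha0 = sum(alpha)
    probs = [a / alpha0 for a in alpha]
    predicted = max(range(len(alpha)), key=lambda k: probs[k])
    return {"probs": probs, "predicted_class": predicted, "precision": alpha0}

# Hypothetical model outputs for two inputs with three classes each.
confident = dirichlet_summary([50.0, 1.0, 1.0])   # strong evidence for class 0
uncertain = dirichlet_summary([1.2, 1.0, 1.0])    # close to a flat Dirichlet
```

Both inputs get class 0 as the prediction, but only the first carries a large total concentration, which is what separates a confident prediction from an uncertain one in this sketch.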

arXiv:2010.03242 [pdf, other]

Scalable Normalizing Flows for Permutation Invariant Densities

Authors: Marin Biloš, Stephan Günnemann

Abstract: Modeling sets is an important problem in machine learning since this type of data can be found in many domains. A promising approach defines a family of permutation invariant densities with continuous normalizing flows. This allows us to maximize the likelihood directly and sample new realizations with ease. In this work, we demonstrate how calculating the trace, a crucial step in this method, raises issues that occur both during training and inference, limiting its practicality. We propose an alternative way of defining permutation equivariant transformations that give closed form trace. This leads not only to improvements while training, but also to better final performance. We demonstrate the benefits of our approach on point processes and general set modeling.

Submitted 30 June, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: International Conference on Machine Learning (ICML) 2021

arXiv:2009.10633 [pdf, other]

ThingML+ Augmenting Model-Driven Software Engineering for the Internet of Things with Machine Learning

Authors: Armin Moin, Stephan Rössler, Stephan Günnemann

Abstract: In this paper, we present the current position of the research project ML-Quadrat, which aims to extend the methodology, modeling language and tool support of ThingML - an open source modeling tool for IoT/CPS - to address Machine Learning needs for the IoT applications. Currently, ThingML offers a modeling language and tool support for modeling the components of the system, their communication interfaces as well as their behaviors. The latter is done through state machines. However, we argue that in many cases IoT/CPS services involve system components and physical processes, whose behaviors are not well understood in order to be modeled using state machines. Hence, quite often a data-driven approach that enables inference based on the observed data, e.g., using Machine Learning is preferred. To this aim, ML-Quadrat integrates the necessary Machine Learning concepts into ThingML both on the modeling level (syntax and semantics of the modeling language) and on the code generators level. We plan to support two target platforms for code generation regarding Stream Processing and Complex Event Processing, namely Apache SAMOA and Apama.

Submitted 22 September, 2020; originally announced September 2020.

Comments: Published in Proc. of the International Conference on Model Driven Engineering Languages and Systems (MODELS) 2018 Workshops (MDE4IoT)

Journal ref: Proceedings of MODELS 2018 Workshops (MDE4IoT), co-located with the ACM/IEEE 21st International Conference on Model Driven Engineering Languages and Systems (MODELS), Copenhagen, Denmark, 2018

arXiv:2009.10632 [pdf, other]

doi 10.1145/3417990.3420057

From Things' Modeling Language (ThingML) to Things' Machine Learning (ThingML2)

Authors: Armin Moin, Stephan Rössler, Marouane Sayih, Stephan Günnemann

Abstract: In this paper, we illustrate how to enhance an existing state-of-the-art modeling language and tool for the Internet of Things (IoT), called ThingML, to support machine learning on the modeling level. To this aim, we extend the Domain-Specific Language (DSL) of ThingML, as well as its code generation framework. Our DSL allows one to define things, which are in charge of carrying out data analytics. Further, our code generators can automatically produce the complete implementation in Java and Python. The generated Python code is responsible for data analytics and employs APIs of machine learning libraries, such as Keras, TensorFlow and scikit-learn. Our prototype is available as open source software on GitHub.

Submitted 22 September, 2020; originally announced September 2020.

Comments: International Conference on Model Driven Engineering Languages and Systems (MODELS) 2020 Poster Companion (Extended Abstract)

Journal ref: Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, 2020

arXiv:2008.12952 [pdf, other]

Efficient Robustness Certificates for Discrete Data: Sparsity-Aware Randomized Smoothing for Graphs, Images and More

Authors: Aleksandar Bojchevski, Johannes Gasteiger, Stephan Günnemann

Abstract: Existing techniques for certifying the robustness of models for discrete data either work only for a small class of models or are general at the expense of efficiency or tightness. Moreover, they do not account for sparsity in the input which, as our findings show, is often essential for obtaining non-trivial guarantees. We propose a model-agnostic certificate based on the randomized smoothing framework which subsumes earlier work and is tight, efficient, and sparsity-aware. Its computational complexity does not depend on the number of discrete categories or the dimension of the input (e.g. the graph size), making it highly scalable. We show the effectiveness of our approach on a wide variety of models, datasets, and tasks -- specifically highlighting its use for Graph Neural Networks. So far, obtaining provable guarantees for GNNs has been difficult due to the discrete and non-i.i.d. nature of graph data. Our method can certify any GNN and handles perturbations to both the graph structure and the node attributes.

Submitted 27 February, 2023; v1 submitted 29 August, 2020; originally announced August 2020.

Comments: Proceedings of the 37th International Conference on Machine Learning (ICML 2020)

arXiv:2007.14120 [pdf, other]

Reachable Sets of Classifiers and Regression Models: (Non-)Robustness Analysis and Robust Training

Authors: Anna-Kathrin Kopetzki, Stephan Günnemann

Abstract: Neural networks achieve outstanding accuracy in classification and regression tasks. However, understanding their behavior still remains an open challenge that requires questions to be addressed on the robustness, explainability and reliability of predictions. We answer these questions by computing reachable sets of neural networks, i.e. sets of outputs resulting from continuous sets of inputs. We provide two efficient approaches that lead to over- and under-approximations of the reachable set. This principle is highly versatile, as we show. First, we use it to analyze and enhance the robustness properties of both classifiers and regression models. This is in contrast to existing works, which are mainly focused on classification. Specifically, we verify (non-)robustness, propose a robust training procedure, and show that our approach outperforms adversarial attacks as well as state-of-the-art methods of verifying classifiers for non-norm bound perturbations. Second, we provide techniques to distinguish between reliable and non-reliable predictions for unlabeled inputs, to quantify the influence of each feature on a prediction, and compute a feature ranking.

Submitted 12 May, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

Comments: Published as a journal paper at ECML PKDD 2021

arXiv:2007.07740 [pdf, other]

Deep Representation Learning and Clustering of Traffic Scenarios

Authors: Nick Harmening, Marin Biloš, Stephan Günnemann

Abstract: Determining the traffic scenario space is a major challenge for the homologation and coverage assessment of automated driving functions. In contrast to current approaches that are mainly scenario-based and rely on expert knowledge, we introduce two data driven autoencoding models that learn a latent representation of traffic scenes. First is a CNN based spatio-temporal model that autoencodes a grid of traffic participants' positions. Secondly, we develop a pure temporal RNN based model that auto-encodes a sequence of sets. To handle the unordered set data, we had to incorporate the permutation invariance property. Finally, we show how the latent scenario embeddings can be used for clustering traffic scenarios and similarity retrieval.

Submitted 15 July, 2020; originally announced July 2020.

Comments: Workshop on AI for Autonomous Driving, International Conference on Machine Learning (ICML) 2020

arXiv:2007.01570 [pdf, other]

doi 10.1145/3394486.3403296

Scaling Graph Neural Networks with Approximate PageRank

Authors: Aleksandar Bojchevski, Johannes Gasteiger, Bryan Perozzi, Amol Kapoor, Martin Blais, Benedek Rózemberczki, Michal Lukasik, Stephan Günnemann

Abstract: Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, learning on large graphs remains a challenge - many recently proposed scalable GNN approaches rely on an expensive message-passing procedure to propagate information through the graph. We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs resulting in significant speed gains while maintaining state-of-the-art prediction performance. In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings. We demonstrate that PPRGo outperforms baselines in both distributed and single-machine training environments on a number of commonly used academic graphs. To better analyze the scalability of large-scale graph learning methods, we introduce a novel benchmark graph with 12.4 million nodes, 173 million edges, and 2.8 million node features. We show that training PPRGo from scratch and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph. We discuss the practical application of PPRGo to solve large-scale node classification problems at Google.

Submitted 5 April, 2022; v1 submitted 3 July, 2020; originally announced July 2020.

Comments: Published as a Conference Paper at ACM SIGKDD 2020. Author name changed from Johannes Klicpera to Johannes Gasteiger

arXiv:2007.01072 [pdf, other]

Scene Graph Reasoning for Visual Question Answering

Authors: Marcel Hildebrandt, Hang Li, Rajat Koner, Volker Tresp, Stephan Günnemann

Abstract: Visual question answering is concerned with answering free-form questions about an image. Since it requires a deep linguistic understanding of the question and the ability to associate it with various objects that are present in the image, it is an ambitious task and requires techniques from both computer vision and natural language processing. We propose a novel method that approaches the task by performing context-driven, sequential reasoning based on the objects and their semantic and spatial relationships present in the scene. As a first step, we derive a scene graph which describes the objects in the image, as well as their attributes and their mutual relationships. A reinforcement agent then learns to autonomously navigate over the extracted scene graph to generate paths, which are then the basis for deriving answers. We conduct a first experimental study on the challenging GQA dataset with manually curated scene graphs, where our method almost reaches the level of human performance.

Submitted 2 July, 2020; originally announced July 2020.

Comments: ICML Workshop Graph Representation Learning and Beyond (GRL+)

arXiv:2006.12631 [pdf, other]

Fast and Flexible Temporal Point Processes with Triangular Maps

Authors: Oleksandr Shchur, Nicholas Gao, Marin Biloš, Stephan Günnemann

Abstract: Temporal point process (TPP) models combined with recurrent neural networks provide a powerful framework for modeling continuous-time event data. While such models are flexible, they are inherently sequential and therefore cannot benefit from the parallelism of modern hardware. By exploiting the recent developments in the field of normalizing flows, we design TriTPP -- a new class of non-recurrent TPP models, where both sampling and likelihood computation can be done in parallel. TriTPP matches the flexibility of RNN-based methods but permits orders of magnitude faster sampling. This enables us to use the new model for variational inference in continuous-time discrete-state systems. We demonstrate the advantages of the proposed framework on synthetic and real-world datasets.

Submitted 10 November, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

arXiv:2006.09239 [pdf, other]

Posterior Network: Uncertainty Estimation without OOD Samples via Density-Based Pseudo-Counts

Authors: Bertrand Charpentier, Daniel Zügner, Stephan Günnemann

Abstract: Accurate estimation of aleatoric and epistemic uncertainty is crucial to build safe and reliable systems. Traditional approaches, such as dropout and ensemble methods, estimate uncertainty by sampling probability predictions from different submodels, which leads to slow uncertainty estimation at inference time. Recent works address this drawback by directly predicting parameters of prior distributions over the probability predictions with a neural network. While this approach has demonstrated accurate uncertainty estimation, it requires defining arbitrary target parameters for in-distribution data and makes the unrealistic assumption that out-of-distribution (OOD) data is known at training time. In this work we propose the Posterior Network (PostNet), which uses Normalizing Flows to predict an individual closed-form posterior distribution over predicted probabilities for any input sample. The posterior distributions learned by PostNet accurately reflect uncertainty for in- and out-of-distribution data -- without requiring access to OOD data at training time. PostNet achieves state-of-the-art results in OOD detection and in uncertainty calibration under dataset shifts.

Submitted 22 October, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: Neurips 2020

arXiv:2003.13432 [pdf, other]

Graph Hawkes Neural Network for Forecasting on Temporal Knowledge Graphs

Authors: Zhen Han, Yunpu Ma, Yuyi Wang, Stephan Günnemann, Volker Tresp

Abstract: The Hawkes process has become a standard method for modeling self-exciting event sequences with different event types. A recent work has generalized the Hawkes process to a neurally self-modulating multivariate point process, which enables the capturing of more complex and realistic impacts of past events on future events. However, this approach is limited by the number of possible event types, making it impossible to model the dynamics of evolving graph sequences, where each possible link between two nodes can be considered as an event type. The number of event types increases even further when links are directional and labeled. To address this issue, we propose the Graph Hawkes Neural Network that can capture the dynamics of evolving graph sequences and can predict the occurrence of a fact in a future time instance. Extensive experiments on large-scale temporal multi-relational databases, such as temporal knowledge graphs, demonstrate the effectiveness of our approach.

Submitted 14 June, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

Comments: Automated Knowledge Base Construction 2020, conference paper

arXiv:2003.03123 [pdf, other]

Directional Message Passing for Molecular Graphs

Authors: Johannes Gasteiger, Janek Groß, Stephan Günnemann

Abstract: Graph neural networks have recently achieved great successes in predicting quantum mechanical properties of molecules. These models represent a molecule as a graph using only the distance between atoms (nodes). They do not, however, consider the spatial direction from one atom to another, despite directional information playing a central role in empirical potentials for molecules, e.g. in angular potentials. To alleviate this limitation we propose directional message passing, in which we embed the messages passed between atoms instead of the atoms themselves. Each message is associated with a direction in coordinate space. These directional message embeddings are rotationally equivariant since the associated directions rotate with the molecule. We propose a message passing scheme analogous to belief propagation, which uses the directional information by transforming messages based on the angle between them. Additionally, we use spherical Bessel functions and spherical harmonics to construct theoretically well-founded, orthogonal representations that achieve better performance than the currently prevalent Gaussian radial basis representations while using fewer than 1/4 of the parameters. We leverage these innovations to construct the directional message passing neural network (DimeNet). DimeNet outperforms previous GNNs on average by 76% on MD17 and by 31% on QM9. Our implementation is available online.

Submitted 5 April, 2022; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: Published as a conference paper at ICLR 2020. Author name changed from Johannes Klicpera to Johannes Gasteiger

arXiv:1912.05007 [pdf, other]

Oktoberfest Food Dataset

Authors: Alexander Ziller, Julius Hansjakob, Vitalii Rusinov, Daniel Zügner, Peter Vogel, Stephan Günnemann

Abstract: We release a realistic, diverse, and challenging dataset for object detection on images. The data was recorded at a beer tent in Germany and consists of 15 different categories of food and drink items. We created more than 2,500 object annotations by hand for 1,110 images captured by a video camera above the checkout. We further make available the remaining 600GB of (unlabeled) data containing days of footage. Additionally, we provide our trained models as a benchmark. Possible applications include automated checkout systems which could significantly speed up the process.

Submitted 22 November, 2019; originally announced December 2019.

Comments: Dataset publication of Oktoberfest Food Dataset. 4 pages, 6 figures

arXiv:1911.05503 [pdf, other]

Uncertainty on Asynchronous Time Event Prediction

Authors: Marin Biloš, Bertrand Charpentier, Stephan Günnemann

Abstract: Asynchronous event sequences are the basis of many applications throughout different industries. In this work, we tackle the task of predicting the next event (given a history), and how this prediction changes with the passage of time. Since at some time points (e.g. predictions far into the future) we might not be able to predict anything with confidence, capturing uncertainty in the predictions is crucial. We present two new architectures, WGP-LN and FD-Dir, modelling the evolution of the distribution on the probability simplex with time-dependent logistic normal and Dirichlet distributions. In both cases, the combination of RNNs with either Gaussian process or function decomposition allows to express rich temporal evolution of the distribution parameters, and naturally captures uncertainty. Experiments on class prediction, time prediction and anomaly detection demonstrate the high performances of our models on various datasets compared to other approaches.

Submitted 8 January, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

Comments: NeurIPS 2019 (Spotlight)

arXiv:1911.05485 [pdf, other]

Diffusion Improves Graph Learning

Authors: Johannes Gasteiger, Stefan Weißenberger, Stephan Günnemann

Abstract: Graph convolution is the core of most Graph Neural Networks (GNNs) and usually approximated by message passing between direct (one-hop) neighbors. In this work, we remove the restriction of using only the direct neighbors by introducing a powerful, yet spatially localized graph convolution: Graph diffusion convolution (GDC). GDC leverages generalized graph diffusion, examples of which are the heat kernel and personalized PageRank. It alleviates the problem of noisy and often arbitrarily defined edges in real graphs. We show that GDC is closely related to spectral-based models and thus combines the strengths of both spatial (message passing) and spectral methods. We demonstrate that replacing message passing with graph diffusion convolution consistently leads to significant performance improvements across a wide range of models on both supervised and unsupervised tasks and a variety of datasets. Furthermore, GDC is not limited to GNNs but can trivially be combined with any graph-based model or algorithm (e.g. spectral clustering) without requiring any changes to the latter or affecting its computational complexity. Our implementation is available online.

Submitted 5 April, 2022; v1 submitted 28 October, 2019; originally announced November 2019.

Comments: Published as a conference paper at NeurIPS 2019. Author name changed from Johannes Klicpera to Johannes Gasteiger

Journal ref: Thirty-third Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2019
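
Illustrative note: the abstract above names personalized PageRank as one example of generalized graph diffusion. The sketch below is a minimal, unofficial illustration of that idea and is not the authors' implementation. It sums a truncated series with coefficients alpha * (1 - alpha)^k over powers of a random-walk matrix, then keeps only the k largest entries per row so that the result stays localized. The function names, the column-stochastic normalization, the default alpha, the truncation length, and the top-k sparsification are all assumptions made for this example.

```python
def matmul(x, y):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transition_matrix(adj):
    """Column-stochastic random-walk matrix T = A D^{-1}.

    Assumes every node has at least one neighbor (no zero-degree columns).
    """
    n = len(adj)
    deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / deg[j] for j in range(n)] for i in range(n)]


def ppr_diffusion(adj, alpha=0.15, num_terms=50):
    """Truncated personalized-PageRank diffusion (hypothetical helper).

    Computes S = sum_{k=0}^{num_terms-1} alpha * (1 - alpha)^k * T^k.
    """
    n = len(adj)
    t = transition_matrix(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T^0
    s = [[0.0] * n for _ in range(n)]
    coeff = alpha
    for _ in range(num_terms):
        for i in range(n):
            for j in range(n):
                s[i][j] += coeff * power[i][j]
        power = matmul(t, power)   # advance to the next power of T
        coeff *= 1.0 - alpha       # next series coefficient
    return s


def keep_top_k(s, k):
    """Sparsify by keeping the k largest entries in each row (an assumed choice)."""
    out = []
    for row in s:
        keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k])
        out.append([row[j] if j in keep else 0.0 for j in range(len(row))])
    return out


# Usage on a 4-node path graph 0 - 1 - 2 - 3.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
diffused = ppr_diffusion(path)
sparse = keep_top_k(diffused, k=2)
```

The sparsified matrix could then stand in for the adjacency matrix in a downstream graph model; how it is normalized and used afterwards is left open here.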

arXiv:1910.14356 [pdf, other]

Certifiable Robustness to Graph Perturbations

Authors: Aleksandar Bojchevski, Stephan Günnemann

Abstract: Despite the exploding interest in graph neural networks there has been little effort to verify and improve their robustness. This is even more alarming given recent findings showing that they are extremely vulnerable to adversarial attacks on both the graph structure and the node attributes. We propose the first method for verifying certifiable (non-)robustness to graph perturbations for a general class of models that includes graph neural networks and label/feature propagation. By exploiting connections to PageRank and Markov decision processes our certificates can be efficiently (and under many threat models exactly) computed. Furthermore, we investigate robust training procedures that increase the number of certifiably robust nodes while maintaining or improving the clean predictive accuracy.

Submitted 19 December, 2019; v1 submitted 31 October, 2019; originally announced October 2019.

Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1910.13874 [pdf, other]

Group Centrality Maximization for Large-scale Graphs

Authors: Eugenio Angriman, Alexander van der Grinten, Aleksandar Bojchevski, Daniel Zügner, Stephan Günnemann, Henning Meyerhenke

Abstract: The study of vertex centrality measures is a key aspect of network analysis. Naturally, such centrality measures have been generalized to groups of vertices; for popular measures it was shown that the problem of finding the most central group is $\mathcal{NP}$-hard. As a result, approximation algorithms to maximize group centralities were introduced recently. Despite a nearly-linear running time, approximation algorithms for group betweenness and (to a lesser extent) group closeness are rather slow on large networks due to high constant overheads. That is why we introduce GED-Walk centrality, a new submodular group centrality measure inspired by Katz centrality. In contrast to closeness and betweenness, it considers walks of any length rather than shortest paths, with shorter walks having a higher contribution. We define algorithms that (i) efficiently approximate the GED-Walk score of a given group and (ii) efficiently approximate the (proved to be $\mathcal{NP}$-hard) problem of finding a group with highest GED-Walk score. Experiments on several real-world datasets show that scores obtained by GED-Walk improve performance on common graph mining tasks such as collective classification and graph-level classification. An evaluation of empirical running times demonstrates that maximizing GED-Walk (in approximation) is two orders of magnitude faster compared to group betweenness approximation and for group sizes $\leq 100$ one to two orders faster than group closeness approximation. For graphs with tens of millions of edges, approximate GED-Walk maximization typically needs less than one minute. Furthermore, our experiments suggest that the maximization algorithms scale linearly with the size of the input graph and the size of the group.

Submitted 30 October, 2019; originally announced October 2019.

arXiv:1909.12201 [pdf, other]

Overlapping Community Detection with Graph Neural Networks

Authors: Oleksandr Shchur, Stephan Günnemann

Abstract: Community detection is a fundamental problem in machine learning. While deep learning has shown great promise in many graph-related tasks, developing neural models for community detection has received surprisingly little attention. The few existing approaches focus on detecting disjoint communities, even though communities in real graphs are well known to be overlapping. We address this shortcoming and propose a graph neural network (GNN) based model for overlapping community detection. Despite its simplicity, our model outperforms the existing baselines by a large margin in the task of community recovery. We establish through an extensive experimental evaluation that the proposed model is effective, scalable and robust to hyperparameter settings. We also perform an ablation study that confirms that GNN is the key ingredient to the power of the proposed model.

Submitted 26 September, 2019; originally announced September 2019.

Comments: The First International Workshop on Deep Learning on Graphs (In Conjunction with the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining) https://dlg2019.bitbucket.io/

arXiv:1909.12127 [pdf, other]

Intensity-Free Learning of Temporal Point Processes

Authors: Oleksandr Shchur, Marin Biloš, Stephan Günnemann

Abstract: Temporal point processes are the dominant paradigm for modeling sequences of events happening at irregular intervals. The standard way of learning in such models is by estimating the conditional intensity function. However, parameterizing the intensity function usually incurs several trade-offs. We show how to overcome the limitations of intensity-based approaches by directly modeling the conditional distribution of inter-event times. We draw on the literature on normalizing flows to design models that are flexible and efficient. We additionally propose a simple mixture model that matches the flexibility of flow-based models, but also permits sampling and computing moments in closed form. The proposed models achieve state-of-the-art performance in standard prediction tasks and are suitable for novel applications, such as learning sequence embeddings and imputing missing data.

Submitted 23 January, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

Comments: International Conference on Learning Representations (ICLR) 2020

arXiv:1906.12269 [pdf, other]

doi 10.1145/3292500.3330905

Certifiable Robustness and Robust Training for Graph Convolutional Networks

Authors: Daniel Zügner, Stephan Günnemann

Abstract: Recent works show that Graph Neural Networks (GNNs) are highly non-robust with respect to adversarial attacks on both the graph structure and the node attributes, making their outcomes unreliable. We propose the first method for certifiable (non-)robustness of graph convolutional networks with respect to perturbations of the node attributes. We consider the case of binary node attributes (e.g. bag-of-words) and perturbations that are L_0-bounded. If a node has been certified with our method, it is guaranteed to be robust under any possible perturbation given the attack model. Likewise, we can certify non-robustness. Finally, we propose a robust semi-supervised training procedure that treats the labeled and unlabeled nodes jointly. As shown in our experimental evaluation, our method significantly improves the robustness of the GNN with only minimal effect on the predictive accuracy.

Submitted 28 June, 2019; originally announced June 2019.

Comments: Published as a Conference Paper at ACM SIGKDD 2019

arXiv:1905.05955 [pdf, other]

GhostLink: Latent Network Inference for Influence-aware Recommendation

Authors: Subhabrata Mukherjee, Stephan Guennemann

Abstract: Social influence plays a vital role in shaping a user's behavior in online communities dealing with items of fine taste like movies, food, and beer. For online recommendation, this implies that users' preferences and ratings are influenced due to other individuals. Given only time-stamped reviews of users, can we find out who-influences-whom, and characteristics of the underlying influence network? Can we use this network to improve recommendation? While prior works in social-aware recommendation have leveraged social interaction by considering the observed social network of users, many communities like Amazon, Beeradvocate, and Ratebeer do not have explicit user-user links. Therefore, we propose GhostLink, an unsupervised probabilistic graphical model, to automatically learn the latent influence network underlying a review community -- given only the temporal traces (timestamps) of users' posts and their content. Based on extensive experiments with four real-world datasets with 13 million reviews, we show that GhostLink improves item recommendation by around 23% over state-of-the-art methods that do not consider this influence. As additional use-cases, we show that GhostLink can be used to differentiate between users' latent preferences and influenced ones, as well as to detect influential users based on the learned influence graph.

Submitted 15 May, 2019; originally announced May 2019.

arXiv:1902.08412 [pdf, ps, other]

Adversarial Attacks on Graph Neural Networks via Meta Learning

Authors: Daniel Zügner, Stephan Günnemann

Abstract: Deep learning models for graphs have advanced the state of the art on many tasks. Despite their recent success, little is known about their robustness. We investigate training time attacks on graph neural networks for node classification that perturb the discrete graph structure. Our core principle is to use meta-gradients to solve the bilevel problem underlying training-time attacks, essentially treating the graph as a hyperparameter to optimize. Our experiments show that small graph perturbations consistently lead to a strong decrease in performance for graph convolutional networks, and even transfer to unsupervised embeddings. Remarkably, the perturbations created by our algorithm can misguide the graph neural networks such that they perform worse than a simple baseline that ignores all relational information. Our attacks do not assume any knowledge about or access to the target classifiers.

Submitted 28 January, 2024; v1 submitted 22 February, 2019; originally announced February 2019.

Comments: ICLR submission

Journal ref: International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 2019

arXiv:1811.05868 [pdf, other]

Pitfalls of Graph Neural Network Evaluation

Authors: Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, Stephan Günnemann

Abstract: Semi-supervised node classification in graphs is a fundamental problem in graph mining, and the recently proposed graph neural networks (GNNs) have achieved unparalleled results on this task. Due to their massive success, GNNs have attracted a lot of attention, and many novel architectures have been put forward. In this paper we show that existing evaluation strategies for GNN models have serious shortcomings. We show that using the same train/validation/test splits of the same datasets, as well as making significant changes to the training procedure (e.g. early stopping criteria) precludes a fair comparison of different architectures. We perform a thorough empirical evaluation of four prominent GNN models and show that considering different splits of the data leads to dramatically different rankings of models. Even more importantly, our findings suggest that simpler GNN architectures are able to outperform the more sophisticated ones if the hyperparameters and the training procedure are tuned fairly for all models.

Submitted 18 June, 2019; v1 submitted 14 November, 2018; originally announced November 2018.

arXiv:1811.04451 [pdf, other]

Multi-Source Neural Variational Inference

Authors: Richard Kurle, Stephan Günnemann, Patrick van der Smagt

Abstract: Learning from multiple sources of information is an important problem in machine-learning research. The key challenges are learning representations and formulating inference methods that take into account the complementarity and redundancy of various information sources. In this paper we formulate a variational autoencoder based multi-source learning framework in which each encoder is conditioned on a different information source. This allows us to relate the sources via the shared latent variables by computing divergence measures between individual source's posterior approximations. We explore a variety of options to learn these encoders and to integrate the beliefs they compute into a consistent posterior approximation. We visualise learned beliefs on a toy dataset and evaluate our methods for learning shared representations and structured output prediction, showing trade-offs of learning separate encoders for each information source. Furthermore, we demonstrate how conflict detection and redundancy can increase robustness of inference in a multi-source setting.

Submitted 17 November, 2018; v1 submitted 11 November, 2018; originally announced November 2018.

Comments: AAAI 2019, Association for the Advancement of Artificial Intelligence (AAAI) 2019

arXiv:1810.11953 [pdf, other]

Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift

Authors: Stephan Rabanser, Stephan Günnemann, Zachary C. Lipton

Abstract: We might hope that when faced with unexpected inputs, well-designed software systems would fire off warnings. Machine learning (ML) systems, however, which depend strongly on properties of their inputs (e.g. the i.i.d. assumption), tend to fail silently. This paper explores the problem of building ML systems that fail loudly, investigating methods for detecting dataset shift, identifying exemplars that most typify the shift, and quantifying shift malignancy. We focus on several datasets and various perturbations to both covariates and label distributions with varying magnitudes and fractions of data affected. Interestingly, we show that across the dataset shifts that we explore, a two-sample-testing-based approach, using pre-trained classifiers for dimensionality reduction, performs best. Moreover, we demonstrate that domain-discriminating approaches tend to be helpful for characterizing shifts qualitatively and determining if they are harmful.

Submitted 28 October, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

Comments: Advances in Neural Information Processing Systems (NeurIPS) 2019

arXiv:1810.05997 [pdf, other]

Predict then Propagate: Graph Neural Networks meet Personalized PageRank

Authors: Johannes Gasteiger, Aleksandar Bojchevski, Stephan Günnemann

Abstract: Neural message passing algorithms for semi-supervised classification on graphs have recently achieved great success. However, for classifying a node these methods only consider nodes that are a few propagation steps away and the size of this utilized neighborhood is hard to extend. In this paper, we use the relationship between graph convolutional networks (GCN) and PageRank to derive an improved propagation scheme based on personalized PageRank. We utilize this propagation procedure to construct a simple model, personalized propagation of neural predictions (PPNP), and its fast approximation, APPNP. Our model's training time is on par or faster and its number of parameters on par or lower than previous models. It leverages a large, adjustable neighborhood for classification and can be easily combined with any neural network. We show that this model outperforms several recently proposed methods for semi-supervised classification in the most thorough study done so far for GCN-like models. Our implementation is available online.

Submitted 5 April, 2022; v1 submitted 14 October, 2018; originally announced October 2018.

Comments: Published as a conference paper at ICLR 2019. Author name changed from Johannes Klicpera to Johannes Gasteiger

Journal ref: International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 2019
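
Illustrative note: the abstract above describes propagating neural predictions with a scheme based on personalized PageRank, together with a fast approximation. The sketch below is a minimal, unofficial illustration of one common way to approximate such propagation by repeated iteration, and is not the authors' code. Each step mixes neighbor-averaged values with the original predictions, weighted by a teleport probability alpha. The function names, the symmetric normalization with self-loops, and the default values of alpha and num_steps are assumptions made for this example. Any final softmax over classes is omitted.

```python
import math


def matmul(x, y):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def normalized_adjacency(adj):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(adj, h, alpha=0.1, num_steps=10):
    """Iterate Z <- (1 - alpha) * A_hat @ Z + alpha * H, starting from Z = H.

    `h` holds per-node predictions (one row per node) that would come from
    any neural network; here it is simply passed in as a list of rows.
    """
    a_hat = normalized_adjacency(adj)
    z = [row[:] for row in h]
    for _ in range(num_steps):
        az = matmul(a_hat, z)
        z = [[(1.0 - alpha) * az[i][j] + alpha * h[i][j]
              for j in range(len(h[0]))] for i in range(len(h))]
    return z


# Usage on a triangle graph with two-dimensional predictions per node.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
predictions = [[1.0, 0.0],
               [0.0, 1.0],
               [0.5, 0.5]]
smoothed = propagate(triangle, predictions, alpha=0.1, num_steps=10)
```

Because the propagation step has no trainable parameters, increasing num_steps enlarges the neighborhood that influences each node without adding parameters, which is consistent with the adjustable neighborhood mentioned in the abstract.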

arXiv:1810.01836 [pdf, other]

Mining Contrasting Quasi-Clique Patterns

Authors: Roberto Alonso, Stephan Günnemann

Abstract: Mining dense quasi-cliques is a well-known clustering task with applications ranging from social networks over collaboration graphs to document analysis. Recent work has extended this task to multiple graphs; i.e. the goal is to find groups of vertices highly dense among multiple graphs. In this paper, we argue that in a multi-graph scenario the sparsity is valuable for knowledge extraction as well. We introduce the concept of contrasting quasi-clique patterns: a collection of vertices highly dense in one graph but highly sparse (i.e. less connected) in a second graph. Thus, these patterns specifically highlight the difference/contrast between the considered graphs. Based on our novel model, we propose an algorithm that enables fast computation of contrasting patterns by exploiting intelligent traversal and pruning techniques. We showcase the potential of contrasting patterns on a variety of synthetic and real-world datasets.

Submitted 3 October, 2018; originally announced October 2018.

Comments: 10 pages

arXiv:1809.01093 [pdf, other]

Adversarial Attacks on Node Embeddings via Graph Poisoning

Authors: Aleksandar Bojchevski, Stephan Günnemann

Abstract: The goal of network representation learning is to learn low-dimensional node embeddings that capture the graph structure and are useful for solving downstream tasks. However, despite the proliferation of such methods, there is currently no study of their robustness to adversarial attacks. We provide the first adversarial vulnerability analysis on the widely used family of methods based on random walks. We derive efficient adversarial perturbations that poison the network structure and have a negative effect on both the quality of the embeddings and the downstream tasks. We further show that our attacks are transferable since they generalize to many models and are successful even when the attacker is restricted.

Submitted 27 May, 2019; v1 submitted 4 September, 2018; originally announced September 2018.

Comments: ICML 2019, PMLR 97:695-704

arXiv:1806.00770 [pdf, other]

Dual-Primal Graph Convolutional Networks

Authors: Federico Monti, Oleksandr Shchur, Aleksandar Bojchevski, Or Litany, Stephan Günnemann, Michael M. Bronstein

Abstract: In recent years, there has been a surge of interest in developing deep learning methods for non-Euclidean structured data such as graphs. In this paper, we propose Dual-Primal Graph CNN, a graph convolutional architecture that alternates convolution-like operations on the graph and its dual. Our approach allows to learn both vertex- and edge features and generalizes the previous graph attention (GAT) model. We provide extensive experimental validation showing state-of-the-art results on a variety of tasks tested on established graph benchmarks, including CORA and Citeseer citation networks as well as MovieLens, Flixter, Douban and Yahoo Music graph-guided recommender systems.

Submitted 3 June, 2018; originally announced June 2018.

arXiv:1805.07984 [pdf, other]

doi 10.1145/3219819.3220078

Adversarial Attacks on Neural Networks for Graph Data

Authors: Daniel Zügner, Amir Akbarnejad, Stephan Günnemann

Abstract: Deep learning models for graphs have achieved strong performance for the task of node classification. Despite their proliferation, currently there is no study of their robustness to adversarial attacks. Yet, in domains where they are likely to be used, e.g. the web, adversaries are common. Can deep learning models for graphs be easily fooled? In this work, we introduce the first study of adversarial attacks on attributed graphs, specifically focusing on models exploiting ideas of graph convolutions. In addition to attacks at test time, we tackle the more challenging class of poisoning/causative attacks, which focus on the training phase of a machine learning model. We generate adversarial perturbations targeting the node's features and the graph structure, thus, taking the dependencies between instances in account. Moreover, we ensure that the perturbations remain unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain we propose an efficient algorithm Nettack exploiting incremental computations. Our experimental study shows that accuracy of node classification significantly drops even when performing only few perturbations. Even more, our attacks are transferable: the learned attacks generalize to other state-of-the-art node classification models and unsupervised approaches, and likewise are successful even when only limited knowledge about the graph is given.

Submitted 9 December, 2021; v1 submitted 21 May, 2018; originally announced May 2018.

Comments: Accepted as a full paper at KDD 2018 on May 6, 2018

Journal ref: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, pp. 2847-2856

arXiv:1803.00816 [pdf, other]

NetGAN: Generating Graphs via Random Walks

Authors: Aleksandar Bojchevski, Oleksandr Shchur, Daniel Zügner, Stephan Günnemann

Abstract: We propose NetGAN - the first implicit generative model for graphs able to mimic real-world networks. We pose the problem of graph generation as learning the distribution of biased random walks over the input graph. The proposed model is based on a stochastic neural network that generates discrete output samples and is trained using the Wasserstein GAN objective. NetGAN is able to produce graphs that exhibit well-known network patterns without explicitly specifying them in the model definition. At the same time, our model exhibits strong generalization properties, as highlighted by its competitive link prediction performance, despite not being trained specifically for this task. Being the first approach to combine both of these desirable properties, NetGAN opens exciting avenues for further research.

Submitted 1 June, 2018; v1 submitted 2 March, 2018; originally announced March 2018.

Comments: ICML 2018

Journal ref: Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, pp. 609-618

arXiv:1711.10781 [pdf, other]

Introduction to Tensor Decompositions and their Applications in Machine Learning

Authors: Stephan Rabanser, Oleksandr Shchur, Stephan Günnemann

Abstract: Tensors are multidimensional arrays of numerical values and therefore generalize matrices to multiple dimensions. While tensors first emerged in the psychometrics community in the $20^{\text{th}}$ century, they have since then spread to numerous other disciplines, including machine learning. Tensors and their decompositions are especially beneficial in unsupervised learning settings, but are gaining popularity in other sub-disciplines like temporal and multi-relational data analysis, too. The scope of this paper is to give a broad overview of tensors, their decompositions, and how they are used in machine learning. As part of this, we are going to introduce basic tensor concepts, discuss why tensors can be considered more rigid than matrices with respect to the uniqueness of their decomposition, explain the most important factorization algorithms and their properties, provide concrete examples of tensor decomposition applications in machine learning, conduct a case study on tensor-based estimation of mixture models, talk about the current state of research, and provide references to available software libraries.

Submitted 29 November, 2017; originally announced November 2017.

Comments: 13 pages, 12 figures

arXiv:1707.03815 [pdf, other]

Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking

Authors: Aleksandar Bojchevski, Stephan Günnemann

Abstract: Methods that learn representations of nodes in a graph play a critical role in network analysis since they enable many downstream learning tasks. We propose Graph2Gauss - an approach that can efficiently learn versatile node embeddings on large scale (attributed) graphs that show strong performance on tasks such as link prediction and node classification. Unlike most approaches that represent nodes as point vectors in a low-dimensional continuous space, we embed each node as a Gaussian distribution, allowing us to capture uncertainty about the representation. Furthermore, we propose an unsupervised method that handles inductive learning scenarios and is applicable to different types of graphs: plain/attributed, directed/undirected. By leveraging both the network structure and the associated node attributes, we are able to generalize to unseen nodes without additional training. To learn the embeddings we adopt a personalized ranking formulation w.r.t. the node distances that exploits the natural ordering of the nodes imposed by the network structure. Experiments on real world networks demonstrate the high performance of our approach, outperforming state-of-the-art network embedding methods on several different tasks. Additionally, we demonstrate the benefits of modeling uncertainty - by analyzing it we can estimate neighborhood diversity and detect the intrinsic latent dimensionality of a graph.

Submitted 27 February, 2018; v1 submitted 12 July, 2017; originally announced July 2017.

Comments: Updated: ICLR 2018 camera-ready version

Journal ref: International Conference on Learning Representations, ICLR 2018

arXiv:1705.02669 [pdf, other]

doi 10.1145/2939672.2939780

Item Recommendation with Continuous Experience Evolution of Users using Brownian Motion

Authors: Subhabrata Mukherjee, Stephan Guennemann, Gerhard Weikum

Abstract: Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature continuously over time. This paper presents a new model that captures the continuous evolution of user experience, and the resulting language model in reviews and other posts. Our model is unsupervised and combines principles of Geometric Brownian Motion, Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal progression of user experience and language model respectively. We develop practical algorithms for estimating the model parameters from data and for inference with our model (e.g., to recommend items). Extensive experiments with five real-world datasets show that our model not only fits data better than discrete-model baselines, but also outperforms state-of-the-art methods for predicting item ratings.

Submitted 9 August, 2017; v1 submitted 7 May, 2017; originally announced May 2017.

arXiv:1602.04650 [pdf, other]

Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques

Authors: Saskia Metzler, Stephan Günnemann, Pauli Miettinen

Abstract: Cliques are frequently used to model communities: a community is a set of nodes where each pair is equally likely to be connected. But studying real-world communities reveals that they have more structure than that. In particular, the nodes can be ordered in such a way that (almost) all edges in the community lie below a hyperbola. In this paper we present three new models for communities that capture this phenomenon. Our models explain the structure of the communities differently, but we also prove that they are identical in their expressive power. Our models fit to real-world data much better than traditional block models or previously-proposed hyperbolic models, both of which are a special case of our model. Our models also allow for intuitive interpretation of the parameters, enabling us to summarize the shapes of the communities in graphs effectively.

Submitted 4 October, 2016; v1 submitted 15 February, 2016; originally announced February 2016.

Comments: 31 pages, 18 figures. This is an extended version of a paper of the same title accepted for publication in the proceedings of the 2016 IEEE International Conference on Data Mining (ICDM). For source code, see http://people.mpi-inf.mpg.de/~pmiettin/hybobo/

arXiv:1511.06030 [pdf, other]

BIRDNEST: Bayesian Inference for Ratings-Fraud Detection

Authors: Bryan Hooi, Neil Shah, Alex Beutel, Stephan Gunnemann, Leman Akoglu, Mohit Kumar, Disha Makhija, Christos Faloutsos

Abstract: Review fraud is a pervasive problem in online commerce, in which fraudulent sellers write or purchase fake reviews to manipulate perception of their products and services. Fake reviews are often detected based on several signs, including 1) they occur in short bursts of time; 2) fraudulent user accounts have skewed rating distributions. However, these may both be true in any given dataset. Hence, in this paper, we propose an approach for detecting fraudulent reviews which combines these 2 approaches in a principled manner, allowing successful detection even when one of these signs is not present. To combine these 2 approaches, we formulate our Bayesian Inference for Rating Data (BIRD) model, a flexible Bayesian model of user rating behavior. Based on our model we formulate a likelihood-based suspiciousness metric, Normalized Expected Surprise Total (NEST). We propose a linear-time algorithm for performing Bayesian inference using our model and computing the metric. Experiments on real data show that BIRDNEST successfully spots review fraud in large, real-world graphs: the 50 most suspicious users of the Flipkart platform flagged by our algorithm were investigated and all identified as fraudulent by domain experts at Flipkart.

Submitted 7 March, 2016; v1 submitted 18 November, 2015; originally announced November 2015.

Comments: 9 pages; v2: minor typos corrected

arXiv:1510.05544 [pdf, other]

EdgeCentric: Anomaly Detection in Edge-Attributed Networks

Authors: Neil Shah, Alex Beutel, Bryan Hooi, Leman Akoglu, Stephan Gunnemann, Disha Makhija, Mohit Kumar, Christos Faloutsos

Abstract: Given a network with attributed edges, how can we identify anomalous behavior? Networks with edge attributes are commonplace in the real world. For example, edges in e-commerce networks often indicate how users rated products and services in terms of number of stars, and edges in online social and phonecall networks contain temporal information about when friendships were formed and when users communicated with each other -- in such cases, edge attributes capture information about how the adjacent nodes interact with other entities in the network. In this paper, we aim to utilize exactly this information to discern suspicious from typical node behavior. Our work has a number of notable contributions, including (a) formulation: while most other graph-based anomaly detection works use structural graph connectivity or node information, we focus on the new problem of leveraging edge information, (b) methodology: we introduce EdgeCentric, an intuitive and scalable compression-based approach for detecting edge-attributed graph anomalies, and (c) practicality: we show that EdgeCentric successfully spots numerous such anomalies in several large, edge-attributed real-world graphs, including the Flipkart e-commerce graph with over 3 million product reviews between 1.1 million users and 545 thousand products, where it achieved 0.87 precision over the top 100 results.

Submitted 18 November, 2015; v1 submitted 19 October, 2015; originally announced October 2015.

arXiv:1407.3850 [pdf, other]

KDD-SC: Subspace Clustering Extensions for Knowledge Discovery Frameworks

Authors: Stephan Günnemann, Hardy Kremer, Matthias Hannen, Thomas Seidl

Abstract: Analyzing high dimensional data is a challenging task. For these data it is known that traditional clustering algorithms fail to detect meaningful patterns. As a solution, subspace clustering techniques have been introduced. They analyze arbitrary subspace projections of the data to detect clustering structures. In this paper, we present our subspace clustering extension for KDD frameworks, termed KDD-SC. In contrast to existing subspace clustering toolkits, our solution neither is a standalone product nor is it tightly coupled to a specific KDD framework. Our extension is realized by a common codebase and easy-to-use plugins for three of the most popular KDD frameworks, namely KNIME, RapidMiner, and WEKA. KDD-SC extends these frameworks such that they offer a wide range of different subspace clustering functionalities. It provides a multitude of algorithms, data generators, evaluation measures, and visualization techniques specifically designed for subspace clustering. These functionalities integrate seamlessly with the frameworks' existing features such that they can be flexibly combined. KDD-SC is publicly available on our website.

Submitted 14 July, 2014; originally announced July 2014.

Comments: 8 pages, 8 figures

arXiv:1406.7288 [pdf, other]

Linearized and Single-Pass Belief Propagation

Authors: Wolfgang Gatterbauer, Stephan Günnemann, Danai Koutra, Christos Faloutsos

Abstract: How can we tell when accounts are fake or real in a social network? And how can we tell which accounts belong to liberal, conservative or centrist users? Often, we can answer such questions and label nodes in a network based on the labels of their neighbors and appropriate assumptions of homophily ("birds of a feather flock together") or heterophily ("opposites attract"). One of the most widely used methods for this kind of inference is Belief Propagation (BP) which iteratively propagates the information from a few nodes with explicit labels throughout a network until convergence. One main problem with BP, however, is that there are no known exact guarantees of convergence in graphs with loops. This paper introduces Linearized Belief Propagation (LinBP), a linearization of BP that allows a closed-form solution via intuitive matrix equations and, thus, comes with convergence guarantees. It handles homophily, heterophily, and more general cases that arise in multi-class settings. Plus, it allows a compact implementation in SQL. The paper also introduces Single-pass Belief Propagation (SBP), a "localized" version of LinBP that propagates information across every edge at most once and for which the final class assignments depend only on the nearest labeled neighbors. In addition, SBP allows fast incremental updates in dynamic networks. Our runtime experiments show that LinBP and SBP are orders of magnitude faster than standard

Submitted 16 October, 2014; v1 submitted 27 June, 2014; originally announced June 2014.

Comments: 17 pages, 11 figures, 4 algorithms. Includes following major changes since v1: renaming of "turbo BP" to "single-pass BP", convergence criteria now give sufficient *and* necessary conditions, more detailed experiments, more detailed comparison with prior BP convergence results, overall improved exposition

Showing 101–143 of 143 results for author: Günnemann, S