-
Penalty shootouts are tough, but the alternating order is fair
Authors:
Silvan Vollmer,
David Schoch,
Ulrik Brandes
Abstract:
We compare conversion rates of association football (soccer) penalties during regulation or extra time with those during shootouts. Our data consists of roughly 50,000 penalties from the eleven most recent seasons in European men's football competitions. About one third of the penalties are from more than 1,500 penalty shootouts. We find that shootout conversion rates are significantly lower, and attribute this to worse performance of shooters rather than better performance of goalkeepers. We also find that, statistically, there is no advantage for either team in the usual alternating shooting order. These main findings are complemented by a number of more detailed analyses.
Submitted 7 October, 2023;
originally announced October 2023.
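Comparing two conversion rates, as described above, can be illustrated with a standard one-sided two-proportion z-test. This is a minimal Python sketch that assumes independent penalties. The abstract does not say which test the authors used, and the function name is hypothetical.

```python
import math
from typing import Tuple


def one_sided_two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Test H0: p1 == p2 against H1: p1 > p2 with the pooled normal approximation.

    x1/n1: conversions/attempts in group 1 (e.g. regulation and extra-time penalties),
    x2/n2: conversions/attempts in group 2 (e.g. shootout penalties).
    Returns the z statistic and the one-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        raise ValueError("test undefined when all or no attempts are converted")
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value
```

A small p-value indicates that the first group converts at a higher rate than the second. With roughly 50,000 penalties, even modest differences can reach significance, so effect sizes deserve as much attention as p-values.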
-
Tight Sampling in Unbounded Networks
Authors:
Kshitijaa Jaglan,
Meher Chaitanya,
Triansh Sharma,
Abhijeeth Singam,
Nidhi Goyal,
Ponnurangam Kumaraguru,
Ulrik Brandes
Abstract:
The default approach to deal with the enormous size and limited accessibility of many Web and social media networks is to sample one or more subnetworks from a conceptually unbounded unknown network. Clearly, the extracted subnetworks will crucially depend on the sampling scheme. Motivated by studies of homophily and opinion formation, we propose a variant of snowball sampling designed to prioritize inclusion of entire cohesive communities rather than any kind of representativeness, breadth, or depth of coverage. The method is illustrated on a concrete example, and experiments on synthetic networks suggest that it behaves as desired.
Submitted 5 October, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
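The abstract does not spell out the sampling rule, so the following is only a hypothetical illustration of prioritizing cohesion over breadth or depth. It is a snowball variant that always expands the frontier node with the most ties into the current sample. The network is queried only through a `neighbors` callback, as one would with an unbounded network. The function name and the tie-count priority are assumptions, not the authors' method.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def cohesive_snowball(seed: Hashable,
                      neighbors: Callable[[Hashable], Iterable[Hashable]],
                      budget: int) -> List[Hashable]:
    """Grow a sample from `seed`, always adding the frontier node with the most
    neighbours already in the sample (ties broken by discovery order)."""
    sample = {seed}
    order = [seed]
    ties: Dict[Hashable, int] = {}  # frontier node -> number of sampled neighbours

    def absorb(node: Hashable) -> None:
        # One query to the (conceptually unbounded) network per sampled node.
        for w in neighbors(node):
            if w not in sample:
                ties[w] = ties.get(w, 0) + 1

    absorb(seed)
    while ties and len(order) < budget:
        nxt = max(ties, key=ties.get)  # most embedded frontier node first
        del ties[nxt]
        sample.add(nxt)
        order.append(nxt)
        absorb(nxt)
    return order
```

Unlike breadth-first snowball sampling, this rule tends to finish a densely knit group before crossing a weak bridge into the next one.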
-
Stop Simulating! Efficient Computation of Tournament Winning Probabilities
Authors:
Ulrik Brandes,
Gordana Marmulla,
Ivana Smokovic
Abstract:
In the run-up to any major sports tournament, winning probabilities of participants are publicized for engagement and betting purposes. These are generally based on simulating the tournament tens of thousands of times by sampling from single-match outcome models. We show that, by virtue of the tournament schedule, exact computation of winning probabilities can be substantially faster than their approximation through simulation. This notably applies to the 2022 and 2023 FIFA World Cup Finals, and is independent of the model used for individual match outcomes.
Submitted 19 July, 2023;
originally announced July 2023.
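For a single-elimination bracket, exact computation reduces to a simple dynamic program over rounds. The sketch assumes a matrix `p` where `p[i][j]` is the probability that team i beats team j, with `p[i][j] + p[j][i] == 1`. This hypothetical Python sketch covers only a knockout stage with 2^k teams in bracket order. Group stages, such as those of the World Cup, need additional bookkeeping that is not shown. Round r costs O(n · 2^r) operations, so the whole bracket costs O(n²) instead of tens of thousands of sampled tournaments.

```python
from typing import List, Sequence


def knockout_win_probabilities(p: Sequence[Sequence[float]]) -> List[float]:
    """Exact probability that each team wins a single-elimination bracket.

    Teams 0..n-1 are listed in bracket order (n must be a power of two),
    and p[i][j] is the probability that team i beats team j.
    """
    n = len(p)
    if n == 0 or n & (n - 1):
        raise ValueError("number of teams must be a power of two")
    reach = [1.0] * n  # probability of having survived all rounds so far
    half = 1           # size of each half of the sub-brackets decided this round
    while half < n:
        nxt = [0.0] * n
        for i in range(n):
            start = (i // (2 * half)) * (2 * half)
            # Possible opponents: every team in the other half of i's sub-bracket.
            if i - start < half:
                opponents = range(start + half, start + 2 * half)
            else:
                opponents = range(start, start + half)
            # Disjoint sub-brackets are independent, so probabilities multiply.
            nxt[i] = reach[i] * sum(reach[j] * p[i][j] for j in opponents)
        reach = nxt
        half *= 2
    return reach
```

The single-match model enters only through `p`, which mirrors the claim that the approach is independent of how individual match outcomes are modelled.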
-
Motif-Based Visual Analysis of Dynamic Networks
Authors:
Eren Cakmak,
Johannes Fuchs,
Dominik Jäckle,
Tobias Schreck,
Ulrik Brandes,
Daniel Keim
Abstract:
Many data analysis problems rely on dynamic networks, such as those arising in social or communication network analyses. Providing a scalable overview of long sequences of such dynamic networks remains challenging because the underlying large-scale data contain elusive topological changes. We propose two complementary pixel-based visualizations, which reflect occurrences of selected sub-networks (motifs) and provide a time-scalable overview of dynamic networks: a network-level census (motif significance profiles) linked with a node-level sub-network metric (graphlet degree vectors), which together reveal structural changes, trends, states, and outliers. The network census captures motifs whose occurrence differs significantly from their expected occurrence in random networks and exposes structural changes in a dynamic network. The sub-network metrics display the local topological neighborhood of a node in one of the networks that make up the dynamic network. The linked pixel-based visualizations allow users to explore motifs in networks of different sizes to analyze the changing structures within and across dynamic networks, for instance, to visually analyze the shape and rate of changes in the network topology. We describe the identification of visual patterns, also considering different reordering strategies to emphasize them. We demonstrate the usefulness of the approach through a use case analysis based on real-world large-scale dynamic networks, such as the evolving social networks of Reddit or Facebook.
Submitted 25 August, 2022;
originally announced August 2022.
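Graphlet degree vectors count, for each node, how often it touches each orbit of small graphlets. This minimal Python sketch is restricted to graphlets with at most three nodes (orbits 0 to 3 in Pržulj's numbering). For these, the counts follow from degrees and triangles alone. The function name is hypothetical, and full vectors over larger graphlets need more elaborate counting.

```python
from itertools import combinations
from typing import Dict, List, Set


def small_graphlet_degrees(adj: Dict[int, Set[int]]) -> Dict[int, List[int]]:
    """Orbit counts 0-3 for every node of an undirected simple graph:
    0 = edges (degree), 1 = end of an induced 3-node path,
    2 = middle of an induced 3-node path, 3 = corner of a triangle."""
    gdv = {}
    for v, nbrs in adj.items():
        deg = len(nbrs)
        # Triangles through v: pairs of neighbours that are adjacent themselves.
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Paths v-u-w with w not adjacent to v; each triangle is counted twice above.
        end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * tri
        # Pairs of neighbours that are not adjacent to each other.
        mid = deg * (deg - 1) // 2 - tri
        gdv[v] = [deg, end, mid, tri]
    return gdv
```

Computing these vectors once per snapshot yields one row of node-level counts per time step, which a pixel-based view could then colour-code.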
-
GraphFramEx: Towards Systematic Evaluation of Explainability Methods for Graph Neural Networks
Authors:
Kenza Amara,
Rex Ying,
Zitao Zhang,
Zhihao Han,
Yinan Shan,
Ulrik Brandes,
Sebastian Schemm,
Ce Zhang
Abstract:
As one of the most popular machine learning models today, graph neural networks (GNNs) have attracted intense interest recently, and so has their explainability. Users are increasingly interested in a better understanding of GNN models and their outcomes. Unfortunately, today's evaluation frameworks for GNN explainability often rely on a few inadequate synthetic datasets, leading to conclusions of limited scope due to a lack of complexity in the problem instances. As GNN models are deployed to more mission-critical applications, we are in dire need of a common evaluation protocol for explainability methods of GNNs. In this paper, we propose, to the best of our knowledge, the first systematic evaluation framework for GNN explainability, considering explainability with respect to three different "user needs". We propose a unique metric that combines the fidelity measures and classifies explanations based on their quality of being sufficient or necessary. We restrict ourselves to node classification tasks and compare the most representative techniques in the field of input-level explainability for GNNs. For the inadequate but widely used synthetic benchmarks, surprisingly, shallow techniques such as personalized PageRank perform best with minimal computation time. But when the graph structure is more complex and nodes have meaningful features, gradient-based methods are the best according to our evaluation criteria. However, none dominates the others on all evaluation dimensions, and there is always a trade-off. We further apply our evaluation protocol in a case study on fraud explanation in eBay transaction graphs to reflect the production environment.
Submitted 22 May, 2024; v1 submitted 20 June, 2022;
originally announced June 2022.
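The fidelity measures mentioned above compare a model's output with and without the explanation. Below is a hypothetical Python sketch of the two common variants for a single node prediction. It assumes a callable `predict` that returns the probability of the originally predicted class for a given set of edges. fid+ is large when the explanation is necessary, and fid- is small when the explanation is sufficient. How the paper combines them into a single metric is not shown.

```python
from typing import Callable, FrozenSet, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def fidelities(predict: Callable[[FrozenSet[Edge]], float],
               edges: FrozenSet[Edge],
               explanation: FrozenSet[Edge]) -> Tuple[float, float]:
    """Return (fid_plus, fid_minus) for one prediction.

    predict(E) is assumed to give the probability of the originally
    predicted class when the model only sees the edge set E.
    """
    full = predict(edges)
    # Necessity: how much the prediction drops when the explanation is removed.
    fid_plus = full - predict(edges - explanation)
    # Sufficiency: how much it drops when only the explanation is kept.
    fid_minus = full - predict(explanation)
    return fid_plus, fid_minus
```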
-
On Dasgupta's hierarchical clustering objective and its relation to other graph parameters
Authors:
Svein Høgemo,
Benjamin Bergougnoux,
Ulrik Brandes,
Christophe Paul,
Jan Arne Telle
Abstract:
The minimum heights of vertex and edge partition trees are well-studied graph parameters known as, for instance, vertex and edge ranking numbers. While they are NP-hard to determine in general, linear-time algorithms exist for trees. Motivated by a correspondence with Dasgupta's objective for hierarchical clustering, we consider the total rather than the maximum depth of vertices as an alternative objective for minimization. For vertex partition trees, this leads to a new parameter with a natural interpretation as a measure of robustness against vertex removal.
As tools for the study of this family of parameters, we show that they have similar recursive expressions and prove a binary tree rotation lemma. The new parameter is related to trivially perfect graph completion and is therefore intractable, as the other three are known to be. We give polynomial-time algorithms for both total-depth variants on caterpillars and on trees with a bounded number of leaf neighbors. For general trees, we obtain a 2-approximation algorithm.
Submitted 25 May, 2021;
originally announced May 2021.
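On tiny graphs, the difference between minimizing the height and the total depth of a vertex partition tree can be checked by brute force. Choose a root, remove it, and recurse on the remaining connected components. For a connected vertex set S with root v, the total depth is |S| plus the total depth of S minus v, because every vertex moves one level below the root. This hypothetical Python sketch is exponential and meant only for illustration. It is not one of the polynomial-time algorithms described above.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Set, Tuple


def components(vertices: FrozenSet[int], adj: Dict[int, Set[int]]) -> List[FrozenSet[int]]:
    """Connected components of the subgraph induced by `vertices`."""
    seen: Set[int] = set()
    comps = []
    for s in vertices:
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], {s}
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in vertices and w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(frozenset(comp))
    return comps


def min_height_and_total_depth(adj: Dict[int, Set[int]]) -> Tuple[int, int]:
    """Exhaustive minimum height and minimum total depth (root depth 1)."""

    @lru_cache(maxsize=None)
    def height(s: FrozenSet[int]) -> int:
        if not s:
            return 0
        comps = components(s, adj)
        if len(comps) > 1:
            return max(height(c) for c in comps)
        return 1 + min(height(s - {v}) for v in s)

    @lru_cache(maxsize=None)
    def total(s: FrozenSet[int]) -> int:
        if not s:
            return 0
        comps = components(s, adj)
        if len(comps) > 1:
            return sum(total(c) for c in comps)
        # The root gets depth 1 and pushes every other vertex one level down.
        return len(s) + min(total(s - {v}) for v in s)

    everything = frozenset(adj)
    return height(everything), total(everything)
```

On a path with seven vertices, both optima are attained by the balanced tree rooted at the middle vertex, with height 3 and total depth 17.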
-
Stochastic Gradient Descent Works Really Well for Stress Minimization
Authors:
Katharina Börsig,
Ulrik Brandes,
Barna Pasztor
Abstract:
Stress minimization is among the best studied force-directed graph layout methods because it reliably yields high-quality layouts. It thus comes as a surprise that a novel approach based on stochastic gradient descent (Zheng, Pawar and Goodman, TVCG 2019) is claimed to improve on state-of-the-art approaches based on majorization. We present experimental evidence that the new approach does not actually yield better layouts, but that it is still to be preferred because it is simpler and robust against poor initialization.
Submitted 24 August, 2020;
originally announced August 2020.
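The stochastic gradient descent approach of Zheng, Pawar and Goodman moves one pair of nodes at a time towards its target distance, using a step size that decays exponentially. The following is a minimal Python sketch for connected, unweighted graphs. It assumes weights w_ij = d_ij^-2 and a capped step size mu = min(w_ij * eta, 1). The annealing defaults (30 iterations, eps = 0.1) and all function names are illustrative choices, not the published settings.

```python
import math
import random
from collections import deque
from typing import List


def bfs_distances(adj: List[List[int]]) -> List[List[float]]:
    """All-pairs shortest-path lengths of an unweighted graph."""
    n = len(adj)
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[s][w] == math.inf:
                    dist[s][w] = dist[s][u] + 1
                    queue.append(w)
    return dist


def stress(pos: List[List[float]], dist: List[List[float]]) -> float:
    """Weighted stress with w_ij = d_ij^-2 over all unordered pairs."""
    n = len(pos)
    return sum((math.dist(pos[i], pos[j]) - dist[i][j]) ** 2 / dist[i][j] ** 2
               for i in range(n) for j in range(i + 1, n))


def sgd_stress_layout(adj: List[List[int]], iterations: int = 30,
                      eps: float = 0.1, seed: int = 0) -> List[List[float]]:
    """2D layout of a connected graph (at least two nodes) by pairwise SGD."""
    rng = random.Random(seed)
    dist = bfs_distances(adj)
    n = len(adj)
    pos = [[rng.random(), rng.random()] for _ in range(n)]  # random initialization
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    d_max = max(dist[i][j] for i, j in pairs)
    eta_max = d_max ** 2  # 1 / w_min
    eta_min = eps         # eps / w_max, since w_max = 1 for unit edge lengths
    decay = math.log(eta_max / eta_min) / max(iterations - 1, 1)
    for t in range(iterations):
        eta = eta_max * math.exp(-decay * t)
        rng.shuffle(pairs)
        for i, j in pairs:
            d = dist[i][j]
            mu = min(eta / d ** 2, 1.0)  # cap so a pair never overshoots its target
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            length = math.hypot(dx, dy) or 1e-9
            # Move both endpoints by half of the (scaled) correction.
            r = mu * (length - d) / (2 * length)
            pos[i][0] -= r * dx
            pos[i][1] -= r * dy
            pos[j][0] += r * dx
            pos[j][1] += r * dy
    return pos
```

Each update only touches one pair, which keeps the method short and makes it insensitive to the initial layout.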
-
Erratum: Fast and Simple Horizontal Coordinate Assignment
Authors:
Ulrik Brandes,
Julian Walter,
Johannes Zink
Abstract:
We point out two flaws in the algorithm of Brandes and Köpf (Proc. GD 2001), which is often used for the horizontal coordinate assignment in Sugiyama's framework for layered layouts. One of them has been noted and fixed multiple times; the other has not been documented before and requires a non-trivial adaptation. On the bright side, neither running time nor extensions of the algorithm are affected adversely.
Submitted 3 August, 2020;
originally announced August 2020.
-
Recent Advances in Scalable Network Generation
Authors:
Manuel Penschuck,
Ulrik Brandes,
Michael Hamann,
Sebastian Lamm,
Ulrich Meyer,
Ilya Safro,
Peter Sanders,
Christian Schulz
Abstract:
Random graph models are frequently used as a controllable and versatile data source for experimental campaigns in various research fields. Generating such datasets at scale is a non-trivial task, as it requires design decisions typically spanning multiple areas of expertise. Challenges begin with the identification of relevant domain-specific network features, continue with the question of how to compile such features into a tractable model, and culminate in algorithmic details arising while implementing the pertaining model.
In the present survey, we explore crucial aspects of random graph models with known scalable generators. We begin by briefly introducing network features considered by such models, and then discuss random graphs alongside generation algorithms. Our focus lies on modelling techniques and algorithmic primitives that have proven successful in obtaining massive graphs. We consider concepts and graph models for various domains (such as social networks, infrastructure, ecology, and numerical simulations), and discuss generators for different models of computation (including shared-memory parallelism, massively parallel GPUs, and distributed systems).
Submitted 2 March, 2020;
originally announced March 2020.
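One classic algorithmic primitive for massive graphs is geometric skipping for the Erdős–Rényi model G(n, p). Instead of flipping a coin for each of the n(n-1)/2 pairs, the generator jumps directly to the next edge, so the running time is linear in the number of nodes plus edges. Below is a minimal Python sketch of this idea, following the approach of Batagelj and Brandes. The function name is hypothetical, and the sketch is not claimed to reproduce any particular generator from the survey.

```python
import math
import random
from typing import List, Tuple


def fast_gnp(n: int, p: float, seed: int = 0) -> List[Tuple[int, int]]:
    """Edges (v, w) with w < v of an undirected G(n, p) random graph,
    generated by skipping over non-edges with geometric jumps."""
    if p <= 0.0 or n < 2:
        return []
    if p >= 1.0:
        return [(v, w) for v in range(1, n) for w in range(v)]
    rng = random.Random(seed)
    log_q = math.log1p(-p)  # log(1 - p) < 0
    edges = []
    v, w = 1, -1  # current candidate pair in row-major order of the lower triangle
    while v < n:
        r = rng.random()  # r in [0, 1)
        # Number of non-edges skipped before the next edge is geometric in p.
        w += 1 + int(math.log1p(-r) / log_q)
        while w >= v and v < n:  # carry the overflow into the following rows
            w -= v
            v += 1
        if v < n:
            edges.append((v, w))
    return edges
```

The expected work is proportional to n plus the number of generated edges, which is what makes the approach attractive for sparse graphs.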
-
A Sparse Stress Model
Authors:
Mark Ortmann,
Mirza Klimenta,
Ulrik Brandes
Abstract:
Force-directed layout methods constitute the most common approach to drawing general graphs. Among them, stress minimization produces layouts of comparatively high quality but also imposes comparatively high computational demands. We propose a speed-up method based on the aggregation of terms in the objective function. It is akin to aggregating repulsion from far-away nodes during spring embedding but transfers the idea from the layout space into a preprocessing phase. An initial experimental study informs a method to select representatives, and subsequent, more extensive experiments indicate that our method yields better approximations of minimum-stress layouts in less time than related methods.
Submitted 28 November, 2016; v1 submitted 31 August, 2016;
originally announced August 2016.
-
Fast Quasi-Threshold Editing
Authors:
Ulrik Brandes,
Michael Hamann,
Ben Strasser,
Dorothea Wagner
Abstract:
We introduce Quasi-Threshold Mover (QTM), an algorithm to solve the quasi-threshold (also called trivially perfect) graph editing problem with edge insertion and deletion. Given a graph, it computes a quasi-threshold graph that is close in terms of edit count. This edit problem is NP-hard. We present an extensive experimental study, in which we show that QTM is the first algorithm that is able to scale to large real-world graphs in practice. As a side result, we further present a simple linear-time algorithm for the quasi-threshold recognition problem.
Submitted 28 April, 2015;
originally announced April 2015.
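Quasi-threshold graphs are exactly the graphs without an induced path or cycle on four vertices (P4 or C4). This hypothetical Python sketch checks that characterization by brute force in O(n^4) time. It only makes the definition concrete and is not the linear-time recognition algorithm mentioned above.

```python
from itertools import combinations
from typing import Dict, Set


def is_quasi_threshold(adj: Dict[int, Set[int]]) -> bool:
    """True iff the undirected simple graph has no induced P4 or C4."""
    for quad in combinations(adj, 4):
        # Degrees inside the subgraph induced by these four vertices.
        degrees = sorted(sum(1 for u in quad if u != v and u in adj[v]) for v in quad)
        # [1, 1, 2, 2] can only be an induced P4; [2, 2, 2, 2] only an induced C4.
        if degrees == [1, 1, 2, 2] or degrees == [2, 2, 2, 2]:
            return False
    return True
```

An editing algorithm such as QTM aims to reach a graph that passes this test with as few edge insertions and deletions as possible.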
-
Link Prediction with Social Vector Clocks
Authors:
Conrad Lee,
Bobo Nick,
Ulrik Brandes,
Pádraig Cunningham
Abstract:
State-of-the-art link prediction utilizes combinations of complex features derived from network panel data. We here show that computationally less expensive features can achieve the same performance in the common scenario in which the data is available as a sequence of interactions. Our features are based on social vector clocks, an adaptation of the vector-clock concept introduced in distributed computing to social interaction networks. In fact, our experiments suggest that by taking into account the order and spacing of interactions, social vector clocks exploit different aspects of link formation so that their combination with previous approaches yields the most accurate predictor to date.
Submitted 15 April, 2013;
originally announced April 2013.
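To make the notion concrete, here is a minimal Python sketch of maintaining social vector clocks over a stream of interactions. It assumes that an interaction between u and v at time t lets both endpoints merge their clocks by taking entry-wise maxima. Here `clock[v][u]` is the latest time at which information leaving u could have reached v along time-respecting interactions. The function name and the treatment of interactions as undirected are assumptions, and the link prediction features built on top of the clocks are not shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def social_vector_clocks(
    interactions: Iterable[Tuple[Hashable, Hashable, float]]
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Process (u, v, t) interactions in time order and return all clocks."""
    clock: Dict[Hashable, Dict[Hashable, float]] = defaultdict(dict)
    for u, v, t in sorted(interactions, key=lambda e: e[2]):
        # Both endpoints carry their own state as of time t.
        clock[u][u] = t
        clock[v][v] = t
        # Exchange knowledge: keep the most recent entry known to either side.
        merged = dict(clock[u])
        for w, s in clock[v].items():
            if s > merged.get(w, float("-inf")):
                merged[w] = s
        clock[u] = merged
        clock[v] = dict(merged)
    return clock
```

Because entries record indirect as well as direct contact, features derived from them reflect the order and spacing of interactions rather than only their counts.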
-
Network Connection Games with Disconnected Equilibria
Authors:
Ulrik Brandes,
Martin Hoefer,
Bobo Nick
Abstract:
In this paper, we extend a popular non-cooperative network creation game (NCG) to allow for disconnected equilibrium networks. There are n players, each of which is a vertex in a graph, and a strategy is a subset of players to build edges to. For each edge a player must pay a cost α, and the individual cost for a player represents a trade-off between edge costs and shortest path lengths to all other players. We extend the model to a penalized game (PCG), for which we reduce the penalty counted towards the individual cost for a pair of disconnected players to a finite value β. Our analysis concentrates on existence, structure, and cost of disconnected Nash and strong equilibria. Although the PCG is not a potential game, pure Nash equilibria always exist and pure strong equilibria very often exist. We provide tight conditions under which disconnected Nash (strong) equilibria can evolve. Components of these equilibria must be Nash (strong) equilibria of a smaller NCG. However, in contrast to the NCG, for almost all parameter values no tree is a stable component. Finally, we present a detailed characterization of the price of anarchy that reveals cases in which the price of anarchy is Θ(n) and thus several orders of magnitude larger than in the NCG. Perhaps surprisingly, the strong price of anarchy increases to at most 4. This indicates that global communication and coordination can be extremely valuable to overcome socially inferior topologies in distributed selfish network design.
Submitted 27 October, 2008; v1 submitted 28 May, 2008;
originally announced May 2008.
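The individual cost in the penalized game can be evaluated directly from a strategy profile. This hypothetical Python sketch assumes that a purchased edge can be used in both directions and that distances are unweighted shortest-path lengths. Every player that cannot be reached contributes the finite penalty β instead of a distance.

```python
from collections import deque
from typing import List, Sequence, Set


def pcg_costs(bought: Sequence[Set[int]], alpha: float, beta: float) -> List[float]:
    """bought[i] is the set of players that player i builds edges to.

    Returns alpha * |bought[i]| + sum over j != i of d(i, j) for each player i,
    where d(i, j) is replaced by beta if j is unreachable from i.
    """
    n = len(bought)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i, targets in enumerate(bought):
        for j in targets:
            adj[i].add(j)  # edges are usable by both endpoints
            adj[j].add(i)
    costs = []
    for i in range(n):
        # Breadth-first search from player i.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        distance_cost = sum(dist.get(j, beta) for j in range(n) if j != i)
        costs.append(alpha * len(bought[i]) + distance_cost)
    return costs
```

Checking for a Nash equilibrium would then amount to comparing each player's cost against all unilateral changes of `bought[i]`, which is exponential in n and omitted here.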
-
Maximizing Modularity is hard
Authors:
U. Brandes,
D. Delling,
M. Gaertler,
R. Goerke,
M. Hoefer,
Z. Nikoloski,
D. Wagner
Abstract:
Several algorithms have been proposed to compute partitions of networks into communities that score high on a graph clustering index called modularity. While publications on these algorithms typically contain experimental evaluations to emphasize the plausibility of results, none of these algorithms has been shown to actually compute optimal partitions. We here settle the previously unknown complexity status of modularity maximization by showing that the corresponding decision version is NP-complete in the strong sense. As a consequence, unless P = NP, any efficient, i.e., polynomial-time, algorithm is only heuristic and yields suboptimal partitions on many instances.
Submitted 30 August, 2006; v1 submitted 25 August, 2006;
originally announced August 2006.
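For reference, the modularity of a partition C of a graph with m edges is Q = Σ_{c∈C} [m_c/m - (D_c/(2m))²], where m_c counts the edges inside community c and D_c is the total degree of its nodes. Evaluating Q for a given partition is easy, and the hardness result concerns only its maximization. The function name in this minimal Python sketch is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Modularity of the partition `community` (node -> label) of an undirected graph."""
    m = 0
    internal: Dict[Hashable, int] = defaultdict(int)    # edges inside each community
    degree_sum: Dict[Hashable, int] = defaultdict(int)  # total degree per community
    for u, v in edges:
        m += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, splitting along that edge gives Q = 5/14, while putting all nodes into one community gives Q = 0.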