-
LGDE: Local Graph-based Dictionary Expansion
Authors:
Dominik J. Schindler,
Sneha Jha,
Xixuan Zhang,
Kilian Buehling,
Annett Heft,
Mauricio Barahona
Abstract:
Expanding a dictionary of pre-selected keywords is crucial for tasks in information retrieval, such as database query and online data collection. Here we propose Local Graph-based Dictionary Expansion (LGDE), a method that uses tools from manifold learning and network science for the data-driven discovery of keywords starting from a seed dictionary. At the heart of LGDE lies the creation of a word similarity graph derived from word embeddings and the application of local community detection based on graph diffusion to discover semantic neighbourhoods of pre-defined seed keywords. The diffusion in the local graph manifold allows the exploration of the complex nonlinear geometry of word embeddings and can capture word similarities based on paths of semantic association. We validate our method on a corpus of hate speech-related posts from Reddit and Gab and show that LGDE enriches the list of keywords and achieves significantly better performance than threshold methods based on direct word similarities. We further demonstrate the potential of our method through a real-world use case from communication science, where LGDE is evaluated quantitatively on data collected and analysed by domain experts by expanding a conspiracy-related dictionary.
Submitted 13 May, 2024;
originally announced May 2024.
-
Persistent Homology of the Multiscale Clustering Filtration
Authors:
Dominik J. Schindler,
Mauricio Barahona
Abstract:
In many applications in data clustering, it is desirable to find not just a single partition into clusters but a sequence of partitions describing the data at different scales, or levels of coarseness. A natural problem then is to analyse and compare the (not necessarily hierarchical) sequences of partitions that underpin such multiscale descriptions of data. Here, we introduce a filtration of abstract simplicial complexes, denoted the Multiscale Clustering Filtration (MCF), which encodes arbitrary patterns of cluster assignments across scales, and we prove that the MCF produces stable persistence diagrams. We then show that the zero-dimensional persistent homology of the MCF measures the degree of hierarchy in the sequence of partitions, and that the higher-dimensional persistent homology tracks the emergence and resolution of conflicts between cluster assignments across the sequence of partitions. To broaden the theoretical foundations of the MCF, we also provide an equivalent construction via a nerve complex filtration, and we show that in the hierarchical case, the MCF reduces to a Vietoris-Rips filtration of an ultrametric space. We briefly illustrate how the MCF can serve to characterise multiscale clustering structures in numerical experiments on synthetic data.
Submitted 21 September, 2023; v1 submitted 7 May, 2023;
originally announced May 2023.
-
PyGenStability: Multiscale community detection with generalized Markov Stability
Authors:
Alexis Arnaudon,
Dominik J. Schindler,
Robert L. Peach,
Adam Gosztolai,
Maxwell Hodges,
Michael T. Schaub,
Mauricio Barahona
Abstract:
We present PyGenStability, a general-use Python software package that provides a suite of analysis and visualisation tools for unsupervised multiscale community detection in graphs. PyGenStability finds optimized partitions of a graph at different levels of resolution by maximizing the generalized Markov Stability quality function with the Louvain or Leiden algorithms. The package includes automatic detection of robust graph partitions and allows the flexibility to choose quality functions for weighted undirected, directed and signed graphs, and to include other user-defined quality functions.
Submitted 8 November, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Community as a Vague Operator: Epistemological Questions for a Critical Heuristics of Community Detection Algorithms
Authors:
Dominik J. Schindler,
Matthew Fuller
Abstract:
In this article, we aim to analyse the nature and epistemic consequences of what figures in network science as patterns of nodes and edges called 'communities'. Tracing these patterns as multi-faceted and ambivalent, we propose to describe the concept of community as a 'vague operator', a variant of Susan Leigh Star's notion of the boundary object, and propose that the ability to construct different modes of description that are both vague in some registers and hyper-precise in others, is core both to digital politics and the analysis of 'communities'. Engaging with these formations in terms drawn from mathematics and software studies enables a wider mapping of their formation. Disentangling different lineages in network science then allows us to contextualise the founding account of 'community' popularised by Michelle Girvan and Mark Newman in 2002. After studying one particular community detection algorithm, the widely-used 'Louvain algorithm', we comment on controversies arising with some of its more ambiguous applications. We argue that 'community' can act as a real abstraction with the power to reshape social relations such as producing echo chambers in social networking sites. To rework the epistemological terms of community detection and propose a reconsideration of vague operators, we draw on debates and propositions within the literature of network science to imagine a 'critical heuristics' that embraces partiality, epistemic humbleness, reflexivity and artificiality.
Submitted 24 May, 2023; v1 submitted 6 October, 2022;
originally announced October 2022.
-
Multiscale mobility patterns and the restriction of human movement
Authors:
Dominik J. Schindler,
Jonathan Clarke,
Mauricio Barahona
Abstract:
From the perspective of human mobility, the COVID-19 pandemic constituted a natural experiment of enormous reach in space and time. Here, we analyse the inherent multiple scales of human mobility using Facebook Movement Maps collected before and during the first UK lockdown. First, we obtain the pre-lockdown UK mobility graph, and employ multiscale community detection to extract, in an unsupervised manner, a set of robust partitions into flow communities at different levels of coarseness. The partitions so obtained capture intrinsic mobility scales with better coverage than NUTS regions, which suffer from mismatches between human mobility and administrative divisions. Furthermore, the flow communities in the fine scale partition match well the UK Travel to Work Areas (TTWAs) but also capture mobility patterns beyond commuting to work. We also examine the evolution of mobility under lockdown, and show that mobility first reverted towards fine scale flow communities already found in the pre-lockdown data, and then expanded back towards coarser flow communities as restrictions were lifted. The improved coverage induced by lockdown is well captured by a linear decay shock model, which allows us to quantify regional differences both in the strength of the effect and the recovery time from the lockdown shock.
Submitted 14 August, 2023; v1 submitted 17 January, 2022;
originally announced January 2022.