-
A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis
Authors:
Minh H. Vu,
Daniel Edler,
Carl Wibom,
Tommy Löfstedt,
Beatrice Melin,
Martin Rosvall
Abstract:
Advancements in science rely on data sharing. In medicine, where personal data are often involved, synthetic tabular data generated by generative adversarial networks (GANs) offer a promising avenue. However, existing GANs struggle to capture the complexities of real-world tabular data, which often contain a mix of continuous and categorical variables with potential imbalances and dependencies. We propose a novel correlation- and mean-aware loss function designed to address these challenges as a regularizer for GANs. To ensure a rigorous evaluation, we establish a comprehensive benchmarking framework using ten real-world datasets and eight established tabular GAN baselines. The proposed loss function demonstrates statistically significant improvements over existing methods in capturing the true data distribution, significantly enhancing the quality of synthetic data generated with GANs. The benchmarking framework shows that the enhanced synthetic data quality leads to improved performance in downstream machine learning (ML) tasks, ultimately paving the way for easier data sharing.
Submitted 27 May, 2024;
originally announced May 2024.
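The abstract does not state the loss itself, so the following is only a minimal sketch of what a correlation- and mean-aware regularizer could look like, not the authors' formulation. It assumes the penalty is a weighted sum of the squared differences between column means and between pairwise Pearson correlations of a real and a synthetic batch. The function names and the weights `w_mean` and `w_corr` are hypothetical.

```python
import math

def column_means(rows):
    """Mean of every column in a batch given as a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def correlation_matrix(rows):
    """Pearson correlation between every pair of columns."""
    means = column_means(rows)
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cols = list(zip(*centered))
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    d = len(cols)
    corr = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            denom = norms[i] * norms[j]
            # Assumption: a constant column (zero variance) gets correlation 0.
            if denom > 0:
                corr[i][j] = sum(a * b for a, b in zip(cols[i], cols[j])) / denom
    return corr

def mean_corr_penalty(real, fake, w_mean=1.0, w_corr=1.0):
    """Hypothetical regularizer: mismatch in column means plus mismatch in correlations."""
    mr, mf = column_means(real), column_means(fake)
    mean_term = sum((a - b) ** 2 for a, b in zip(mr, mf)) / len(mr)
    cr, cf = correlation_matrix(real), correlation_matrix(fake)
    d = len(cr)
    corr_term = sum((cr[i][j] - cf[i][j]) ** 2
                    for i in range(d) for j in range(d)) / (d * d)
    return w_mean * mean_term + w_corr * corr_term

real = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # columns move together
fake = [[1.0, 6.0], [2.0, 4.0], [3.0, 2.0]]   # same means, opposite correlation
print(mean_corr_penalty(real, real))  # identical batches give no penalty
print(mean_corr_penalty(real, fake))  # only the correlation term contributes
```

In an actual GAN, a term like this would be computed on differentiable tensors and added to the generator loss; the plain-Python version only illustrates the quantity being penalized.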
-
Community Detection with the Map Equation and Infomap: Theory and Applications
Authors:
Jelena Smiljanić,
Christopher Blöcker,
Anton Holmgren,
Daniel Edler,
Magnus Neuman,
Martin Rosvall
Abstract:
Real-world networks have a complex topology comprising many elements often structured into communities. Revealing these communities helps researchers uncover the organizational and functional structure of the system that the network represents. However, detecting community structures in complex networks requires selecting a community detection method among a multitude of alternatives with different network representations, community interpretations, and underlying mechanisms. This review and tutorial focuses on a popular community detection method called the map equation and its search algorithm Infomap. The map equation framework for community detection describes communities by analyzing dynamic processes on the network. Thanks to its flexibility, the map equation provides extensions that can incorporate various assumptions about network structure and dynamics. To help decide if the map equation is a suitable community detection method for a given complex system and problem at hand -- and which variant to choose -- we review the map equation's theoretical framework and guide users in applying the map equation to various research problems.
Submitted 7 November, 2023;
originally announced November 2023.
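To make the idea of describing communities through dynamics concrete, here is a minimal sketch of the standard two-level map equation for an undirected, unweighted network, written from the textbook form L = q log q - 2 Σ q_i log q_i - Σ p_a log p_a + Σ (q_i + p_i) log(q_i + p_i). It does not use Infomap and performs no search; the function names `plogp` and `map_equation` and the toy network are illustrative assumptions.

```python
import math

def plogp(p):
    """p * log2(p), with the usual convention that 0 * log 0 = 0."""
    return p * math.log2(p) if p > 0 else 0.0

def map_equation(links, modules):
    """Two-level codelength in bits for an undirected, unweighted network.

    links:   iterable of (u, v) pairs
    modules: dict mapping every node to a module id
    """
    degree = {}
    for u, v in links:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total = sum(degree.values())  # twice the number of links

    # Stationary visit rate of a random walker: proportional to degree.
    p = {n: k / total for n, k in degree.items()}

    module_flow, exit_flow = {}, {}
    for n, rate in p.items():
        m = modules[n]
        module_flow[m] = module_flow.get(m, 0.0) + rate
        exit_flow.setdefault(m, 0.0)
    # Each link carries flow 1/total in each direction.
    for u, v in links:
        if modules[u] != modules[v]:
            exit_flow[modules[u]] += 1 / total
            exit_flow[modules[v]] += 1 / total

    q = sum(exit_flow.values())
    return (plogp(q)
            - 2 * sum(plogp(qi) for qi in exit_flow.values())
            - sum(plogp(pa) for pa in p.values())
            + sum(plogp(exit_flow[m] + module_flow[m]) for m in module_flow))

# Two triangles joined by a single bridge link.
links = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = map_equation(links, {n: 0 for n in range(6)})
two_modules = map_equation(links, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
print(one_module, two_modules)  # the two-module partition needs fewer bits
```

A search algorithm such as Infomap looks for the partition that minimizes this quantity; the sketch only evaluates it for partitions supplied by hand.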
-
Infomap Bioregions 2 - Exploring the interplay between biogeography and evolution
Authors:
Daniel Edler,
Anton Holmgren,
Alexis Rojas,
Joaquín Calatayud,
Martin Rosvall,
Alexandre Antonelli
Abstract:
Identifying and understanding the large-scale biodiversity patterns in time and space is vital for conservation and addressing fundamental ecological and evolutionary questions. Network-based methods have proven useful for simplifying and highlighting important structures in species distribution data. However, current network-based biogeography approaches cannot exploit the evolutionary information available in phylogenetic data. We introduce a method for incorporating evolutionary relationships into species occurrence networks to produce more biologically informative and robust bioregions. To keep the bipartite network structure where bioregions are grid cells indirectly connected through shared species, we incorporate the phylogenetic tree by connecting ancestral nodes to the grid cells where their descendant species occur. To incorporate the whole tree without destroying the spatial signal of narrowly distributed species or ancestral nodes, we weigh tree nodes by the geographic information they provide. For a more detailed analysis, we enable integration of the evolutionary relationships at a specific time in the tree. By sweeping through the phylogenetic tree in time, our method interpolates between finding bioregions based only on distributional data and finding spatially segregated clades, uncovering evolutionarily distinct bioregions at different time slices. We also introduce a way to segregate the connections between evolutionary branches at a selected time to enable exploration of overlapping evolutionarily distinct regions. We have implemented these methods in Infomap Bioregions, an interactive web application that makes it easy to explore the possibly hierarchical and fuzzy patterns of biodiversity on different scales in time and space.
Submitted 29 June, 2023;
originally announced June 2023.
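The step of connecting ancestral nodes to the grid cells where their descendant species occur can be sketched as follows. This is an unweighted illustration only: the weighting of tree nodes by geographic information and the time slicing mentioned in the abstract are left out because the abstract does not specify them. The data layout (a `tree` dict from parent to children, an `occurrences` dict from species to grid cells) and all names are assumptions.

```python
def descendant_cells(tree, occurrences, node, out):
    """Post-order walk: store, for every tree node, the grid cells
    occupied by the node itself (if a species) or by its descendants."""
    children = tree.get(node, [])
    if not children:
        # Leaf of the tree: a species with observed occurrences.
        cells = set(occurrences.get(node, ()))
    else:
        cells = set()
        for child in children:
            cells |= descendant_cells(tree, occurrences, child, out)
    out[node] = cells
    return cells

def bipartite_links(tree, occurrences, root):
    """Links (tree node, grid cell) for species and ancestral nodes alike."""
    cells_by_node = {}
    descendant_cells(tree, occurrences, root, cells_by_node)
    return sorted((node, cell)
                  for node, cells in cells_by_node.items()
                  for cell in cells)

# Hypothetical toy data: three species, one internal clade "A".
tree = {"root": ["A", "sp3"], "A": ["sp1", "sp2"]}
occurrences = {"sp1": {"c1"}, "sp2": {"c1", "c2"}, "sp3": {"c3"}}

for link in bipartite_links(tree, occurrences, "root"):
    print(link)
```

The network stays bipartite because ancestral nodes, like species, link only to grid cells and never to each other.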
-
Mapping change in higher-order networks with multilevel and overlapping communities
Authors:
Anton Holmgren,
Daniel Edler,
Martin Rosvall
Abstract:
New network models of complex systems use layers, state nodes, or hyperedges to capture higher-order interactions and dynamics. Simplifying how the higher-order networks change over time or depending on the network model would be easy with alluvial diagrams, which visualize community splits and merges between networks. However, alluvial diagrams were developed for networks with regular nodes assigned to non-overlapping flat communities. How should they be defined for nodes in layers, state nodes, or hyperedges? How can they depict multilevel, overlapping communities? Here we generalize alluvial diagrams to map change in higher-order networks and provide an interactive tool for anyone to generate alluvial diagrams. We use the alluvial generator to illustrate the effect of modeling network flows with memory in a citation network, distinguishing multidisciplinary from field-specific journals.
Submitted 1 March, 2023;
originally announced March 2023.
-
Variable Markov dynamics as a multi-focal lens to map multi-scale complex networks
Authors:
Daniel Edler,
Jelena Smiljanić,
Anton Holmgren,
Alexandre Antonelli,
Martin Rosvall
Abstract:
From traffic flows on road networks to electrical signals in brain networks, many real-world networks contain modular structures of different sizes and densities. In the networks where modular structures emerge due to coupling between nodes with similar dynamical functions, we can identify them using flow-based community detection methods. However, these methods implicitly assume that communities are dense or clique-like which can shatter sparse communities due to a field-of-view limit inherent in one-step dynamics. Taking multiple steps with shorter or longer Markov time enables us to effectively zoom in or out to capture small or long-range communities. However, zooming out to avoid the field-of-view limit comes at the expense of introducing or increasing a lower resolution limit. Here we relax the constant Markov time constraint and introduce variable Markov dynamics as a multi-focal lens to capture functional communities in networks with a higher range of scales. With variable Markov time, a random walker can keep one-step dynamics in dense areas to avoid the resolution limit and move faster in sparse areas to detect long-range modular structures and prevent the field-of-view limit. We analyze the performance of variable Markov time using the flow-based community detection method called the map equation. We have implemented the map equation with variable Markov time in the search algorithm Infomap without any complexity overhead and tested its performance on synthetic and real-world networks from different domains. Results show that it outperforms the standard map equation in networks with constrained structures and locally sparse regions. In addition, the method estimates the optimal Markov time and avoids parameter tuning.
Submitted 8 November, 2022;
originally announced November 2022.
-
Anomalous buoyancy of quantum bubbles in immiscible Bose mixtures
Authors:
Daniel Edler,
L. A. Peña Ardila,
Cesar R. Cabrera,
Luis Santos
Abstract:
Buoyancy is a well-known effect in immiscible binary Bose-Einstein condensates. Depending on the differential confinement experienced by the two components, a bubble of one component sitting at the center of the other eventually floats to the surface, around which it spreads either totally or partially. We discuss how quantum fluctuations may significantly change the volume and position of immiscible bubbles. We consider the particular case of two miscible components, forming a pseudo-scalar bubble condensate with enhanced quantum fluctuations (quantum bubble), immersed in a bath provided by a third component, with which they are immiscible. We show that in such a peculiar effective binary mixture, quantum fluctuations change the equilibrium of pressures that define the bubble volume and modify as well the criterion for buoyancy. Once buoyancy sets in, in contrast to the mean-field case, quantum fluctuations may place the bubble at an intermediate position between the center and the surface. At the surface, the quantum bubble may transition into a floating self-bound droplet.
Submitted 19 July, 2022; v1 submitted 1 April, 2022;
originally announced April 2022.
-
Mapping flows on weighted and directed networks with incomplete observations
Authors:
Jelena Smiljanić,
Christopher Blöcker,
Daniel Edler,
Martin Rosvall
Abstract:
Detecting significant community structure in networks with incomplete observations is challenging because the evidence for specific solutions fades away with missing data. For example, recent research shows that flow-based community detection methods can highlight spurious communities in sparse undirected and unweighted networks with missing links. Current Bayesian approaches developed to overcome this problem do not work for incomplete observations in weighted and directed networks that describe network flows. To address this gap, we extend the idea behind the Bayesian estimate of the map equation for unweighted and undirected networks to enable more robust community detection in weighted and directed networks. We derive a weighted and directed prior network that can incorporate metadata information and show how an efficient implementation in the community-detection method Infomap provides more reliable communities even with a significant fraction of data missing.
Submitted 13 December, 2021; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Mapping flows on hypergraphs
Authors:
Anton Eriksson,
Daniel Edler,
Alexis Rojas,
Martin Rosvall
Abstract:
Hypergraphs offer an explicit formalism to describe multibody interactions in complex systems. To connect dynamics and function in systems with these higher-order interactions, network scientists have generalised random-walk models to hypergraphs and studied the multibody effects on flow-based centrality measures. But mapping the large-scale structure of those flows requires effective community detection methods. We derive unipartite, bipartite, and multilayer network representations of hypergraph flows and explore how they and the underlying random-walk model change the number, size, depth, and overlap of identified multilevel communities. These results help researchers choose the appropriate modelling approach when mapping flows on hypergraphs.
Submitted 3 January, 2021;
originally announced January 2021.
-
Mapping flows on sparse networks with missing links
Authors:
Jelena Smiljanić,
Daniel Edler,
Martin Rosvall
Abstract:
Unreliable network data can cause community-detection methods to overfit and highlight spurious structures with misleading information about the organization and function of complex systems. Here we show how to detect significant flow-based communities in sparse networks with missing links using the map equation. Since the map equation builds on Shannon entropy estimation, it assumes complete data such that analyzing undersampled networks can lead to overfitting. To overcome this problem, we incorporate a Bayesian approach with assumptions about network uncertainties into the map equation framework. Results in both synthetic and real-world networks show that the Bayesian estimate of the map equation provides a principled approach to revealing significant structures in undersampled networks.
Submitted 7 July, 2020; v1 submitted 11 December, 2019;
originally announced December 2019.
-
Mapping higher-order network flows in memory and multilayer networks with Infomap
Authors:
Daniel Edler,
Ludvig Bohlin,
Martin Rosvall
Abstract:
Comprehending complex systems by simplifying and highlighting important dynamical patterns requires modeling and mapping higher-order network flows. However, complex systems come in many forms and demand a range of representations, including memory and multilayer networks, which in turn call for versatile community-detection algorithms to reveal important modular regularities in the flows. Here we show that various forms of higher-order network flows can be represented in a unified way with networks that distinguish physical nodes for representing a complex system's objects from state nodes for describing flows between the objects. Moreover, these so-called sparse memory networks allow the information-theoretic community detection method known as the map equation to identify overlapping and nested flow modules in data from a range of different higher-order interactions such as multistep, multi-source, and temporal data. We derive the map equation applied to sparse memory networks and describe its search algorithm Infomap, which can exploit the flexibility of sparse memory networks. Together they provide a general solution to reveal overlapping modular patterns in higher-order flows through complex systems.
Submitted 16 October, 2017; v1 submitted 15 June, 2017;
originally announced June 2017.
-
Quantum fluctuations in quasi-one-dimensional dipolar Bose-Einstein condensates
Authors:
Daniel Edler,
Chinmayee Mishra,
Falk Wächtler,
Rejish Nath,
Subhasis Sinha,
Luis Santos
Abstract:
Recent experiments have revealed that beyond-mean-field corrections are much more relevant in weakly-interacting dipolar condensates than in their non-dipolar counterparts. We show that in quasi-one-dimensional geometries quantum corrections in dipolar and non-dipolar condensates are strikingly different due to the peculiar momentum dependence of the dipolar interactions. The energy correction of the condensate presents not only a modified density dependence, but it may even change from attractive to repulsive at a critical density due to the surprising role played by the transversal directions. The anomalous quantum correction translates into a strongly modified physics for quantum-stabilized droplets and dipolar solitons. Moreover, and for similar reasons, quantum corrections of three-body correlations, and hence of three-body losses, are strongly modified by the dipolar interactions. This intriguing physics can be readily probed in current experiments with magnetic atoms.
Submitted 3 August, 2017; v1 submitted 28 October, 2016;
originally announced October 2016.
-
Maps of sparse Markov chains efficiently reveal community structure in network flows with memory
Authors:
Christian Persson,
Ludvig Bohlin,
Daniel Edler,
Martin Rosvall
Abstract:
To better understand the flows of ideas or information through social and biological systems, researchers develop maps that reveal important patterns in network flows. In practice, network flow models have implied memoryless first-order Markov chains, but recently researchers have introduced higher-order Markov chain models with memory to capture patterns in multi-step pathways. Higher-order models are particularly important for effectively revealing actual, overlapping community structure, but higher-order Markov chain models suffer from the curse of dimensionality: their vast parameter spaces require exponentially increasing data to avoid overfitting and therefore make mapping inefficient already for moderate-sized systems. To overcome this problem, we introduce an efficient cross-validated mapping approach based on network flows modeled by sparse Markov chains. To illustrate our approach, we present a map of citation flows in science with research fields that overlap in multidisciplinary journals. Compared with currently used categories in science of science studies, the research fields form better units of analysis because the map more effectively captures how ideas flow through science.
Submitted 27 June, 2016;
originally announced June 2016.
-
Linguistic neighbourhoods: explaining cultural borders on Wikipedia through multilingual co-editing activity
Authors:
Anna Samoilenko,
Fariba Karimi,
Daniel Edler,
Jérôme Kunegis,
Markus Strohmaier
Abstract:
In this paper, we study the network of global interconnections between language communities, based on shared co-editing interests of Wikipedia editors, and show that although English is discussed as a potential lingua franca of the digital space, its domination disappears in the network of co-editing similarities, and instead local connections come to the forefront. Out of the hypotheses we explored, bilingualism, linguistic similarity of languages, and shared religion provide the best explanations for the similarity of interests between cultural communities. Population attraction and geographical proximity are also significant, but much weaker factors bringing communities together. In addition, we present an approach that allows for extracting significant cultural borders from editing activity of Wikipedia users, and comparing a set of hypotheses about the social mechanisms generating these borders. Our study sheds light on how culture is reflected in the collective process of archiving knowledge on Wikipedia, and demonstrates that cross-lingual interconnections on Wikipedia are not dominated by one powerful language. Our findings also raise some important policy questions for the Wikimedia Foundation.
Submitted 14 March, 2016;
originally announced March 2016.
-
Infomap Bioregions: Interactive mapping of biogeographical regions from species distributions
Authors:
Daniel Edler,
Thaís Guedes,
Alexander Zizka,
Martin Rosvall,
Alexandre Antonelli
Abstract:
Biogeographical regions (bioregions) reveal how different sets of species are spatially grouped and therefore are important units for conservation, historical biogeography, ecology and evolution. Several methods have been developed to identify bioregions based on species distribution data rather than expert opinion. One approach successfully applies network theory to simplify and highlight the underlying structure in species distributions. However, this method lacks tools for simple and efficient analysis. Here we present Infomap Bioregions, an interactive web application that inputs species distribution data and generates bioregion maps. Species distributions may be provided as georeferenced point occurrences or range maps, and can be of local, regional or global scale. The application uses a novel adaptive resolution method to make best use of often incomplete species distribution data. The results can be downloaded as vector graphics, shapefiles or in table format. We validate the tool by processing large datasets of publicly available species distribution data of the world's amphibians using species ranges, and mammals using point occurrences. We then calculate the fit between the inferred bioregions and WWF ecoregions. As examples of applications, researchers can reconstruct ancestral ranges in historical biogeography or identify indicator species for targeted conservation.
Submitted 30 November, 2016; v1 submitted 2 December, 2015;
originally announced December 2015.