A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis

Authors: Minh H. Vu, Daniel Edler, Carl Wibom, Tommy Löfstedt, Beatrice Melin, Martin Rosvall

Abstract: Advancements in science rely on data sharing. In medicine, where personal data are often involved, synthetic tabular data generated by generative adversarial networks (GANs) offer a promising avenue. However, existing GANs struggle to capture the complexities of real-world tabular data, which often contain a mix of continuous and categorical variables with potential imbalances and dependencies. We propose a novel correlation- and mean-aware loss function designed to address these challenges as a regularizer for GANs. To ensure a rigorous evaluation, we establish a comprehensive benchmarking framework using ten real-world datasets and eight established tabular GAN baselines. The proposed loss function demonstrates statistically significant improvements over existing methods in capturing the true data distribution, significantly enhancing the quality of synthetic data generated with GANs. The benchmarking framework shows that the enhanced synthetic data quality leads to improved performance in downstream machine learning (ML) tasks, ultimately paving the way for easier data sharing.

Submitted 27 May, 2024; originally announced May 2024.

Comments: n.a

arXiv:2106.14798 [pdf, other]

doi 10.1093/comnet/cnab044

Mapping flows on weighted and directed networks with incomplete observations

Authors: Jelena Smiljanić, Christopher Blöcker, Daniel Edler, Martin Rosvall

Abstract: Detecting significant community structure in networks with incomplete observations is challenging because the evidence for specific solutions fades away with missing data. For example, recent research shows that flow-based community detection methods can highlight spurious communities in sparse undirected and unweighted networks with missing links. Current Bayesian approaches developed to overcome this problem do not work for incomplete observations in weighted and directed networks that describe network flows. To address this gap, we extend the idea behind the Bayesian estimate of the map equation for unweighted and undirected networks to enable more robust community detection in weighted and directed networks. We derive a weighted and directed prior network that can incorporate metadata information and show how an efficient implementation in the community-detection method Infomap provides more reliable communities even with a significant fraction of data missing.

Submitted 13 December, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

Journal ref: J. Complex Netw. 9, cnab044 (2021)

arXiv:1706.04792 [pdf, other]

doi 10.3390/a10040112

Mapping higher-order network flows in memory and multilayer networks with Infomap

Authors: Daniel Edler, Ludvig Bohlin, Martin Rosvall

Abstract: Comprehending complex systems by simplifying and highlighting important dynamical patterns requires modeling and mapping higher-order network flows. However, complex systems come in many forms and demand a range of representations, including memory and multilayer networks, which in turn call for versatile community-detection algorithms to reveal important modular regularities in the flows. Here we show that various forms of higher-order network flows can be represented in a unified way with networks that distinguish physical nodes for representing a complex system's objects from state nodes for describing flows between the objects. Moreover, these so-called sparse memory networks allow the information-theoretic community detection method known as the map equation to identify overlapping and nested flow modules in data from a range of different higher-order interactions such as multistep, multi-source, and temporal data. We derive the map equation applied to sparse memory networks and describe its search algorithm Infomap, which can exploit the flexibility of sparse memory networks. Together they provide a general solution to reveal overlapping modular patterns in higher-order flows through complex systems.

Submitted 16 October, 2017; v1 submitted 15 June, 2017; originally announced June 2017.

Comments: 23 pages, 4 figures

Journal ref: Algorithms 2017, 10(4), 112

arXiv:1606.08328 [pdf, other]

Maps of sparse Markov chains efficiently reveal community structure in network flows with memory

Authors: Christian Persson, Ludvig Bohlin, Daniel Edler, Martin Rosvall

Abstract: To better understand the flows of ideas or information through social and biological systems, researchers develop maps that reveal important patterns in network flows. In practice, network flow models have implied memoryless first-order Markov chains, but recently researchers have introduced higher-order Markov chain models with memory to capture patterns in multi-step pathways. Higher-order models are particularly important for effectively revealing actual, overlapping community structure, but higher-order Markov chain models suffer from the curse of dimensionality: their vast parameter spaces require exponentially increasing data to avoid overfitting and therefore make mapping inefficient already for moderate-sized systems. To overcome this problem, we introduce an efficient cross-validated mapping approach based on network flows modeled by sparse Markov chains. To illustrate our approach, we present a map of citation flows in science with research fields that overlap in multidisciplinary journals. Compared with currently used categories in science of science studies, the research fields form better units of analysis because the map more effectively captures how ideas flow through science.

Submitted 27 June, 2016; originally announced June 2016.

Comments: 7 pages, 5 figures, 1 table

arXiv:1603.04225 [pdf, other]

doi 10.1140/epjds/s13688-016-0070-8

Linguistic neighbourhoods: explaining cultural borders on Wikipedia through multilingual co-editing activity

Authors: Anna Samoilenko, Fariba Karimi, Daniel Edler, Jérôme Kunegis, Markus Strohmaier

Abstract: In this paper, we study the network of global interconnections between language communities, based on shared co-editing interests of Wikipedia editors, and show that although English is discussed as a potential lingua franca of the digital space, its domination disappears in the network of co-editing similarities, and instead local connections come to the forefront. Out of the hypotheses we explored, bilingualism, linguistic similarity of languages, and shared religion provide the best explanations for the similarity of interests between cultural communities. Population attraction and geographical proximity are also significant, but much weaker factors bringing communities together. In addition, we present an approach that allows for extracting significant cultural borders from editing activity of Wikipedia users, and comparing a set of hypotheses about the social mechanisms generating these borders. Our study sheds light on how culture is reflected in the collective process of archiving knowledge on Wikipedia, and demonstrates that cross-lingual interconnections on Wikipedia are not dominated by one powerful language. Our findings also raise some important policy questions for the Wikimedia Foundation.

Submitted 14 March, 2016; originally announced March 2016.

Comments: 20 pages, 5 figures, 3 tables. Best poster award at NetSciX'16 in Wroclaw, Poland

Journal ref: EPJ Data Science 2016 5(9)

Showing 1–5 of 5 results for author: Edler, D