-
A Practical Guide to Statistical Distances for Evaluating Generative Models in Science
Authors:
Sebastian Bischoff,
Alana Darcher,
Michael Deistler,
Richard Gao,
Franziska Gerken,
Manuel Gloeckler,
Lisa Haxel,
Jaivardhan Kapoor,
Janne K Lappalainen,
Jakob H Macke,
Guy Moss,
Matthijs Pals,
Felix Pei,
Rachel Rapp,
A Erdem Sağtekin,
Cornelius Schröder,
Auguste Schulz,
Zinovia Stefanidi,
Shoji Toyota,
Linda Ulmer,
Julius Vetter
Abstract:
Generative models are invaluable in many fields of science because of their ability to capture high-dimensional and complicated distributions, such as photo-realistic images, protein structures, and connectomes. How do we evaluate the samples these models generate? This work aims to provide an accessible entry point to understanding popular notions of statistical distances, requiring only foundational knowledge in mathematics and statistics. We focus on four commonly used notions of statistical distances representing different methodologies: Using low-dimensional projections (Sliced-Wasserstein; SW), obtaining a distance using classifiers (Classifier Two-Sample Tests; C2ST), using embeddings through kernels (Maximum Mean Discrepancy; MMD), or neural networks (Fréchet Inception Distance; FID). We highlight the intuition behind each distance and explain their merits, scalability, complexity, and pitfalls. To demonstrate how these distances are used in practice, we evaluate generative models from different scientific domains, namely a model of decision making and a model generating medical images. We showcase that distinct distances can give different results on similar data. Through this guide, we aim to help researchers to use, interpret, and evaluate statistical distances for generative models in science.
Submitted 19 March, 2024;
originally announced March 2024.
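As a rough illustration of the "low-dimensional projections" idea named in the abstract above, the following is a minimal sketch of a Monte Carlo sliced-Wasserstein estimate. It is not taken from the paper: the function name `sliced_wasserstein`, the parameters `n_projections` and `seed`, the choice of the 2-Wasserstein variant, and the restriction to equally sized sample sets are all assumptions made for this example.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=100, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    Assumes x and y are arrays of shape (n_samples, n_dims) with the
    same number of samples (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch requires equally shaped sample sets")

    rng = np.random.default_rng(seed)
    n_dims = x.shape[1]

    # Draw random directions and normalize them onto the unit sphere.
    directions = rng.normal(size=(n_projections, n_dims))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Project both sample sets onto every direction and sort each
    # one-dimensional projection; result shape is (n_samples, n_projections).
    x_proj = np.sort(x @ directions.T, axis=0)
    y_proj = np.sort(y @ directions.T, axis=0)

    # For equally sized samples, the squared 1D 2-Wasserstein distance is the
    # mean squared difference of sorted values; average over directions too.
    return float(np.sqrt(np.mean((x_proj - y_proj) ** 2)))


rng = np.random.default_rng(1)
samples = rng.normal(size=(500, 3))
same = sliced_wasserstein(samples, samples)
shifted = sliced_wasserstein(samples, samples + 3.0)
```

Here `same` compares a sample set with itself and `shifted` compares it with a translated copy, so the second value should be clearly larger than the first. The number of projections trades estimator variance against compute time.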
-
Parameter estimation for cellular automata
Authors:
Alexey Kazarnikov,
Nadja Ray,
Heikki Haario,
Joona Lappalainen,
Andreas Rupp
Abstract:
Self-organizing complex systems can be modeled using cellular automaton (CA) models. However, the parametrization of these models is crucial and significantly determines the resulting structural pattern. In this research, we introduce and successfully apply a sound statistical method to estimate these parameters. The method is based on constructing Gaussian likelihoods using characteristics of the structures such as the mean particle size. We show that our approach is robust with respect to the method parameters, domain size of patterns, or CA iterations.
Submitted 30 January, 2023;
originally announced January 2023.
-
Improving Software Developer's Competence: Is the Personal Software Process Working?
Authors:
Pekka Abrahamsson,
Karlheinz Kautz,
Heikki Sieppi,
Jouni Lappalainen
Abstract:
Emerging agile software development methods are people-oriented development approaches to be used by the software industry. The personal software process (PSP) is an accepted method for improving the capabilities of a single software engineer. Five original hypotheses regarding the impact of the PSP on individual performance are tested. Data is obtained from 58 computer science students in three university courses at the master's level, which were held in two different educational institutions in Finland and Denmark. Statistical data treatment shows that the use of PSP did not improve size and time estimation skills, but that productivity did not decrease and the resulting product quality was improved. The implications of these findings are briefly addressed.
Submitted 1 November, 2013;
originally announced November 2013.
-
Thick disks of edge-on galaxies seen through the Spitzer Survey of Stellar Structure in Galaxies (S4G): Lair of missing baryons?
Authors:
Sébastien Comerón,
Bruce G. Elmegreen,
Johan H. Knapen,
Heikki Salo,
Eija Laurikainen,
Jarkko Laine,
E. Athanassoula,
Albert Bosma,
Kartik Sheth,
Michael W. Regan,
Joannah L. Hinz,
Armando Gil de Paz,
Karín Menéndez-Delmestre,
Trisha Mizusawa,
Juan-Carlos Muñoz-Mateos,
Mark Seibert,
Taehyun Kim,
Debra M. Elmegreen,
Dimitri A. Gadotti,
Luis C. Ho,
Benne W. Holwerda,
Jani Lappalainen,
Eva Schinnerer,
Ramin Skibba
Abstract:
Most, if not all, disk galaxies have a thin (classical) disk and a thick disk. In most models thick disks are thought to be a necessary consequence of the disk formation and/or evolution of the galaxy. We present the results of a study of the thick disk properties in a sample of carefully selected edge-on galaxies with types ranging from T=3 to T=8. We fitted one-dimensional luminosity profiles with physically motivated functions - the solutions of two stellar and one gaseous isothermal coupled disks in equilibrium - which are likely to yield more accurate results than other functions used in previous studies. The images used for the fits come from the Spitzer Survey of Stellar Structure in Galaxies (S4G). We found that thick disks are on average more massive than previously reported, mostly due to the selected fitting function. Typically, the thin and the thick disk have similar masses. We also found that thick disks do not flare significantly within the observed range in galactocentric radii and that the ratio of thick to thin disk scaleheights is higher for galaxies of earlier types.
Our results tend to favor an in situ origin for most of the stars in the thick disk. In addition, the thick disk may contain a significant number of stars coming from satellites accreted after the initial build-up of the galaxy and an extra fraction of stars coming from the secular heating of the thin disk by its own overdensities.
Assigning thick disk light to the thin disk component may lead to an underestimate of the overall stellar mass in galaxies, because of different mass-to-light ratios in the two disk components. On the basis of our new results, we estimate that disk stellar masses are between 10% and 50% higher than previously thought and we suggest that thick disks are a reservoir of "local missing baryons".
Submitted 17 August, 2011; v1 submitted 30 July, 2011;
originally announced August 2011.