-
Tests for categorical data beyond Pearson: A distance covariance and energy distance approach
Authors:
Fernando Castro-Prado,
Wenceslao González-Manteiga,
Javier Costas,
Fernando Facal,
Dominic Edelmann
Abstract:
Categorical variables are of utmost importance in biomedical research. When two of them are considered, it is often the case that one wants to test whether or not they are statistically dependent. We show weaknesses of classical methods -- such as Pearson's test and the G-test -- and we propose testing strategies based on distances that lack those drawbacks. We first develop this theory for classical two-dimensional contingency tables, within the context of distance covariance, an association measure that characterises general statistical independence of two variables. We then apply the same fundamental ideas to one-dimensional tables, namely to testing for goodness of fit to a discrete distribution, for which we resort to an analogous statistic called energy distance. We prove that our methodology has desirable theoretical properties, and we show how we can calibrate the null distribution of our test statistics without resorting to any resampling technique. We illustrate all this in simulations, as well as with some real data examples, demonstrating the adequate performance of our approach for biostatistical practice.
Submitted 19 March, 2024;
originally announced March 2024.
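As a rough illustration of the distance covariance idea in the entry above, the sketch below computes the squared sample distance covariance (V-statistic form) of two categorical samples. It assumes the discrete metric (distance 0 for equal categories, 1 otherwise), which is an assumption made here and not something the abstract specifies; the function names are hypothetical. Under that assumption the statistic reduces to the sum of squared differences between the observed joint proportions and the products of the marginal proportions in the contingency table, which the second function computes directly. The sketch does not reproduce the authors' test or their calibration of the null distribution.

```python
import numpy as np


def dcov2_pairwise(x, y):
    """Squared sample distance covariance (V-statistic) from pairwise distances.

    Assumes the discrete metric on categories: d(a, b) = 0 if a == b, else 1.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    # n-by-n matrices of pairwise discrete distances within each sample.
    a = (x[:, None] != x[None, :]).astype(float)
    b = (y[:, None] != y[None, :]).astype(float)
    # Double-centre each matrix: subtract row and column means, add grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    # Average of the entrywise product of the centred matrices.
    return float((A * B).mean())


def dcov2_table(x, y):
    """Same quantity computed from the contingency table of (x, y)."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)  # count each observed (row, column) pair
    p = table / table.sum()  # observed joint proportions
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))  # product of marginals
    return float(((p - expected) ** 2).sum())
```

Both functions should agree up to floating-point error on any pair of equal-length categorical samples; the table form avoids building the n-by-n distance matrices.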
-
Testing for genetic interactions in complex disease with distance correlation
Authors:
Fernando Castro-Prado,
Javier Costas,
Dominic Edelmann,
Wenceslao González-Manteiga,
David R. Penas
Abstract:
Understanding epistasis (genetic interaction) may shed some light on the genomic basis of common diseases, including disorders of maximum interest due to their high socioeconomic burden, like schizophrenia. Distance correlation is an association measure that characterises general statistical independence between random variables, not only linear dependence. Here, we propose distance correlation as a novel tool for the detection of epistasis from case-control data of single-nucleotide polymorphisms (SNPs). On the methodological side, we highlight the derivation of the explicit asymptotic distribution of the test statistic. We show that this is the only way to obtain enough computational speed for the method to be used in practice, in a scenario where the resampling techniques found in the literature are impractical. Our simulations show satisfactory calibration of significance, as well as comparable or better power than existing methodology. We conclude with the application of our technique to a schizophrenia genetics dataset, obtaining biologically sound insights.
Submitted 27 April, 2023; v1 submitted 9 December, 2020;
originally announced December 2020.
-
An Unpublished Manuscript of John von Neumann on Shock Waves in Boostered Detonations
Authors:
Molly Riley Knoedler,
Julianna C. Costas,
Caroline Mary Hogan,
Harper Kerkhoff,
Chad M. Topaz
Abstract:
We report on an unpublished and previously unknown manuscript of John von Neumann and contextualize it within the development of the theory of shock waves and detonations during the nineteenth and twentieth centuries. Von Neumann studies bombs comprising a primary explosive charge along with explosive booster material. His goal is to calculate the minimal amount of booster needed to create a sustainable detonation, presumably because booster material is often more expensive and more volatile. In service of this goal, he formulates and analyzes a model, based on partial differential equations, describing a moving shock wave at the interface of detonated and undetonated material. We provide a complete transcription of von Neumann's work and give our own accompanying explanations and analyses, including the correction of two small errors in his calculations. Today, detonations are typically modeled using a combination of experimental results and numerical simulations particular to the shape and materials of the explosive, as the complex three-dimensional dynamics of detonations are analytically intractable. Although von Neumann's manuscript will not revolutionize our modern understanding of detonations, the document is a valuable historical record of the state of hydrodynamics research during and after World War II.
Submitted 7 April, 2020;
originally announced April 2020.