-
Evolutionary mismatch and the role of GxE interactions in human disease
Authors:
Amanda J. Lea,
Andrew G. Clark,
Andrew W. Dahl,
Orrin Devinsky,
Angela R. Garcia,
Christopher D. Golden,
Joseph Kamau,
Thomas S. Kraft,
Yvonne A. L. Lim,
Dino Martins,
Donald Mogoi,
Paivi Pajukanta,
George Perry,
Herman Pontzer,
Benjamin C. Trumble,
Samuel S. Urlacher,
Vivek V. Venkataraman,
Ian J. Wallace,
Michael Gurven,
Daniel Lieberman,
Julien F. Ayroles
Abstract:
Globally, we are witnessing the rise of complex, non-communicable diseases (NCDs) related to changes in our daily environments. Obesity, asthma, cardiovascular disease, and type 2 diabetes are part of a long list of "lifestyle" diseases that were rare throughout human history but are now common. A key idea from anthropology and evolutionary biology, the evolutionary mismatch hypothesis, seeks to explain this phenomenon. It posits that humans evolved in environments that radically differ from the ones experienced by most people today, and thus traits that were advantageous in past environments may now be "mismatched" and disease-causing. This hypothesis is, at its core, a genetic one: it predicts that loci with a history of selection will exhibit "genotype by environment" (GxE) interactions and have differential health effects in ancestral versus modern environments. Here, we discuss how this concept could be leveraged to uncover the genetic architecture of NCDs in a principled way. Specifically, we advocate for partnering with small-scale, subsistence-level groups that are currently transitioning from environments that are arguably more "matched" with their recent evolutionary history to those that are more "mismatched". These populations provide diverse genetic backgrounds as well as the levels and types of environmental variation needed to map GxE interactions in an explicit mismatch framework. Such work would make important contributions to our understanding of environmental and genetic risk factors for NCDs across diverse ancestries and sociocultural contexts.
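The GxE prediction is usually tested with a regression that includes an interaction term, $y = \beta_0 + \beta_G g + \beta_E e + \beta_{GE}\, g e + \varepsilon$, where a nonzero $\beta_{GE}$ means that a genotype's effect on a health trait depends on the environment. The Python sketch below is illustrative only and is not the authors' method: the allele dosage `g`, the binary ancestral/modern environment `e`, the effect sizes, and the helper `fit_gxe` are all hypothetical.

```python
import numpy as np

def fit_gxe(y, g, e):
    """OLS fit of y ~ 1 + g + e + g:e; returns coefficients and standard errors."""
    X = np.column_stack([np.ones_like(g), g, e, g * e])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # classical OLS covariance
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 5000
g = rng.binomial(2, 0.3, size=n).astype(float)    # hypothetical allele dosage 0/1/2
e = rng.binomial(1, 0.5, size=n).astype(float)    # hypothetical: 0 = ancestral, 1 = modern
# Hypothetical "mismatch" locus: no allelic effect in the ancestral setting
# (beta_G = 0) but a positive effect in the modern one (beta_GE = 0.5).
y = 1.0 + 0.0 * g + 0.3 * e + 0.5 * g * e + rng.normal(0, 1, size=n)

beta, se = fit_gxe(y, g, e)
for name, b, s in zip(["intercept", "G", "E", "GxE"], beta, se):
    print(f"{name:>9}: {b:+.3f} (SE {s:.3f}, z = {b / s:+.1f})")
```

Only the interaction coefficient should stand out here; a locus like this would look null in a study sampling only the ancestral environment.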
Submitted 13 February, 2023; v1 submitted 12 January, 2023;
originally announced January 2023.
-
Testing Causality in Scientific Modelling Software
Authors:
Andrew G. Clark,
Michael Foster,
Benedikt Prifling,
Neil Walkinshaw,
Robert M. Hierons,
Volker Schmidt,
Robert D. Turner
Abstract:
From simulating galaxy formation to viral transmission in a pandemic, scientific models play a pivotal role in developing scientific theories and supporting government policy decisions that affect us all. Given these critical applications, a poor modelling assumption or bug could have far-reaching consequences. However, scientific models possess several properties that make them notoriously difficult to test, including a complex input space, long execution times, and non-determinism, rendering existing testing techniques impractical. In fields such as epidemiology, where researchers seek answers to challenging causal questions, a statistical methodology known as Causal Inference has addressed similar problems, enabling the inference of causal conclusions from noisy, biased, and sparse data instead of costly experiments. This paper introduces the Causal Testing Framework: a framework that uses Causal Inference techniques to establish causal effects from existing data, enabling users to conduct software testing activities concerning the effect of a change, such as Metamorphic Testing, a posteriori. We present three case studies covering real-world scientific models, demonstrating how the Causal Testing Framework can infer metamorphic test outcomes from reused, confounded test data to provide an efficient solution for testing scientific modelling software.
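The central idea, estimating the effect of an input change from previously logged, confounded runs instead of new executions, can be illustrated with plain regression adjustment. This is a minimal sketch under stated assumptions, not the Causal Testing Framework's API: the confounder `c`, the input `x`, the linear data-generating process, the helper `ols_slope`, and the metamorphic relation "increasing `x` by 1 increases the output by about 2" are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
# Hypothetical logged test runs: c is a confounder (e.g. a scenario setting)
# that influenced both which input x was chosen and the model output y.
c = rng.normal(0, 1, size=n)
x = 0.8 * c + rng.normal(0, 1, size=n)              # inputs were not randomised
y = 2.0 * x + 3.0 * c + rng.normal(0, 1, size=n)    # hypothetical true effect of x: 2.0

def ols_slope(y, *covariates):
    """Coefficient on the first covariate from an OLS fit with an intercept."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

naive = ols_slope(y, x)          # ignores c: biased by the backdoor path x <- c -> y
adjusted = ols_slope(y, x, c)    # conditions on c: blocks the backdoor path

# Hypothetical metamorphic relation checked a posteriori on the reused data.
expected, tol = 2.0, 0.2
print(f"naive slope {naive:.2f}, adjusted slope {adjusted:.2f}")
print("relation holds (adjusted):", abs(adjusted - expected) <= tol)
```

The naive slope is inflated by the confounder, so a test judged on it would wrongly fail; in practice, which variables to adjust for would come from a causal model of the software rather than being hard-coded as here.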
Submitted 30 June, 2023; v1 submitted 1 September, 2022;
originally announced September 2022.
-
An empirical Bayes approach to estimating dynamic models of co-regulated gene expression
Authors:
Sara Venkatraman,
Sumanta Basu,
Andrew G. Clark,
Sofie Delbare,
Myung Hee Lee,
Martin T. Wells
Abstract:
Time-course gene expression datasets provide insight into the dynamics of complex biological processes, such as immune response and organ development. It is of interest to identify genes with similar temporal expression patterns because such genes are often biologically related. However, this task is challenging due to the high dimensionality of these datasets and the nonlinearity of gene expression time dynamics. We propose an empirical Bayes approach to estimating ordinary differential equation (ODE) models of gene expression, from which we derive a similarity metric between genes called the Bayesian lead-lag $R^2$ (LLR2). Importantly, the calculation of the LLR2 leverages biological databases that document known interactions amongst genes; this information is automatically used to define informative prior distributions on the ODE model's parameters. As a result, the LLR2 is a biologically-informed metric that can be used to identify clusters or networks of functionally-related genes with co-moving or time-delayed expression patterns. We then derive data-driven shrinkage parameters from Stein's unbiased risk estimate that optimally balance the ODE model's fit to both data and external biological information. Using real gene expression data, we demonstrate that our methodology allows us to recover interpretable gene clusters and sparse networks. These results reveal new insights about the dynamics of biological systems.
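A much simplified, non-Bayesian version conveys what a lead-lag similarity measures: regress gene A's time course on gene B's running integral, B's current level, time, and an intercept, then report the $R^2$. This design matrix is our assumption in the spirit of an integrated ODE and is not necessarily the paper's exact model. The optional `tau`/`prior_mean` arguments show how a Gaussian prior (for example, one informed by a pathway database) shrinks the coefficients, but the data-driven choice of shrinkage via Stein's unbiased risk estimate is omitted. All names are hypothetical.

```python
import numpy as np

def running_integral(y, t):
    """Cumulative trapezoid integral of y over t, starting at 0."""
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(t)
    return np.concatenate([[0.0], np.cumsum(steps)])

def lead_lag_r2(x_a, x_b, t, tau=0.0, prior_mean=None):
    """R^2 of gene A's profile explained by gene B's cumulative and current levels.

    tau > 0 gives the posterior mean under a Gaussian prior centred on
    prior_mean (all coefficients, including the intercept, are shrunk);
    tau = 0 is ordinary least squares.
    """
    X = np.column_stack([running_integral(x_b, t), x_b, t, np.ones_like(t)])
    if prior_mean is None:
        prior_mean = np.zeros(X.shape[1])
    A = X.T @ X + tau * np.eye(X.shape[1])
    coef = np.linalg.solve(A, X.T @ x_a + tau * prior_mean)
    rss = np.sum((x_a - X @ coef) ** 2)
    tss = np.sum((x_a - x_a.mean()) ** 2)
    return 1.0 - rss / tss

t = np.linspace(0, 2 * np.pi, 25)
x_b = np.sin(t)
x_a_lagged = 1 - np.cos(t)                         # A follows the running integral of B
x_a_noise = np.random.default_rng(2).normal(size=t.size)

print(lead_lag_r2(x_a_lagged, x_b, t))             # close to 1: a time-delayed partner
print(lead_lag_r2(x_a_noise, x_b, t))              # much lower: no dynamic relationship
print(lead_lag_r2(x_a_lagged, x_b, t, tau=10.0))   # prior shrinkage trades fit for stability
```

Because OLS minimises the residual sum of squares, any prior shrinkage can only lower this in-sample $R^2$; the benefit of the prior lies in more stable, biologically plausible coefficients when time points are few.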
Submitted 31 December, 2021;
originally announced December 2021.
-
Test case generation for agent-based models: A systematic literature review
Authors:
Andrew G. Clark,
Neil Walkinshaw,
Robert M. Hierons
Abstract:
Agent-based models play an important role in simulating complex emergent phenomena and supporting critical decisions. In this context, a software fault may result in poorly informed decisions that lead to disastrous consequences. The ability to rigorously test these models is therefore essential. In this systematic literature review, we answer five research questions related to the key aspects of test case generation in agent-based models: What are the information artifacts used to generate tests? How are these tests generated? How is a verdict assigned to a generated test? How is the adequacy of a generated test suite measured? What level of abstraction of an agent-based model is targeted by a generated test? Our results show that whilst the majority of techniques are effective for testing functional requirements at the agent and integration levels of abstraction, there are comparatively few techniques capable of testing society-level behaviour. Additionally, we identify a need for more thorough evaluation using realistic case studies that feature challenging properties associated with a typical agent-based model.
Submitted 18 March, 2021; v1 submitted 12 March, 2021;
originally announced March 2021.
-
Behavioral individuality reveals genetic control of phenotypic variability
Authors:
Julien F. Ayroles,
Sean M. Buchanan,
Chelsea Jenney,
Kyobi Skutt-Kakaria,
Jennifer Grenier,
Andrew G. Clark,
Daniel L. Hartl,
Benjamin L. de Bivort
Abstract:
Variability is ubiquitous in nature and a fundamental feature of complex systems. Few studies, however, have investigated variance itself as a trait under genetic control. By focusing primarily on trait means and ignoring the effect of alternative alleles on trait variability, we may be missing an important axis of genetic variation contributing to phenotypic differences among individuals. To study genetic effects on individual-to-individual phenotypic variability (or intragenotypic variability), we used a panel of Drosophila inbred lines and focused on locomotor handedness, in an assay optimized to measure variability. We discovered that some lines had consistently high levels of intragenotypic variability among individuals while others had low levels. We demonstrate that the degree of variability is itself heritable. Using a genome-wide association study (GWAS) for the degree of intragenotypic variability as the phenotype across lines, we identified several genes expressed in the brain that affect variability in handedness without affecting the mean. One of these genes, Ten-a, implicated a neuropil in the central complex of the fly brain, a region involved in sensory integration and locomotor coordination, as influencing the magnitude of behavioral variability. We have validated these results using genetic deficiencies, null alleles, and inducible RNAi transgenes. This study reveals the constellation of phenotypes that can arise from a single genotype and shows that different genetic backgrounds differ dramatically in their propensity for phenotypic variability. Because traditional mean-focused GWASs ignore the contribution of variability to overall phenotypic variation, current methods may miss important links between genotype and phenotype.
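To make "variability as a phenotype" concrete: in a turning assay each fly can be scored by the fraction of its turns that go right, and each inbred line is then summarised by both the mean and the spread of those scores. The per-line spread is the quantity a variability GWAS would use as its phenotype. The sketch below uses the median absolute deviation (MAD) as one robust spread measure, which may not be the exact statistic used in the study; the helper names and turn counts are hypothetical.

```python
import numpy as np

def handedness(right_turns, total_turns):
    """Per-fly handedness score: fraction of turns made to the right."""
    return np.asarray(right_turns, dtype=float) / np.asarray(total_turns, dtype=float)

def line_summary(scores):
    """Mean handedness and intragenotypic variability (MAD) for one inbred line."""
    scores = np.asarray(scores, dtype=float)
    mad = np.median(np.abs(scores - np.median(scores)))
    return scores.mean(), mad

# Hypothetical data: five flies per line, 100 turns each.
line_a = handedness([50, 48, 52, 51, 49], [100] * 5)   # individuals all near 0.5
line_b = handedness([20, 80, 35, 65, 50], [100] * 5)   # strongly left- or right-biased flies

mean_a, mad_a = line_summary(line_a)
mean_b, mad_b = line_summary(line_b)
print(f"line A: mean {mean_a:.2f}, MAD {mad_a:.3f}")
print(f"line B: mean {mean_b:.2f}, MAD {mad_b:.3f}")
```

The two hypothetical lines share the same mean handedness but differ fifteen-fold in MAD, which is exactly the kind of difference a mean-focused GWAS cannot see.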
Submitted 10 September, 2014;
originally announced September 2014.
-
Neutral genomic regions refine models of recent rapid human population growth
Authors:
Elodie Gazave,
Li Ma,
Diana Chang,
Alex Coventry,
Feng Gao,
Donna Muzny,
Eric Boerwinkle,
Richard Gibbs,
Charles F. Sing,
Andrew G. Clark,
Alon Keinan
Abstract:
Human populations have experienced dramatic growth since the Neolithic revolution. Recent studies that sequenced a very large number of individuals observed an extreme excess of rare variants, and provided clear evidence of recent rapid growth in effective population size, though estimates have varied greatly among studies. All these studies were based on protein-coding genes, in which variants are also impacted by natural selection. In this study, we introduce targeted sequencing data for studying recent human history with minimal confounding by natural selection. We sequenced loci very far from genes that meet a wide array of additional criteria such that mutations in these loci are putatively neutral. As population structure also skews allele frequencies, we sequenced a sample of relatively homogeneous ancestry by first analyzing the population structure of 9,716 European Americans. We employed very high coverage sequencing to reliably call rare variants, and fit an extensive array of models of recent European demographic history to the site frequency spectrum. The best-fit model estimates ~3.4% growth per generation during the last ~140 generations, resulting in a population size increase of two orders of magnitude. This model fits the data very well, largely due to our observation that assumptions of more ancient demography can impact estimates of recent growth. This observation and results also shed light on the discrepancy in demographic estimates among recent studies.
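As a quick check that the stated growth rate and duration imply "two orders of magnitude", treat the rounded values from the abstract (~3.4% per generation over ~140 generations) as exact and compound them:

$$
\frac{N_{\text{now}}}{N_{\text{start}}} = (1 + 0.034)^{140} = e^{140 \ln 1.034} \approx e^{140 \times 0.03344} \approx e^{4.68} \approx 1.08 \times 10^{2}
$$

That is roughly a 100-fold increase in effective population size.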
Submitted 15 November, 2013; v1 submitted 25 September, 2013;
originally announced September 2013.
-
Characterizing the infection-induced transcriptome of Nasonia vitripennis reveals a preponderance of taxonomically-restricted immune genes
Authors:
Timothy B. Sackton,
John H. Werren,
Andrew G. Clark
Abstract:
The innate immune system in insects consists of a conserved core signaling network and rapidly diversifying effector and recognition components, often containing a high proportion of taxonomically-restricted genes. In the absence of functional annotation, genes encoding immune system proteins can thus be difficult to identify, as homology-based approaches generally cannot detect lineage-specific genes. Here, we use RNA-seq to compare the uninfected and infection-induced transcriptome in the parasitoid wasp Nasonia vitripennis to identify genes regulated by infection. We identify 183 genes significantly up-regulated by infection and 61 genes significantly down-regulated by infection. We also produce a new homology-based immune catalog in N. vitripennis, and show that most infection-induced genes are not assigned an immune function from homology alone, suggesting the potential for substantial novel immune components in less-well-studied systems. Finally, we show that a high proportion of these novel induced genes are taxonomically-restricted, highlighting the rapid evolution of immune gene content. The combination of functional annotation using RNA-seq and homology-based annotation provides a robust method to characterize the innate immune response across a wide variety of insects, and reveals significant novel features of the Nasonia immune response.
Submitted 3 January, 2014; v1 submitted 23 September, 2013;
originally announced September 2013.
-
Distortion of genealogical properties when the sample is very large
Authors:
Anand Bhaskar,
Andrew G. Clark,
Yun S. Song
Abstract:
Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands if not millions of individuals. In addition to posing computational challenges, such large sample sizes call for carefully re-examining the theoretical foundation underlying commonly-used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright-Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For realistic demographic scenarios, we find that there are a significant number of multiple- and simultaneous-merger events under the DTWF model, which are absent in the coalescent by construction. Furthermore, for large sample sizes, there are noticeable differences in the expected number of rare variants between the coalescent and the DTWF model. To balance the tradeoff between accuracy and computational efficiency, we propose a hybrid algorithm that utilizes the DTWF model for the recent past and the coalescent for the more distant past. Our results demonstrate that the hybrid method with only a handful of generations of the DTWF model leads to a frequency spectrum that is quite close to the prediction of the full DTWF model.
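The origin of the multiple- and simultaneous-merger events is visible in a single DTWF generation. Each of the $n$ sampled lineages picks a parent uniformly at random, and once $n$ is not small relative to the population size, collisions beyond a single pair become likely, whereas the coalescent allows at most one pairwise merger at a time. The sketch below computes these one-generation probabilities exactly. It assumes a haploid population of constant size `N` (for diploids one would typically use 2N), and the helper name is hypothetical; it illustrates the effect and is not the authors' algorithm for the full DTWF computation.

```python
from math import comb, prod

def one_generation_merger_probs(n, N):
    """Exact one-generation event probabilities in a haploid Wright-Fisher model.

    Each of n lineages picks its parent uniformly from N individuals. Returns
    (P[no merger], P[exactly one pairwise merger], P[anything else]), where
    "anything else" covers multiple mergers (3+ lineages share a parent) and
    simultaneous mergers (two or more separate pairs merge in one generation).
    """
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    # All n parents distinct: N (N-1) ... (N-n+1) / N^n.
    p_none = prod((N - i) / N for i in range(n))
    # Pick the merging pair, then n-1 distinct parents in order, out of N^n outcomes.
    p_one_pair = comb(n, 2) * prod((N - i) / N for i in range(n - 1)) / N
    p_other = max(0.0, 1.0 - p_none - p_one_pair)   # guard tiny negative rounding
    return p_none, p_one_pair, p_other

N = 10_000
for n in (10, 100, 1_000):
    p0, p1, p_other = one_generation_merger_probs(n, N)
    print(f"n={n:>5}: no merger {p0:.4f}, one pair {p1:.4f}, "
          f"multiple/simultaneous {p_other:.4f}")
```

For small samples the third probability is negligible, matching the coalescent; as $n$ approaches the population size it dominates, which is why the abstract's hybrid scheme runs the DTWF model only for the recent generations, when many lineages remain.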
Submitted 1 August, 2013;
originally announced August 2013.
-
Experimental and Physics Prospects at ATLAS and CMS - 2011 and Beyond
Authors:
Allan G. Clark
Abstract:
The ATLAS and CMS experiments have collected data at the CERN Large Hadron Collider (LHC) since December 2009, and with a collision energy $\sqrt{s}$=7 TeV since March 2010. Both detectors work remarkably well at this early stage of operation, and several physics analyses have already been published. It is currently expected that an integrated luminosity of ~1 fb$^{-1}$ will be collected before a 15-month shutdown from early 2012, and that up to 10 (100) fb$^{-1}$ will be collected at a collision energy at or near $\sqrt{s}$=14 TeV by the end of 2013 (2016). Taking account of the outstanding Tevatron results obtained so far, a perspective is given on key physics measurements using data at the end of 2011 and beyond.
Submitted 7 December, 2010; v1 submitted 30 November, 2010;
originally announced November 2010.
-
Search for First-Generation Scalar Leptoquarks in $p \bar{p}$ collisions at $\sqrt{s}$=1.96 TeV
Authors:
The CDF Collaboration,
D. Acosta,
J. Adelman,
T. Affolder,
T. Akimoto,
M. G. Albrow,
D. Ambrose,
S. Amerio,
D. Amidei,
A. Anastassov,
K. Anikeev,
A. Annovi,
J. Antos,
M. Aoki,
G. Apollinari,
T. Arisawa,
J-F. Arguin,
A. Artikov,
W. Ashmanskas,
A. Attal,
F. Azfar,
P. Azzi-Bacchetta,
N. Bacchetta,
H. Bachacou,
W. Badgett
, et al. (605 additional authors not shown)
Abstract:
We report on a search for pair production of first-generation scalar leptoquarks ($LQ$) in $p \bar{p}$ collisions at $\sqrt{s}$=1.96 TeV using an integrated luminosity of 203 pb$^{-1}$ collected at the Fermilab Tevatron collider by the CDF experiment. We observe no evidence for $LQ$ production in the topologies arising from $LQ \bar{LQ} \to eqeq$ and $LQ \bar{LQ} \to eq νq$, and derive 95% C.L. upper limits on the $LQ$ production cross section. The results are combined with those obtained from a separately reported CDF search in the topology arising from $LQ\bar{LQ} \to νq νq$, and 95% C.L. lower limits on the $LQ$ mass are derived as a function of $β = BR(LQ \to eq)$. The limits are 236, 205 and 145 GeV/c$^2$ for $β$ = 1, $β$ = 0.5 and $β$ = 0.1, respectively.
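The weakening of the mass limit as $β$ decreases follows from how the two leptoquark decays combine. Assuming, as the definition of $β$ implies here, that each $LQ$ decays only to $eq$ or $νq$, a pair splits into the three topologies with fractions $β^2$, $2β(1-β)$ and $(1-β)^2$. At $β = 0.1$ only 1% of pairs yield the $eqeq$ final state, so most of the signal falls into channels with neutrinos. The helper below is a hypothetical illustration of this bookkeeping.

```python
def lq_pair_channel_fractions(beta):
    """Fraction of LQ-LQbar pairs in each final state, given beta = BR(LQ -> eq).

    Assumes each leptoquark decays only to eq or nu q, so BR(LQ -> nu q) = 1 - beta.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return {
        "eqeq": beta ** 2,
        "eq nuq": 2 * beta * (1 - beta),   # factor 2: either leptoquark can give the eq
        "nuq nuq": (1 - beta) ** 2,
    }

for beta in (1.0, 0.5, 0.1):
    print(beta, lq_pair_channel_fractions(beta))
```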
Submitted 29 June, 2005;
originally announced June 2005.
-
Comparison of Three-jet Events in Proton-Antiproton Collisions at Center-of-mass Energy 1.8 TeV to Predictions from a Next-to-leading Order QCD Calculation
Authors:
D. Acosta,
T. Affolder,
M. G. Albrow,
D. Ambrose,
D. Amidei,
K. Anikeev,
J. Antos,
G. Apollinari,
T. Arisawa,
A. Artikov,
W. Ashmanskas,
F. Azfar,
P. Azzi-Bacchetta,
N. Bacchetta,
H. Bachacou,
W. Badgett,
A. Barbaro-Galtieri,
V. E. Barnes,
B. A. Barnett,
S. Baroiant,
M. Barone,
G. Bauer,
F. Bedeschi,
S. Behari,
S. Belforte
, et al. (388 additional authors not shown)
Abstract:
The properties of three-jet events with total transverse energy greater than 320 GeV and individual jet energy greater than 20 GeV have been analyzed and compared to absolute predictions from a next-to-leading order (NLO) perturbative QCD calculation. These data, of integrated luminosity 86 pb$^{-1}$, were recorded by the CDF Experiment for proton-antiproton collisions at $\sqrt{s}$=1.8 TeV. This study tests a model of higher-order QCD processes that result in gluon emission and can be used to estimate the magnitude of the contribution of processes higher than NLO. The total cross section is measured to be $466 \pm 3\,(\mathrm{stat.})\,^{+207}_{-70}\,(\mathrm{syst.})$ pb. The differential cross section is furthermore measured for all kinematically accessible regions of the Dalitz plane, including those for which the theoretical prediction is unreliable. While the measured cross section is consistent with the theoretical prediction in magnitude, the two differ somewhat in shape in the Dalitz plane.
Submitted 6 October, 2004;
originally announced October 2004.