-
Bézier interpolation improves the inference of dynamical models from data
Authors:
Kai Shimagaki,
John P. Barton
Abstract:
Many dynamical systems, from quantum many-body systems to evolving populations to financial markets, are described by stochastic processes. Parameters characterizing such processes can often be inferred using information integrated over stochastic paths. However, estimating time-integrated quantities from real data with limited time resolution is challenging. Here, we propose a framework for accurately estimating time-integrated quantities using Bézier interpolation. We applied our approach to two dynamical inference problems: determining fitness parameters for evolving populations and inferring forces driving Ornstein-Uhlenbeck processes. We found that Bézier interpolation reduces the estimation bias for both dynamical inference problems. This improvement was especially noticeable for data sets with limited time resolution. Our method could be broadly applied to improve accuracy for other dynamical inference problems using finitely sampled data.
Submitted 6 October, 2022; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure
Authors:
Halie M. Rando,
Adam L. MacLean,
Alexandra J. Lee,
Ronan Lordan,
Sandipan Ray,
Vikas Bansal,
Ashwin N. Skelly,
Elizabeth Sell,
John J. Dziak,
Lamonica Shinholster,
Lucy D'Agostino McGowan,
Marouen Ben Guebila,
Nils Wellhausen,
Sergey Knyazev,
Simina M. Boca,
Stephen Capone,
Yanjun Qi,
YoSon Park,
Yuchen Sun,
David Mai,
Joel D. Boerckel,
Christian Brueffer,
James Brian Byrd,
Jeremy P. Kamil,
Jinhui Wang
, et al. (9 additional authors not shown)
Abstract:
The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease.
Submitted 3 December, 2021; v1 submitted 1 February, 2021;
originally announced February 2021.
-
Inference of compressed Potts graphical models
Authors:
Francesca Rizzato,
Alice Coucke,
Eleonora de Leonardis,
J. P. Barton,
Jérôme Tubiana,
Remi Monasson,
Simona Cocco
Abstract:
We consider the problem of inferring a graphical Potts model on a population of variables, with a non-uniform number of Potts colors (symbols) across variables. This inverse Potts problem generally involves the inference of a large number of parameters, often larger than the number of available data, and, hence, requires the introduction of regularization. We study here a double regularization scheme, in which the number of colors available to each variable is reduced, and interaction networks are made sparse. To achieve this color compression scheme, only Potts states with large empirical frequency (exceeding some threshold) are explicitly modeled on each site, while the others are grouped into a single state. We benchmark the performance of this mixed regularization approach with two inference algorithms, the Adaptive Cluster Expansion (ACE) and the PseudoLikelihood Maximization (PLM), on synthetic data obtained by sampling disordered Potts models on Erdős-Rényi random graphs. We show in particular that color compression does not affect the quality of reconstruction of the parameters corresponding to high-frequency symbols, while drastically reducing the number of the other parameters and thus the computational time. Our procedure is also applied to multi-sequence alignments of protein families, with similar results.
Submitted 3 January, 2020; v1 submitted 30 July, 2019;
originally announced July 2019.
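The color compression step described in this abstract (keep states whose empirical frequency exceeds a threshold, group the rest into one state) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `compress_colors`, the toy alignment column, and the threshold value are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def compress_colors(column, threshold):
    """Map the symbols observed at one site to compressed state indices.

    Hypothetical helper: symbols whose empirical frequency exceeds
    `threshold` keep their own state; all other symbols share one
    extra grouped state.
    """
    counts = Counter(column)
    total = len(column)
    # Symbols modeled explicitly, ordered by decreasing frequency
    # (ties broken alphabetically so the mapping is deterministic).
    kept = sorted(
        (s for s, c in counts.items() if c / total > threshold),
        key=lambda s: (-counts[s], s),
    )
    mapping = {s: i for i, s in enumerate(kept)}
    grouped = len(kept)  # index of the shared low-frequency state
    has_rare = len(kept) < len(counts)
    compressed = [mapping.get(s, grouped) for s in column]
    n_states = len(kept) + (1 if has_rare else 0)
    return compressed, mapping, n_states


# Toy column of an alignment (assumed data, not from the paper).
column = list("AAAAACCCDE")
compressed, mapping, n_states = compress_colors(column, threshold=0.2)
print(compressed, mapping, n_states)
```

In this toy column, `A` and `C` exceed the threshold and keep their own states, while `D` and `E` are merged, so the site goes from four colors to three.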
-
Identification of drug resistance mutations in HIV from constraints on natural evolution
Authors:
Thomas C. Butler,
John P. Barton,
Mehran Kardar,
Arup K. Chakraborty
Abstract:
Human immunodeficiency virus (HIV) evolves with extraordinary rapidity. However, its evolution is constrained by interactions between mutations in its fitness landscape. Here we show that an Ising model describing these interactions, inferred from sequence data obtained prior to the use of antiretroviral drugs, can be used to identify clinically significant sites of resistance mutations. Successful predictions of the resistance sites indicate progress in the development of successful models of real viral evolution at the single residue level, and suggest that our approach may be applied to help design new therapies that are less prone to failure even where resistance data is not yet available.
Submitted 6 August, 2015;
originally announced August 2015.
-
Remarks on the energy costs of insulators in enzymatic cascades
Authors:
John P. Barton,
Eduardo D. Sontag
Abstract:
The connection between optimal biological function and energy use, measured for example by the rate of metabolite consumption, is a current topic of interest in the systems biology literature that has been explored in several different contexts. In [J. P. Barton and E. D. Sontag, Biophys. J. 104, 6 (2013)], we related the metabolic cost of enzymatic futile cycles to their capacity to act as insulators that facilitate modular interconnections in biochemical networks. There we analyzed a simple model system in which a signal molecule regulates the transcription of one or more target proteins by interacting with their promoters. In this note, we consider the case of a protein with an active and an inactive form, whose activation is controlled by the signal molecule. As in the original case, higher rates of energy consumption are required for better insulator performance.
Submitted 27 December, 2014;
originally announced December 2014.
-
Large Pseudo-Counts and $L_2$-Norm Penalties Are Necessary for the Mean-Field Inference of Ising and Potts Models
Authors:
J. P. Barton,
S. Cocco,
E. De Leonardis,
R. Monasson
Abstract:
The mean field (MF) approximation offers a simple, fast way to infer direct interactions between elements in a network of correlated variables, a common, computationally challenging problem with practical applications in fields ranging from physics and biology to the social sciences. However, MF methods achieve their best performance with strong regularization, well beyond Bayesian expectations, an empirical fact that is poorly understood. In this work, we study the influence of pseudo-count and $L_2$-norm regularization schemes on the quality of inferred Ising or Potts interaction networks from correlation data within the MF approximation. We argue, based on the analysis of small systems, that the optimal value of the regularization strength remains finite even if the sampling noise tends to zero, in order to correct for systematic biases introduced by the MF approximation. Our claim is corroborated by extensive numerical studies of diverse model systems and by the analytical study of the $m$-component spin model, for large but finite $m$. Additionally, we find that pseudo-count regularization is robust against sampling noise, and often outperforms $L_2$-norm regularization, particularly when the underlying network of interactions is strongly heterogeneous. Much better performance is generally obtained for the Ising model than for the Potts model, for which only couplings incoming onto medium-frequency symbols are reliably inferred.
Submitted 1 May, 2014;
originally announced May 2014.
-
Spin models inferred from patient data faithfully describe HIV fitness landscapes and enable rational vaccine design
Authors:
Karthik Shekhar,
Claire F. Ruberman,
Andrew L. Ferguson,
John P. Barton,
Mehran Kardar,
Arup K. Chakraborty
Abstract:
Mutational escape from vaccine-induced immune responses has thwarted the development of a successful vaccine against AIDS, whose causative agent is HIV, a highly mutable virus. Knowing the virus's fitness as a function of its proteomic sequence can enable rational design of potent vaccines, as this information can focus vaccine-induced immune responses to target mutational vulnerabilities of the virus. Spin models have been proposed as a means to infer intrinsic fitness landscapes of HIV proteins from patient-derived viral protein sequences. These sequences are the product of non-equilibrium viral evolution driven by patient-specific immune responses, and are subject to phylogenetic constraints. How can such sequence data allow inference of intrinsic fitness landscapes? We combined computer simulations and variational theory à la Feynman to show that, in most circumstances, spin models inferred from patient-derived viral sequences reflect the correct rank order of the fitness of mutant viral strains. Our findings are relevant for diverse viruses.
Submitted 9 June, 2013;
originally announced June 2013.