-
Knowledge from Large-Scale Protein Contact Prediction Models Can Be Transferred to the Data-Scarce RNA Contact Prediction Task
Authors:
Yiren Jian,
Chongyang Gao,
Chen Zeng,
Yunjie Zhao,
Soroush Vosoughi
Abstract:
RNA, whose functionality is largely determined by its structure, plays an important role in many biological activities. The prediction of pairwise structural proximity between each nucleotide of an RNA sequence can characterize the structural information of the RNA. Historically, this problem has been tackled by machine learning models using expert-engineered features and trained on scarce labeled datasets. Here, we find that the knowledge learned by a protein-coevolution Transformer-based deep neural network can be transferred to the RNA contact prediction task. As protein datasets are orders of magnitude larger than those for RNA contact prediction, our findings and the subsequent framework greatly reduce the data scarcity bottleneck. Experiments confirm that RNA contact prediction through transfer learning using a publicly available protein model is greatly improved. Our findings indicate that the learned structural patterns of proteins can be transferred to RNAs, opening up potential new avenues for research.
Submitted 18 January, 2024; v1 submitted 13 February, 2023;
originally announced February 2023.
-
From predictions to prescriptions: A data-driven response to COVID-19
Authors:
Dimitris Bertsimas,
Léonard Boussioux,
Ryan Cory Wright,
Arthur Delarue,
Vassilis Digalakis Jr.,
Alexandre Jacquillat,
Driss Lahlou Kitane,
Galit Lukin,
Michael Lingzhi Li,
Luca Mingardi,
Omid Nohadani,
Agni Orfanoudaki,
Theodore Papalexopoulos,
Ivan Paskov,
Jean Pauphilet,
Omar Skali Lami,
Bartolomeo Stellato,
Hamza Tazi Bouardi,
Kimberly Villalobos Carballo,
Holly Wiberg,
Cynthia Zeng
Abstract:
The COVID-19 pandemic has created unprecedented challenges worldwide. Strained healthcare providers make difficult decisions on patient triage, treatment and care management on a daily basis. Policy makers have imposed social distancing measures to slow the disease, at a steep economic price. We design analytical tools to support these decisions and combat the pandemic. Specifically, we propose a comprehensive data-driven approach to understand the clinical characteristics of COVID-19, predict its mortality, forecast its evolution, and ultimately alleviate its impact. By leveraging cohort-level clinical data, patient-level hospital data, and census-level epidemiological data, we develop an integrated four-step approach, combining descriptive, predictive and prescriptive analytics. First, we aggregate hundreds of clinical studies into the most comprehensive database on COVID-19 to paint a new macroscopic picture of the disease. Second, we build personalized calculators to predict the risk of infection and mortality as a function of demographics, symptoms, comorbidities, and lab values. Third, we develop a novel epidemiological model to project the pandemic's spread and inform social distancing policies. Fourth, we propose an optimization model to re-allocate ventilators and alleviate shortages. Our results have been used at the clinical level by several hospitals to triage patients, guide care management, plan ICU capacity, and re-distribute ventilators. At the policy level, they are currently supporting safe back-to-work policies at a major institution and equitable vaccine distribution planning at a major pharmaceutical company, and have been integrated into the US Center for Disease Control's pandemic forecast.
Submitted 29 June, 2020;
originally announced June 2020.
-
Direct Information Reweighted by Contact Templates: Improved RNA Contact Prediction by Combining Structural Features
Authors:
Yiren Jian,
Chen Zeng,
Yunjie Zhao
Abstract:
It is acknowledged that co-evolutionary nucleotide-nucleotide interactions are essential for RNA structures and functions. Currently, direct coupling analysis (DCA) infers nucleotide contacts in a sequence from its homologous sequence alignment across different species. DCA and similar approaches that use sequence information alone usually yield low accuracy, especially when the available homologous sequences are limited. Here we present a new method that incorporates a Restricted Boltzmann Machine (RBM) to augment the information on sequence co-variations with structural patterns in contact inference. We thus name our method DIRECT, which stands for Direct Information REweighted by Contact Templates. Benchmark tests demonstrate that DIRECT produces a substantial enhancement of 13% in accuracy on average for contact prediction in comparison to the traditional DCA. These results suggest that DIRECT could be used for improving predictions of RNA tertiary structures and functions. The source code and dataset of DIRECT are available at http://zhao.phy.ccnu.edu.cn:8122/DIRECT/index.html.
Submitted 28 November, 2017;
originally announced November 2017.
-
Contact Mechanics of a Small Icosahedral Virus
Authors:
Cheng Zeng,
Mercedes Hernando-Pérez,
Xiang Ma,
Paul van der Schoot,
Roya Zandi,
Bogdan Dragnea
Abstract:
Virus binding to a surface results at least locally, at the contact area, in stress and potential structural perturbation of the virus cage. Here we address the question of the role of substrate-induced deformation in the overall virus mechanical response to the adsorption event. This question may be especially important for the broad category of viruses that have their shells stabilized by weak, non-covalent interactions. We utilize atomic force microscopy to measure the height change distributions of the brome mosaic virus upon adsorption from liquid on atomically flat substrates and present a continuum model which captures the behavior well. Fitting the height data according to the model provides, without recourse to indentation, estimates of virus elastic properties and of the interfacial energy.
Submitted 26 January, 2017;
originally announced January 2017.
-
Methods for scoring the collective effect of SNPs: Minor alleles of common SNPs quantitatively affect traits/diseases and are under both positive and negative selection
Authors:
Dejian Yuan,
Zuobin Zhu,
Xiaohua Tan,
Jie Liang,
Ceng Zeng,
Jiegen Zhang,
Jun Chen,
Long Ma,
Ayca Dogan,
Gudrun Brockmann,
Oliver Goldmann,
Eva Medina,
Amanda D. Rice,
Richard W. Moyer,
Xian Man,
Ke Yi,
Yanke Li,
Qing Lu,
Yimin Huang,
Dapeng Wang,
Jun Yu,
Hui Guo,
Kun Xia,
Shi Huang
Abstract:
Most common SNPs are popularly assumed to be neutral. We here developed novel methods to examine in animal models and humans whether an extreme amount of minor alleles (MAs) carried by an individual may represent extreme trait values and common diseases. We analyzed panels of genetic reference populations and identified the MAs in each panel and the MA content (MAC) that each strain carried. We also analyzed 21 published GWAS datasets of human diseases and identified the MAC of each case or control. MAC was nearly linearly linked to quantitative variations in numerous traits in model organisms, including life span, tumor susceptibility, learning and memory, sensitivity to alcohol and anti-psychotic drugs, and two correlated traits, poor reproductive fitness and strong immunity. Similarly, in Europeans or European Americans, enrichment of MAs of fast but not slow evolutionary rate was linked to autoimmune and numerous other diseases, including type 2 diabetes, Parkinson's disease, psychiatric disorders, alcohol and cocaine addictions, cancer, and shorter life span. Therefore, both high and low MAC correlated with extreme values in many traits, indicating stabilizing selection on most MAs. The methods here are broadly applicable and may help solve the missing heritability problem in complex traits and diseases.
Submitted 15 July, 2013; v1 submitted 12 September, 2012;
originally announced September 2012.
-
Identifying Proteins of High Designability via Surface-Exposure Patterns
Authors:
Eldon G. Emberly,
Jonathan Miller,
Chen Zeng,
Ned S. Wingreen,
Chao Tang
Abstract:
Using an off-lattice model, we fully enumerate folded conformations of polypeptide chains of up to N = 19 monomers. Structures are found to differ markedly in designability, defined as the number of sequences with that structure as a unique lowest-energy conformation. We find that designability is closely correlated with the pattern of surface exposure of the folded structure. For longer chains, complete enumeration of structures is impractical. Instead, structures can be randomly sampled, and relative designability estimated either from designability within the random sample, or directly from surface-exposure pattern. We compare the surface-exposure patterns of those structures identified as highly designable to the patterns of naturally occurring proteins.
Submitted 10 October, 2001;
originally announced October 2001.
-
Emergence of highly-designable protein-backbone conformations in an off-lattice model
Authors:
J. Miller,
C. Zeng,
N. S. Wingreen,
C. Tang
Abstract:
Despite the variety of protein sizes, shapes, and backbone configurations found in nature, the design of novel protein folds remains an open problem. Within simple lattice models it has been shown that not all structures are equally suitable for design. Rather, certain structures are distinguished by unusually high designability: the number of amino-acid sequences for which they represent the unique ground state; sequences associated with such structures possess both robustness to mutation and thermodynamic stability. Here we report that highly designable backbone conformations also emerge in a realistic off-lattice model. The highly designable conformations of a chain of 23 amino acids are identified, and found to be remarkably insensitive to model parameters. While some of these conformations correspond closely to known natural protein folds, such as the zinc finger and the helix-turn-helix motifs, others do not resemble known folds and may be candidates for novel fold design.
Submitted 17 September, 2001;
originally announced September 2001.