Search | arXiv e-print repository

Protein language model rescue mutations highlight variant effects and structure in clinically relevant genes

Authors: Onuralp Soylemez, Pablo Cordero

Abstract: Despite being self-supervised, protein language models have shown remarkable performance in fundamental biological tasks such as predicting the impact of genetic variation on protein structure and function. The effectiveness of these models on a diverse set of tasks suggests that they learn meaningful representations of the fitness landscape that can be useful for downstream clinical applications. Here, we interrogate the use of these language models in characterizing known pathogenic mutations in curated, medically actionable genes through an exhaustive search of putative compensatory mutations on each variant's genetic background. Systematic analysis of the predicted effects of these compensatory mutations reveals unappreciated structural features of proteins that are missed by other structure predictors such as AlphaFold. While deep mutational scan experiments provide an unbiased estimate of the mutational landscape, we encourage the community to generate and curate rescue mutation experiments to inform the design of more sophisticated co-masking strategies and to leverage large language models more effectively for downstream clinical prediction tasks.
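
The exhaustive search for putative compensatory (rescue) mutations described in this abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: `score_fn` is a hypothetical stand-in for any sequence-level fitness score (for example, a protein language model log-likelihood), and the "rescue score" is assumed to be the score gain of the double mutant over the pathogenic single mutant.

```python
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def apply_mutation(seq: str, pos: int, aa: str) -> str:
    """Return a copy of seq with residue `pos` (0-based) replaced by `aa`."""
    return seq[:pos] + aa + seq[pos + 1:]


def scan_rescue_mutations(
    wild_type: str,
    variant: Tuple[int, str],
    score_fn: Callable[[str], float],
) -> List[Tuple[int, str, float]]:
    """Score every second-site substitution on the pathogenic variant's background.

    variant: (0-based position, mutant amino acid) of the pathogenic change.
    score_fn: hypothetical fitness score; higher is assumed to mean more fit.
    Returns (position, amino acid, rescue score) tuples sorted best-first, where
    rescue score = score(double mutant) - score(pathogenic single mutant).
    """
    var_pos, var_aa = variant
    background = apply_mutation(wild_type, var_pos, var_aa)
    baseline = score_fn(background)
    results = []
    for pos, current in enumerate(background):
        if pos == var_pos:
            continue  # the pathogenic site itself is held fixed
        for aa in AMINO_ACIDS:
            if aa == current:
                continue  # skip "substitutions" that change nothing
            double = apply_mutation(background, pos, aa)
            results.append((pos, aa, score_fn(double) - baseline))
    # Stable sort: ties keep their scan order (position, then amino acid).
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

The sketch calls `score_fn` once per candidate, i.e. 19 × (L − 1) times for a protein of length L; with a real language model one would batch or cache these calls, and a co-masking strategy would change how `score_fn` itself is computed rather than this enumeration.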

Submitted 17 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022, Workshop on Learning Meaningful Representations of Life

arXiv:1401.5459 [pdf]

doi 10.1261/rna.044321.114

Correcting a SHAPE-directed RNA structure by a mutate-map-rescue approach

Authors: Siqi Tian, Pablo Cordero, Wipapat Kladwang, Rhiju Das

Abstract: The three-dimensional conformations of non-coding RNAs underpin their biochemical functions but have largely eluded experimental characterization. Here, we report that integrating a classic mutation/rescue strategy with high-throughput chemical mapping enables rapid RNA structure inference with unusually strong validation. We revisit a paradigmatic 16S rRNA domain for which SHAPE (selective 2'-hydroxyl acylation with primer extension) suggested a conformational change between apo- and holo-ribosome conformations. Computational support estimates, data from alternative chemical probes, and mutate-and-map (M2) experiments expose limitations of prior methodology and instead give a near-crystallographic secondary structure. Systematic interrogation of single base pairs via a high-throughput mutation/rescue approach then permits incisive validation and refinement of the M2-based secondary structure and further uncovers the functional conformation as an excited state (25 ± 5% population) accessible via a single-nucleotide register shift. These results correct an erroneous SHAPE inference of a ribosomal conformational change and suggest a general mutate-map-rescue approach for dissecting RNA dynamic structure landscapes.
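
The logic of a base-pair mutation/rescue test can be sketched as below. This is an illustrative assumption, not the paper's actual design rules: for a hypothesized Watson-Crick pair (i, j), each partner is replaced by its complement, so either single mutant is expected to disrupt the pair while the double mutant restores complementarity (for example, G-C becomes C-G). Wobble G-U pairs are rejected here because complementing both bases (C-A) does not yield a canonical pair.

```python
from typing import Dict

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mutate(seq: str, pos: int, base: str) -> str:
    """Return a copy of seq with nucleotide `pos` (0-based) replaced by `base`."""
    return seq[:pos] + base + seq[pos + 1:]


def design_rescue_set(seq: str, i: int, j: int) -> Dict[str, str]:
    """Return the two single mutants and the compensatory double mutant for pair (i, j)."""
    pair = (seq[i], seq[j])
    if pair not in WATSON_CRICK:
        raise ValueError(f"({i}, {j}) is {pair}, not a Watson-Crick pair")
    new_i, new_j = COMPLEMENT[seq[i]], COMPLEMENT[seq[j]]
    single_i = mutate(seq, i, new_i)  # (new_i, seq[j]) is a mismatch
    single_j = mutate(seq, j, new_j)  # (seq[i], new_j) is a mismatch
    # Swapping both partners gives another Watson-Crick pair, restoring pairing.
    double = mutate(single_i, j, new_j)
    return {"single_i": single_i, "single_j": single_j, "double": double}
```

In a mutate-map-rescue experiment, each of the three designs would then be chemically mapped; a pair is supported when the single mutants perturb the reactivity profile and the double mutant returns it toward the wild-type pattern.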

Submitted 21 January, 2014; originally announced January 2014.

arXiv:1305.3507 [pdf]

doi 10.1093/nar/gkt501

HiTRACE-Web: an online tool for robust analysis of high-throughput capillary electrophoresis

Authors: Hanjoo Kim, Pablo Cordero, Rhiju Das, Sungroh Yoon

Abstract: To facilitate the analysis of large-scale high-throughput capillary electrophoresis data, we previously proposed a suite of efficient analysis software named HiTRACE (High Throughput Robust Analysis of Capillary Electrophoresis). HiTRACE has been used extensively for quantitating data from RNA and DNA structure mapping experiments, including mutate-and-map contact inference, chromatin footprinting, the EteRNA RNA design project and other high-throughput applications. However, HiTRACE is based on a suite of command-line MATLAB scripts that requires nontrivial effort to learn, use, and extend. Here we present HiTRACE-Web, an online version of HiTRACE that includes standard features previously available in the command-line version as well as additional features such as automated band annotation and flexible adjustment of annotations, all via a user-friendly environment. By making use of parallelization, the online workflow is also faster than software implementations available to most users on their local computers. Free access: http://hitrace.org

Submitted 21 May, 2013; v1 submitted 15 May, 2013; originally announced May 2013.

arXiv:1301.7734 [pdf]

A mutate-and-map protocol for inferring base pairs in structured RNA

Authors: Pablo Cordero, Wipapat Kladwang, Christopher C. VanLang, Rhiju Das

Abstract: Chemical mapping is a widespread technique for structural analysis of nucleic acids in which a molecule's reactivity to different probes is quantified at single-nucleotide resolution and used to constrain structural modeling. This experimental framework has been extensively revisited in the past decade with new strategies for high-throughput readouts, chemical modification, and rapid data analysis. Recently, we have coupled the technique to high-throughput mutagenesis. Point mutations of a base-paired nucleotide can lead to exposure of not only that nucleotide but also its interaction partner. Carrying out the mutation and mapping for the entire system gives an experimental approximation of the molecule's contact map. Here, we give our in-house protocol for this mutate-and-map strategy, based on 96-well capillary electrophoresis, and we provide practical tips on interpreting the data to infer nucleic acid structure.
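
One simple way to turn mutate-and-map data into a contact-map estimate is to ask, for each mutant, which nucleotides became unusually reactive relative to all the other mutants. The sketch below assumes a reactivity matrix with one row per mutant (row k carries a point mutation at nucleotide `mutated_positions[k]`) and one column per probed nucleotide, and computes a column-wise Z-score; the function name and the threshold of 2 are illustrative choices, not values from the published protocol.

```python
import numpy as np


def mutate_and_map_contacts(reactivity, mutated_positions, z_threshold=2.0):
    """Flag (mutated nucleotide, exposed nucleotide) pairs from M2 data.

    reactivity: array of shape (n_mutants, n_nucleotides).
    mutated_positions: nucleotide index mutated in each row.
    Returns a sorted list of (i, j) whose Z-score exceeds z_threshold,
    excluding the mutated nucleotide itself (i == j).
    """
    data = np.asarray(reactivity, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    # Constant columns carry no signal; dividing by inf makes their Z-score 0.
    std[std == 0] = np.inf
    z = (data - mean) / std
    contacts = set()
    for row, i in enumerate(mutated_positions):
        for j in np.flatnonzero(z[row] > z_threshold):
            if int(j) != int(i):
                contacts.add((int(i), int(j)))
    return sorted(contacts)
```

A base pair (i, j) is expected to show up symmetrically, as both (i, j) and (j, i), because mutating either partner should expose the other; requiring that symmetry is one way to filter noise.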

Submitted 31 January, 2013; originally announced January 2013.

Comments: 22 pages, 5 figures

arXiv:1207.1312 [pdf]

Quantitative DMS mapping for automated RNA secondary structure inference

Authors: Pablo Cordero, Wipapat Kladwang, Christopher C. VanLang, Rhiju Das

Abstract: For decades, dimethyl sulfate (DMS) mapping has informed manual modeling of RNA structure in vitro and in vivo. Here, we incorporate DMS data into automated secondary structure inference using a pseudo-energy framework developed for 2'-OH acylation (SHAPE) mapping. On six non-coding RNAs with crystallographic models, DMS-guided modeling achieves overall false negative and false discovery rates of 9.5% and 11.6%, comparable to or better than SHAPE-guided modeling, and non-parametric bootstrapping provides straightforward confidence estimates. Integrating DMS/SHAPE data and including CMCT reactivities give small additional improvements. These results establish DMS mapping, an already routine technique, as a quantitative tool for unbiased RNA structure modeling.
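
The pseudo-energy framework referred to above converts each nucleotide's normalized reactivity into a free-energy term of the form m · ln(reactivity + 1) + b, which a folding algorithm adds to its energy model for base-paired nucleotides. The sketch below uses slope m = 2.6 and intercept b = −0.8 kcal/mol, the values commonly used for SHAPE; whether the same values were used for DMS data in this work is an assumption, and clipping negative reactivities to zero is likewise an illustrative choice.

```python
import math
from typing import List, Sequence


def pseudo_energies(
    reactivities: Sequence[float],
    slope: float = 2.6,
    intercept: float = -0.8,
) -> List[float]:
    """Per-nucleotide pseudo-free-energy terms (kcal/mol) of the form m*ln(r+1)+b.

    Low reactivity gives a negative (pairing-favoring) term; high reactivity
    gives a positive (pairing-penalizing) term. Negative reactivities are
    clipped to zero so that the logarithm stays defined.
    """
    return [slope * math.log(max(r, 0.0) + 1.0) + intercept for r in reactivities]
```

With these defaults the term changes sign at a reactivity of exp(0.8 / 2.6) − 1 ≈ 0.36: less reactive nucleotides are rewarded for pairing, more reactive ones are penalized.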

Submitted 5 July, 2012; originally announced July 2012.

arXiv:1110.0235 [pdf]

The Stanford RNA Mapping Database for sharing and visualizing RNA structure mapping experiments

Authors: Pablo Cordero, Julius Lucks, Rhiju Das

Abstract: We have established an RNA Mapping Database (RMDB) to enable a new generation of structural, thermodynamic, and kinetic studies from quantitative single-nucleotide-resolution RNA structure mapping (freely available at http://rmdb.stanford.edu). Chemical and enzymatic mapping is a rapid, robust, and widespread approach to RNA characterization. Since its recent coupling with high-throughput sequencing techniques, accelerated software pipelines, and large-scale mutagenesis, the volume of mapping data has greatly increased, and there is a critical need for a database to enable sharing, visualization, and meta-analyses of these data. Through its online front-end, the RMDB allows users to explore single-nucleotide-resolution chemical accessibility data in heat-map, bar-graph, and colored secondary structure graphics; to leverage these data to generate secondary structure hypotheses; and to download the data in standardized and computer-friendly files, including the RDAT and community-consensus SNRNASM formats. At the time of writing, the database houses 38 entries, describing 2659 RNA sequences and comprising 355,084 data points, and is growing rapidly.

Submitted 2 October, 2011; originally announced October 2011.

Comments: 20 pages, 2 figures

arXiv:1104.0979 [pdf]

Two-dimensional chemical mapping for non-coding RNAs

Authors: Wipapat Kladwang, Christopher C. VanLang, Pablo Cordero, Rhiju Das

Abstract: Non-coding RNA molecules fold into precise base pairing patterns to carry out critical roles in genetic regulation and protein synthesis. We show here that coupling systematic mutagenesis with high-throughput SHAPE chemical mapping enables accurate base pair inference of domains from ribosomal RNA, ribozymes, and riboswitches. For a six-RNA benchmark that challenged prior chemical/computational methods, this mutate-and-map strategy gives secondary structures in agreement with crystallographic data (2% error rates), including a blind test on a double-glycine riboswitch. Through modeling of partially ordered RNA states, the method enables the first test of an 'interdomain helix-swap' hypothesis for ligand-binding cooperativity in a glycine riboswitch. Finally, the mutate-and-map data report on tertiary contacts within non-coding RNAs; coupled with the Rosetta/FARFAR algorithm, these data give nucleotide-resolution three-dimensional models (5.7 Å helix RMSD) of an adenine riboswitch. These results highlight the promise of a two-dimensional chemical strategy for inferring the secondary and tertiary structures that underlie non-coding RNA behavior.

Submitted 5 April, 2011; originally announced April 2011.

arXiv:1103.5458 [pdf]

doi 10.1021/bi200524n

Understanding the errors of SHAPE-directed RNA structure modeling

Authors: Wipapat Kladwang, Christopher C. VanLang, Pablo Cordero, Rhiju Das

Abstract: Single-nucleotide-resolution chemical mapping for structured RNA is being rapidly advanced by new chemistries, faster readouts, and coupling to computational algorithms. Recent tests have shown that selective 2'-hydroxyl acylation by primer extension (SHAPE) can give near-zero error rates (0-2%) in modeling the helices of RNA secondary structure. Here, we benchmark the method using six molecules for which crystallographic data are available: tRNA(Phe) and 5S rRNA from Escherichia coli, the P4-P6 domain of the Tetrahymena group I ribozyme, and ligand-bound domains from riboswitches for adenine, cyclic di-GMP, and glycine. SHAPE-directed modeling of these highly structured RNAs gave an overall false negative rate (FNR) of 17% and a false discovery rate (FDR) of 21%, with at least one helix prediction error in five of the six cases. Extensive variations of data processing, normalization, and modeling parameters did not significantly mitigate modeling errors. Only one variation, filtering out data collected with deoxyinosine triphosphate during primer extension, gave a modest improvement (FNR = 12% and FDR = 14%). The residual structure modeling errors are explained by the insufficient information content of these RNAs' SHAPE data, as evaluated by a nonparametric bootstrapping analysis. Beyond these benchmark cases, bootstrapping suggests a low level of confidence (<50%) in the majority of helices in a previously proposed SHAPE-directed model for the HIV-1 RNA genome. Thus, SHAPE-directed RNA modeling is not always unambiguous, and helix-by-helix confidence estimates, as described herein, may be critical for interpreting results from this powerful methodology.
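
The two accuracy metrics quoted in these abstracts compare a predicted set of base pairs against a crystallographic reference: FNR is the fraction of reference pairs the model misses, and FDR is the fraction of predicted pairs that are not in the reference. The sketch below counts them over individual base pairs with exact matching; the papers' precise counting rules (for example, any tolerance for register-shifted pairs) may differ, and the function names are illustrative.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Store each base pair as (smaller index, larger index) so (j, i) == (i, j)."""
    return {(min(i, j), max(i, j)) for i, j in pairs}


def fnr_fdr(predicted: Iterable[Pair], reference: Iterable[Pair]) -> Tuple[float, float]:
    """Return (false negative rate, false discovery rate) for predicted base pairs.

    FNR = 1 - TP / |reference|, FDR = 1 - TP / |predicted|, where TP counts
    predicted pairs that also appear in the reference structure.
    """
    pred, ref = normalize(predicted), normalize(reference)
    true_pos = len(pred & ref)
    fnr = 1.0 - true_pos / len(ref) if ref else 0.0
    fdr = 1.0 - true_pos / len(pred) if pred else 0.0
    return fnr, fdr
```

For example, a four-pair reference helix predicted with three correct pairs and one spurious pair gives FNR = FDR = 0.25. Aggregate rates such as the 17% and 21% above pool such counts over all benchmark RNAs.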

Submitted 7 September, 2011; v1 submitted 28 March, 2011; originally announced March 2011.

Comments: Biochemistry, Article ASAP (Aug. 15, 2011)

Showing 1–8 of 8 results for author: Cordero, P