Search | arXiv e-print repository

Chromatin remodeling due to transient-link-and-pass activity enhances subnuclear dynamics

Authors: Rakesh Das, Takahiro Sakaue, G. V. Shivashankar, Jacques Prost, Tetsuya Hiraiwa

Abstract: Spatiotemporal coordination of chromatin and subnuclear compartments is crucial for cells. Numerous enzymes act inside the nucleus; some of them transiently link and pass two chromatin segments. Here we study how such an active perturbation affects the fluctuating dynamics of an inclusion in the chromatin medium. Using numerical simulations and a versatile effective model, we categorize inclusion dynamics into three distinct modes. The transient-link-and-pass activity speeds up inclusion dynamics by affecting a slow mode related to chromatin remodeling, viz., size and shape of the chromatin meshes.

Submitted 17 January, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

arXiv:2304.04673 [pdf]

Regional Deep Atrophy: a Self-Supervised Learning Method to Automatically Identify Regions Associated With Alzheimer's Disease Progression From Longitudinal MRI

Authors: Mengjin Dong, Long Xie, Sandhitsu R. Das, Jiancong Wang, Laura E. M. Wisse, Robin deFlores, David A. Wolk, Paul A. Yushkevich

Abstract: Longitudinal assessment of brain atrophy, particularly in the hippocampus, is a well-studied biomarker for neurodegenerative diseases, such as Alzheimer's disease (AD). In clinical trials, estimation of brain progressive rates can be applied to track therapeutic efficacy of disease modifying treatments. However, most state-of-the-art measurements calculate changes directly by segmentation and/or deformable registration of MRI images, and may misreport head motion or MRI artifacts as neurodegeneration, impacting their accuracy. In our previous study, we developed a deep learning method DeepAtrophy that uses a convolutional neural network to quantify differences between longitudinal MRI scan pairs that are associated with time. DeepAtrophy has high accuracy in inferring temporal information from longitudinal MRI scans, such as temporal order or relative inter-scan interval. DeepAtrophy also provides an overall atrophy score that was shown to perform well as a potential biomarker of disease progression and treatment efficacy. However, DeepAtrophy is not interpretable, and it is unclear what changes in the MRI contribute to progression measurements. In this paper, we propose Regional Deep Atrophy (RDA), which combines the temporal inference approach from DeepAtrophy with a deformable registration neural network and attention mechanism that highlights regions in the MRI image where longitudinal changes are contributing to temporal inference. RDA has similar prediction accuracy as DeepAtrophy, but its additional interpretability makes it more acceptable for use in clinical settings, and may lead to more sensitive biomarkers for disease monitoring in clinical trials of early AD.

Submitted 10 April, 2023; originally announced April 2023.

Comments: Submitted to NeuroImage for review

arXiv:2202.04806 [pdf, other]

doi 10.1063/5.0087815

Mechanical feedback controls the emergence of dynamical memory in growing tissue monolayers

Authors: Sumit Sinha, Xin Li, Rajsekhar Das, D. Thirumalai

Abstract: The growth of a tissue, which depends on cell-cell interactions and biologically relevant processes such as cell division and apoptosis, is regulated by a mechanical feedback mechanism. We account for these effects in a minimal two-dimensional model in order to investigate the consequences of mechanical feedback, which is controlled by a critical pressure, $p_c$. A cell can only grow and divide if the pressure it experiences, due to interaction with its neighbors, is less than $p_c$. Because temperature is an irrelevant variable in the model, the cell dynamics is driven by self-generated active forces (SGAFs) that are created by cell division. It is shown that even in the absence of intercellular interactions, cells undergo diffusive behavior. The SGAF driven diffusion is indistinguishable from the well-known dynamics of a free Brownian particle at a fixed finite temperature. When intercellular interactions are taken into account, we find persistent temporal correlations in the force-force autocorrelation function ($FAF$) that extend over a timescale of several cell division times. The time-dependence of the $FAF$ reveals memory effects, which increase as $p_c$ increases. The observed non-Markovian effects emerge due to the interplay of cell division and mechanical feedback, and are inherently a non-equilibrium phenomenon.

Submitted 9 February, 2022; originally announced February 2022.

Comments: 6 pages, 4 figures

arXiv:2112.10460 [pdf, other]

How enzymatic activity is involved in chromatin organization

Authors: Rakesh Das, Takahiro Sakaue, G. V. Shivashankar, Jacques Prost, Tetsuya Hiraiwa

Abstract: Spatial organization of chromatin plays a critical role in genome regulation. Various types of affinity mediators and enzymes have been credited with regulating the spatial organization of chromatin from a thermodynamics perspective. However, at the mechanistic level, enzymes act in their unique ways. Here, we construct a polymer physics model following the mechanistic scheme of Topoisomerase-II, an enzyme resolving topological constraints of chromatin, and investigate its role in interphase chromatin organization. Our computer simulations demonstrate Topoisomerase-II's ability to phase separate chromatin into eu- and heterochromatic regions with a characteristic wall-like organization of the euchromatic regions. Exploiting a mean-field framework, we argue that the ability of the euchromatic regions to cross each other due to the enzymatic activity of Topoisomerase-II induces this phase separation. Motivated by a recent experimental observation on different structural states of the eu- and the heterochromatic units, we further extend our model to a bidisperse setting and show that the characteristic features of the enzymatic activity driven phase separation survive there. The existence of these characteristic features, even under the non-localized action of the enzyme, highlights the critical role of enzymatic activity in chromatin organization, and points out the importance of further experiments along this line.

Submitted 20 December, 2021; originally announced December 2021.

Comments: 4 figures, 3 supplementary figures

arXiv:2111.05728 [pdf, other]

Diversity of symptom phenotypes in SARS-CoV-2 community infections observed in multiple large datasets

Authors: Martyn Fyles, Karina-Doris Vihta, Carole H Sudre, Harry Long, Rajenki Das, Caroline Jay, Tom Wingfield, Fergus Cumming, William Green, Pantelis Hadjipantelis, Joni Kirk, Claire J Steves, Sebastien Ourselin, Graham F Medley, Elizabeth Fearon, Thomas House

Abstract: Through the use of cutting-edge unsupervised classification techniques from statistics and machine learning, we characterise symptom phenotypes among symptomatic SARS-CoV-2 PCR-positive community cases. We first analyse each dataset in isolation and across age bands, before using methods that allow us to compare multiple datasets. While we observe separation due to the total number of symptoms experienced by cases, we also see a separation of symptoms into gastrointestinal, respiratory and other types, and different symptom co-occurrence patterns at the extremes of age. In this way, we are able to demonstrate the deep structure of symptoms of COVID-19 without usual biases due to study design. This is expected to have implications for the identification and management of community SARS-CoV-2 cases and could be further applied to symptom-based management of other diseases and syndromes.

Submitted 20 November, 2023; v1 submitted 10 November, 2021; originally announced November 2021.

Comments: 60 pages; 29 figures

MSC Class: 62P10

arXiv:2110.07531 [pdf]

Deep learning models for predicting RNA degradation via dual crowdsourcing

Authors: Hannah K. Wayment-Steele, Wipapat Kladwang, Andrew M. Watkins, Do Soon Kim, Bojan Tunguz, Walter Reade, Maggie Demkin, Jonathan Romano, Roger Wellington-Oguri, John J. Nicol, Jiayang Gao, Kazuki Onodera, Kazuki Fujikawa, Hanfei Mao, Gilles Vandewiele, Michele Tinti, Bram Steenwinckel, Takuya Ito, Taiga Noumi, Shujun He, Keiichiro Ishi, Youhan Lee, Fatih Öztürk, Anthony Chiu, Emin Öztürk, et al. (4 additional authors not shown)

Abstract: Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ("Stanford OpenVaccine") on Kaggle, involving single-nucleotide resolution measurements on 6043 102-130-nucleotide diverse RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1588 nucleotides) with improved accuracy compared to previously published models. Top teams integrated natural language processing architectures and data augmentation techniques with predictions from previous dynamic programming models for RNA secondary structure. These results indicate that such models are capable of representing in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for data set creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.

Submitted 22 April, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

arXiv:2109.03900 [pdf, other]

doi 10.1371/journal.pcbi.1009853

Machine learning modeling of family wide enzyme-substrate specificity screens

Authors: Samuel Goldman, Ria Das, Kevin K. Yang, Connor W. Coley

Abstract: Biocatalysis is a promising approach to sustainably synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale. However, the adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates. While machine learning and in silico directed evolution are well-posed for this predictive modeling challenge, efforts to date have primarily aimed to increase activity against a single known substrate, rather than to identify enzymes capable of acting on new substrates of interest. To address this need, we curate 6 different high-quality enzyme family screens from the literature that each measure multiple enzymes against multiple substrates. We compare machine learning-based compound-protein interaction (CPI) modeling approaches from the literature used for predicting drug-target interactions. Surprisingly, comparing these interaction-based models against collections of independent (single task) enzyme-only or substrate-only models reveals that current CPI approaches are incapable of learning interactions between compounds and proteins in the current family level data regime. We further validate this observation by demonstrating that our no-interaction baseline can outperform CPI-based models from the literature used to guide the discovery of kinase inhibitors. Given the high performance of non-interaction based models, we introduce a new structure-based strategy for pooling residue representations across a protein sequence. Altogether, this work motivates a principled path forward in order to build and evaluate meaningful predictive models for biocatalysis and other drug discovery applications.

Submitted 8 September, 2021; originally announced September 2021.

arXiv:2108.06610 [pdf, other]

SquiggleFilter: An Accelerator for Portable Virus Detection

Authors: Tim Dunn, Harisankar Sadasivan, Jack Wadden, Kush Goliya, Kuan-Yu Chen, Reetuparna Das, David Blaauw, Satish Narayanasamy

Abstract: The MinION is a recent-to-market handheld nanopore sequencer. It can be used to determine the whole genome of a target virus in a biological sample. Its Read Until feature allows us to skip sequencing a majority of non-target reads (DNA/RNA fragments), which constitute more than 99% of all reads in a typical sample. However, it does not have any on-board computing, which significantly limits its portability. We analyze the performance of a Read Until metagenomic pipeline for detecting target viruses and identifying strain-specific mutations. We find new sources of performance bottlenecks (basecaller in classification of a read) that are not addressed by past genomics accelerators. We present SquiggleFilter, a novel hardware accelerated dynamic time warping (DTW) based filter that directly analyzes MinION's raw squiggles and filters everything except target viral reads, thereby avoiding the expensive basecalling step. We show that our 14.3 W, 13.25 mm² accelerator has 274X greater throughput and 3481X lower latency than existing GPU-based solutions while consuming half the power, enabling Read Until for the next generation of nanopore sequencers.

Submitted 23 September, 2021; v1 submitted 14 August, 2021; originally announced August 2021.

Comments: https://micro2021ae.hotcrp.com/paper/12?cap=012aOJj-0U08_9o
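The dynamic time warping (DTW) comparison named in the SquiggleFilter abstract above can be illustrated with a minimal sketch. This is the textbook full-sequence DTW with an absolute-difference cost, not the accelerator's filter: the function name `dtw_distance`, the cost function, and the toy inputs are assumptions made for illustration, and the paper's hardware variant and its thresholds are not reproduced here.

```python
def dtw_distance(query, reference):
    """Classic dynamic time warping distance between two numeric sequences.

    Illustrative only: uses |a - b| as the local cost and aligns the
    whole query to the whole reference (no subsequence matching).
    """
    n, m = len(query), len(reference)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning query[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(query[i - 1] - reference[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # A time-stretched copy of the same signal aligns at zero cost.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because DTW tolerates local stretching, a signal sampled at a varying rate can still match its reference, which is the property that makes it a candidate for comparing raw signals without first converting them to bases.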

arXiv:2010.12948 [pdf]

DeepAtrophy: Teaching a Neural Network to Differentiate Progressive Changes from Noise on Longitudinal MRI in Alzheimer's Disease

Authors: Mengjin Dong, Long Xie, Sandhitsu R. Das, Jiancong Wang, Laura E. M. Wisse, Robin deFlores, David A. Wolk, Paul Yushkevich

Abstract: Volume change measures derived from longitudinal MRI (e.g. hippocampal atrophy) are a well-studied biomarker of disease progression in Alzheimer's Disease (AD) and are used in clinical trials to track the therapeutic efficacy of disease-modifying treatments. However, longitudinal MRI change measures can be confounded by non-biological factors, such as different degrees of head motion and susceptibility artifact between pairs of MRI scans. We hypothesize that deep learning methods applied directly to pairs of longitudinal MRI scans can be trained to differentiate between biological changes and non-biological factors better than conventional approaches based on deformable image registration. To achieve this, we make a simplifying assumption that biological factors are associated with time (i.e. the hippocampus shrinks over time in the aging population) whereas non-biological factors are independent of time. We then formulate deep learning networks to infer the temporal order of same-subject MRI scans input to the network in arbitrary order; as well as to infer ratios between interscan intervals for two pairs of same-subject MRI scans. In the test dataset, these networks perform better in tasks of temporal ordering (89.3%) and interscan interval inference (86.1%) than a state-of-the-art deformation-based morphometry method ALOHA (76.6% and 76.1% respectively) (Das et al., 2012). Furthermore, we derive a disease progression score from the network that is able to detect a group difference between 58 preclinical AD and 75 beta-amyloid-negative cognitively normal individuals within one year, compared to two years for ALOHA. This suggests that deep learning can be trained to differentiate MRI changes due to biological factors (tissue loss) from changes due to non-biological factors, leading to novel biomarkers that are more sensitive to longitudinal changes at the earliest stages of AD.

Submitted 24 October, 2020; originally announced October 2020.

Comments: Submitted to a journal, IF ~ 6

arXiv:2005.04937 [pdf, other]

doi 10.1016/j.idm.2020.06.008

Using statistics and mathematical modelling to understand infectious disease outbreaks: COVID-19 as an example

Authors: Christopher E. Overton, Helena B. Stage, Shazaad Ahmad, Jacob Curran-Sebastian, Paul Dark, Rajenki Das, Elizabeth Fearon, Timothy Felton, Martyn Fyles, Nick Gent, Ian Hall, Thomas House, Hugo Lewkowicz, Xiaoxi Pang, Lorenzo Pellis, Robert Sawko, Andrew Ustianowski, Bindu Vekaria, Luke Webb

Abstract: During an infectious disease outbreak, biases in the data and complexities of the underlying dynamics pose significant challenges in mathematically modelling the outbreak and designing policy. Motivated by the ongoing response to COVID-19, we provide a toolkit of statistical and mathematical models beyond the simple SIR-type differential equation models for analysing the early stages of an outbreak and assessing interventions. In particular, we focus on parameter estimation in the presence of known biases in the data, and the effect of non-pharmaceutical interventions in enclosed subpopulations, such as households and care homes. We illustrate these methods by applying them to the COVID-19 pandemic.

Submitted 11 May, 2020; originally announced May 2020.

Journal ref: Infectious Disease Modelling, Volume 5 (2020), 409-441

arXiv:2004.00117 [pdf, other]

Challenges in control of Covid-19: short doubling time and long delay to effect of interventions

Authors: Lorenzo Pellis, Francesca Scarabel, Helena B. Stage, Christopher E. Overton, Lauren H. K. Chappell, Katrina A. Lythgoe, Elizabeth Fearon, Emma Bennett, Jacob Curran-Sebastian, Rajenki Das, Martyn Fyles, Hugo Lewkowicz, Xiaoxi Pang, Bindu Vekaria, Luke Webb, Thomas House, Ian Hall

Abstract: Early assessments of the spreading rate of COVID-19 were subject to significant uncertainty, as expected with limited data and difficulties in case ascertainment, but more reliable inferences can now be made. Here, we estimate from European data that COVID-19 cases are expected to double initially every three days, until social distancing interventions slow this growth, and that the impact of such measures is typically only seen nine days - i.e. three doubling times - after their implementation. We argue that such temporal patterns are more critical than precise estimates of the basic reproduction number for initiating interventions. This observation has particular implications for the low- and middle-income countries currently in the early stages of their local epidemics.

Submitted 31 March, 2020; originally announced April 2020.

Comments: Main text: 13 pages (1-13), 3 figures, 1 table; Supplementary Information: 9 pages (14-22), 4 figures, 1 table
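The arithmetic behind the "three doubling times" remark in the abstract above can be made concrete with a short sketch. It assumes pure exponential growth that continues at the initial rate for the whole nine-day delay, which is a simplification introduced here rather than the paper's fitted model; the function names are illustrative.

```python
import math


def growth_rate(doubling_time_days):
    """Exponential growth rate r (per day) such that cases double every T days."""
    return math.log(2) / doubling_time_days


def growth_factor(days, doubling_time_days):
    """Multiplicative increase in cases after `days` of unchecked exponential growth."""
    return 2.0 ** (days / doubling_time_days)


if __name__ == "__main__":
    r = growth_rate(3)            # three-day doubling time from the abstract
    factor = growth_factor(9, 3)  # nine-day delay before interventions show an effect
    print(f"r = {r:.3f} per day, growth over the delay = {factor:.0f}x")
```

Under that assumption, case counts would grow roughly eightfold between the start of an intervention and the first visible effect, which illustrates why the abstract stresses timing over precise estimates of the basic reproduction number.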

arXiv:1805.01260 [pdf, other]

White Matter Network Architecture Guides Direct Electrical Stimulation Through Optimal State Transitions

Authors: Jennifer Stiso, Ankit N. Khambhati, Tommaso Menara, Ari E. Kahn, Joel M. Stein, Sandhitsu R. Das, Richard Gorniak, Joseph Tracy, Brian Litt, Kathryn A. Davis, Fabio Pasqualetti, Timothy Lucas, Danielle S. Bassett

Abstract: Electrical brain stimulation is currently being investigated as a therapy for neurological disease. However, opportunities to optimize such therapies are challenged by the fact that the beneficial impact of focal stimulation on both neighboring and distant regions is not well understood. Here, we use network control theory to build a model of brain network function that makes predictions about how stimulation spreads through the brain's white matter network and influences large-scale dynamics. We test these predictions using combined electrocorticography (ECoG) and diffusion weighted imaging (DWI) data from participants who volunteered to participate in an extensive stimulation regimen. We posit a specific model-based manner in which white matter tracts constrain stimulation, defining its capacity to drive the brain to new states, including states associated with successful memory encoding. In a first validation of our model, we find that the true pattern of white matter tracts can be used to more accurately predict the state transitions induced by direct electrical stimulation than the artificial patterns of null models. We then use a targeted optimal control framework to solve for the optimal energy required to drive the brain to a given state. We show that, intuitively, our model predicts larger energy requirements when starting from states that are farther away from a target memory state. We then suggest testable hypotheses about which structural properties will lead to efficient stimulation for improving memory based on energy requirements. Our work demonstrates that individual white matter architecture plays a vital role in guiding the dynamics of direct electrical stimulation, more generally offering empirical support for the utility of network control theoretic models of brain response to stimulation.

Submitted 3 May, 2018; originally announced May 2018.

arXiv:1803.03146 [pdf]

SentRNA: Improving computational RNA design by incorporating a prior of human design strategies

Authors: Jade Shi, Rhiju Das, Vijay S. Pande

Abstract: Solving the RNA inverse folding problem is a critical prerequisite to RNA design, an emerging field in bioengineering with a broad range of applications from reaction catalysis to cancer therapy. Although significant progress has been made in developing machine-based inverse RNA folding algorithms, current approaches still have difficulty designing sequences for large or complex targets. On the other hand, human players of the online RNA design game EteRNA have consistently shown superior performance in this regard, being able to readily design sequences for targets that are challenging for machine algorithms. Here we present a novel approach to the RNA design problem, SentRNA, a design agent consisting of a fully-connected neural network trained end-to-end using human-designed RNA sequences. We show that through this approach, SentRNA can solve complex targets previously unsolvable by any machine-based approach and achieve state-of-the-art performance on two separate challenging test sets. Our results demonstrate that incorporating human design strategies into a design algorithm can significantly boost machine performance and suggest a new paradigm for machine-based RNA design.

Submitted 5 March, 2019; v1 submitted 8 March, 2018; originally announced March 2018.

Comments: 27 pages (not including Supplementary Information), 9 figures, 7 tables

arXiv:1608.02038 [pdf]

Responding to an enquiry concerning the geographic population structure (GPS) approach and the origin of Ashkenazic Jews - a reply to Flegontov et al

Authors: Ranajit Das, Paul Wexler, Mehdi Pirooznia, Eran Elhaik

Abstract: Recently, we investigated the geographical origins of Ashkenazic Jews (AJs) and their native language Yiddish by applying a biogeographical tool, the Geographic Population Structure (GPS), to a cohort of 367 exclusively Yiddish-speaking and multilingual AJs genotyped on the Genochip microarray. GPS localized most AJs along major ancient trade routes in northeastern Turkey adjacent to primeval villages with names that may be derived from the word "Ashkenaz." These findings were compatible with the hypothesis of an Irano-Turko-Slavic origin for AJs and a Slavic origin for Yiddish and at odds with the Rhineland hypothesis advocating a German origin of both. Our approach has been recently adopted by Flegontov et al. (2016a) to trace the origin of the Siberian Ket people and their language. Recently, Flegontov et al. (2016b) have raised several questions concerning the accuracy of the Genochip microarray and GPS, specifically in relation to AJs and Yiddish. Although many of these issues have been addressed in our previous papers, we take this opportunity to clarify the principles of the GPS approach, review the recent biogeographical and ancient DNA findings regarding AJs, and comment on the origin of Yiddish.

Submitted 17 August, 2016; v1 submitted 5 August, 2016; originally announced August 2016.

Comments: 32 pages, 2 figures, 2 tables

arXiv:1401.5459 [pdf]

doi 10.1261/rna.044321.114

Correcting a SHAPE-directed RNA structure by a mutate-map-rescue approach

Authors: Siqi Tian, Pablo Cordero, Wipapat Kladwang, Rhiju Das

Abstract: The three-dimensional conformations of non-coding RNAs underpin their biochemical functions but have largely eluded experimental characterization. Here, we report that integrating a classic mutation/rescue strategy with high-throughput chemical mapping enables rapid RNA structure inference with unusually strong validation. We revisit a paradigmatic 16S rRNA domain for which SHAPE (selective 2'-hydroxyl acylation with primer extension) suggested a conformational change between apo- and holo-ribosome conformations. Computational support estimates, data from alternative chemical probes, and mutate-and-map (M2) experiments expose limitations of prior methodology and instead give a near-crystallographic secondary structure. Systematic interrogation of single base pairs via a high-throughput mutation/rescue approach then permits incisive validation and refinement of the M2-based secondary structure and further uncovers the functional conformation as an excited state (25±5% population) accessible via a single-nucleotide register shift. These results correct an erroneous SHAPE inference of a ribosomal conformational change and suggest a general mutate-map-rescue approach for dissecting RNA dynamic structure landscapes.

Submitted 21 January, 2014; originally announced January 2014.

arXiv:1305.3507 [pdf]

doi 10.1093/nar/gkt501

HiTRACE-Web: an online tool for robust analysis of high-throughput capillary electrophoresis

Authors: Hanjoo Kim, Pablo Cordero, Rhiju Das, Sungroh Yoon

Abstract: To facilitate the analysis of large-scale high-throughput capillary electrophoresis data, we previously proposed a suite of efficient analysis software named HiTRACE (High Throughput Robust Analysis of Capillary Electrophoresis). HiTRACE has been used extensively for quantitating data from RNA and DNA structure mapping experiments, including mutate-and-map contact inference, chromatin footprinting, the EteRNA RNA design project and other high-throughput applications. However, HiTRACE is based on a suite of command-line MATLAB scripts that requires nontrivial efforts to learn, use, and extend. Here we present HiTRACE-Web, an online version of HiTRACE that includes standard features previously available in the command-line version as well as additional features such as automated band annotation and flexible adjustment of annotations, all via a user-friendly environment. By making use of parallelization, the on-line workflow is also faster than software implementations available to most users on their local computers. Free access: http://hitrace.org

Submitted 21 May, 2013; v1 submitted 15 May, 2013; originally announced May 2013.

arXiv:1304.1072 [pdf]

Massively Parallel RNA Chemical Mapping with a Reduced Bias MAP-seq Protocol

Authors: Matthew G. Seetin, Wipapat Kladwang, J. P. Bida, Rhiju Das

Abstract: Chemical mapping methods probe RNA structure by revealing and leveraging correlations of a nucleotide's structural accessibility or flexibility with its reactivity to various chemical probes. Pioneering work by Lucks and colleagues has expanded this method to probe hundreds of molecules at once on an Illumina sequencing platform, obviating the use of slab gels or capillary electrophoresis on one molecule at a time. Here, we describe optimizations to this method from our lab, resulting in the MAP-seq protocol (Multiplexed Accessibility Probing read out through sequencing), version 1.0. The protocol permits the quantitative probing of thousands of RNAs at once, by several chemical modification reagents, on the time scale of a day using a table-top Illumina machine. This method and a software package MAPseeker (http://simtk.org/home/map_seeker) address several potential sources of bias, by eliminating PCR steps, improving ligation efficiencies of ssDNA adapters, and avoiding problematic heuristics in prior algorithms. We hope that the step-by-step description of MAP-seq 1.0 will help other RNA mapping laboratories to transition from electrophoretic to next-generation sequencing methods and to further reduce the turnaround time and any remaining biases of the protocol.

Submitted 3 April, 2013; originally announced April 2013.

Comments: 22 pages, 5 figures

arXiv:1302.0029 [pdf]

doi 10.1371/journal.pone.0063906

Serverification of Molecular Modeling Applications: the Rosetta Online Server that Includes Everyone (ROSIE)

Authors: Sergey Lyskov, Fang-Chieh Chou, Shane Ó Conchúir, Bryan S. Der, Kevin Drew, Daisuke Kuroda, Jianqing Xu, Brian D. Weitzner, P. Douglas Renfrew, Parin Sripakdeevong, Benjamin Borgo, James J. Havranek, Brian Kuhlman, Tanja Kortemme, Richard Bonneau, Jeffrey J. Gray, Rhiju Das

Abstract: The Rosetta molecular modeling software package provides experimentally tested and rapidly evolving tools for the 3D structure prediction and high-resolution design of proteins, nucleic acids, and a growing number of non-natural polymers. Despite its free availability to academic users and improving documentation, use of Rosetta has largely remained confined to developers and their immediate collaborators due to the code's difficulty of use, the requirement for large computational resources, and the unavailability of servers for most of the Rosetta applications. Here, we present a unified web framework for Rosetta applications called ROSIE (Rosetta Online Server that Includes Everyone). ROSIE provides (a) a common user interface for Rosetta protocols, (b) a stable application programming interface for developers to add additional protocols, (c) a flexible back-end to allow leveraging of computer cluster resources shared by RosettaCommons member institutions, and (d) centralized administration by the RosettaCommons to ensure continuous maintenance. This paper describes the ROSIE server infrastructure, a step-by-step 'serverification' protocol for use by Rosetta developers, and the deployment of the first nine ROSIE applications by six separate developer teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance, Supercharge, Beta peptide design, NCBB design, and VIP redesign. As illustrated by the number and diversity of these applications, ROSIE offers a general and speedy paradigm for serverification of Rosetta applications that incurs negligible cost to developers and lowers barriers to Rosetta use for the broader biological community. ROSIE is available at http://rosie.rosettacommons.org.

Submitted 31 January, 2013; originally announced February 2013.

arXiv:1301.7734 [pdf]

A mutate-and-map protocol for inferring base pairs in structured RNA

Authors: Pablo Cordero, Wipapat Kladwang, Christopher C. VanLang, Rhiju Das

Abstract: Chemical mapping is a widespread technique for structural analysis of nucleic acids in which a molecule's reactivity to different probes is quantified at single-nucleotide resolution and used to constrain structural modeling. This experimental framework has been extensively revisited in the past decade with new strategies for high-throughput read-outs, chemical modification, and rapid data analysis. Recently, we have coupled the technique to high-throughput mutagenesis. Point mutations of a base-paired nucleotide can lead to exposure of not only that nucleotide but also its interaction partner. Carrying out the mutation and mapping for the entire system gives an experimental approximation of the molecule's contact map. Here, we give our in-house protocol for this mutate-and-map strategy, based on 96-well capillary electrophoresis, and we provide practical tips on interpreting the data to infer nucleic acid structure.

Submitted 31 January, 2013; originally announced January 2013.

Comments: 22 pages, 5 figures

arXiv:1208.2680 [pdf]

doi 10.1371/journal.pone.0074830

Atomic-accuracy prediction of protein loop structures through an RNA-inspired ansatz

Authors: Rhiju Das

Abstract: Consistently predicting biopolymer structure at atomic resolution from sequence alone remains a difficult problem, even for small sub-segments of large proteins. Such loop prediction challenges, which arise frequently in comparative modeling and protein design, can become intractable as loop lengths exceed 10 residues and if surrounding side-chain conformations are erased. This article introduces a modeling strategy based on a 'stepwise ansatz', recently developed for RNA modeling, which posits that any realistic all-atom molecular conformation can be built up by residue-by-residue stepwise enumeration. When harnessed to a dynamic-programming-like recursion in the Rosetta framework, the resulting stepwise assembly (SWA) protocol enables enumerative sampling of a 12 residue loop at a significant but achievable cost of thousands of CPU-hours. In a previously established benchmark, SWA recovers crystallographic conformations with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC modeling with a comparable expenditure of computational power. Furthermore, SWA gives high accuracy results on an additional set of 15 loops highlighted in the biological literature for their irregularity or unusual length. Successes include cis-Pro touch turns, loops that pass through tunnels of other side-chains, and loops of lengths up to 24 residues. Remaining problem cases are traced to inaccuracies in the Rosetta all-atom energy function. In five additional blind tests, SWA achieves sub-Angstrom accuracy models, including the first such success in a protein/RNA binding interface, the YbxF/kink-turn interaction in the fourth RNA-puzzle competition. These results establish all-atom enumeration as a systematic approach to protein structure that can leverage high performance computing and physically realistic energy functions to more consistently achieve atomic resolution.

Submitted 24 May, 2013; v1 submitted 2 August, 2012; originally announced August 2012.

Comments: Identity of four-loop blind test protein and parts of figures 5 have been omitted in this preprint to ensure confidentiality of the protein structure prior to its public release

arXiv:1207.1312 [pdf]

Quantitative DMS mapping for automated RNA secondary structure inference

Authors: Pablo Cordero, Wipapat Kladwang, Christopher C. VanLang, Rhiju Das

Abstract: For decades, dimethyl sulfate (DMS) mapping has informed manual modeling of RNA structure in vitro and in vivo. Here, we incorporate DMS data into automated secondary structure inference using a pseudo-energy framework developed for 2'-OH acylation (SHAPE) mapping. On six non-coding RNAs with crystallographic models, DMS-guided modeling achieves overall false negative and false discovery rates of 9.5% and 11.6%, comparable to or better than SHAPE-guided modeling; and non-parametric bootstrapping provides straightforward confidence estimates. Integrating DMS/SHAPE data and including CMCT reactivities give small additional improvements. These results establish DMS mapping - an already routine technique - as a quantitative tool for unbiased RNA structure modeling.

Submitted 5 July, 2012; originally announced July 2012.

arXiv:1202.4794 [pdf]

Ultraviolet Shadowing of RNA Causes Substantial Non-Poissonian Chemical Damage in Seconds

Authors: Wipapat Kladwang, Justine Hum, Rhiju Das

Abstract: Chemical purity of RNA samples is critical for high-precision studies of RNA folding and catalytic behavior, but such purity may be compromised by photodamage accrued during ultraviolet (UV) visualization of gel-purified samples. Here, we quantitatively assess the breadth and extent of such damage by using reverse transcription followed by single-nucleotide-resolution capillary electrophoresis. We detected UV-induced lesions across a dozen natural and artificial RNAs including riboswitch domains, other non-coding RNAs, and artificial sequences; across multiple sequence contexts, dominantly at but not limited to pyrimidine doublets; and from multiple lamps that are recommended for UV shadowing in the literature. Most strikingly, irradiation time-courses reveal detectable damage within a few seconds of exposure, and these data can be quantitatively fit to a 'skin effect' model that accounts for the increased exposure of molecules near the top of irradiated gel slices. The results indicate that 200-nucleotide RNAs subjected to 20 seconds or less of UV shadowing can incur damage to 20% of molecules, and the molecule-by-molecule distribution of these lesions is more heterogeneous than a Poisson distribution. Photodamage from UV shadowing is thus likely a widespread but unappreciated cause of artifactual heterogeneity in quantitative and single-molecule-resolution RNA biophysical measurements.

Submitted 21 February, 2012; originally announced February 2012.

arXiv:1110.0800 [pdf]

Automated RNA structure prediction uncovers a missing link in double glycine riboswitches

Authors: Wipapat Kladwang, Fang-Chieh Chou, Rhiju Das

Abstract: The tertiary structures of functional RNA molecules remain difficult to decipher. A new generation of automated RNA structure prediction methods may help address these challenges but have not yet been experimentally validated. Here we apply four prediction tools to a remarkable class of double glycine riboswitches that exhibit ligand-binding cooperativity. A novel method (BPPalign), RMdetect, JAR3D, and Rosetta 3D modeling give consistent predictions for a new stem P0 and kink-turn motif. These elements structure the linker between the RNAs' double aptamers. Chemical mapping on the F. nucleatum riboswitch with SHAPE, DMS, and CMCT probing, mutate-and-map studies, and mutation/rescue experiments all provide strong evidence for the structured linker. Under solution conditions that separate two glycine binding transitions, disrupting this helix-junction-helix structure gives 120-fold and 6- to 30-fold poorer association constants for the two transitions, corresponding to an overall energetic impact of 4.3 ± 0.5 kcal/mol. Prior biochemical and crystallography studies from several labs did not include this critical element due to over-truncation of the RNA. We argue that several further undiscovered elements are likely to exist in the flanking regions of this and other RNA switches, and automated prediction tools can now play a powerful role in their detection and dissection.

Submitted 4 October, 2011; originally announced October 2011.

arXiv:1110.0276 [pdf]

doi 10.1038/nmeth.2262

Correcting pervasive errors in RNA crystallography through enumerative structure prediction

Authors: Fang-Chieh Chou, Parin Sripakdeevong, Sergey M. Dibrov, Thomas Hermann, Rhiju Das

Abstract: Three-dimensional RNA models fitted into crystallographic density maps exhibit pervasive conformational ambiguities, geometric errors and steric clashes. To address these problems, we present enumerative real-space refinement assisted by electron density under Rosetta (ERRASER), coupled to Python-based hierarchical environment for integrated 'xtallography' (PHENIX) diffraction-based refinement. On 24 data sets, ERRASER automatically corrects the majority of MolProbity-assessed errors, improves the average Rfree factor, resolves functionally important discrepancies in noncanonical structure and refines low-resolution models to better match higher-resolution models.

Submitted 2 December, 2012; v1 submitted 3 October, 2011; originally announced October 2011.

arXiv:1110.0235 [pdf]

The Stanford RNA Mapping Database for sharing and visualizing RNA structure mapping experiments

Authors: Pablo Cordero, Julius Lucks, Rhiju Das

Abstract: We have established an RNA Mapping Database (RMDB) to enable a new generation of structural, thermodynamic, and kinetic studies from quantitative single-nucleotide-resolution RNA structure mapping (freely available at http://rmdb.stanford.edu). Chemical and enzymatic mapping is a rapid, robust, and widespread approach to RNA characterization. Since its recent coupling with high-throughput sequencing techniques, accelerated software pipelines, and large-scale mutagenesis, the volume of mapping data has greatly increased, and there is a critical need for a database to enable sharing, visualization, and meta-analyses of these data. Through its on-line front-end, the RMDB allows users to explore single-nucleotide-resolution chemical accessibility data in heat-map, bar-graph, and colored secondary structure graphics; to leverage these data to generate secondary structure hypotheses; and to download the data in standardized and computer-friendly files, including the RDAT and community-consensus SNRNASM formats. At the time of writing, the database houses 38 entries, describing 2659 RNA sequences and comprising 355,084 data points, and is growing rapidly.

Submitted 2 October, 2011; originally announced October 2011.

Comments: 20 pages, 2 figures

arXiv:1104.5278 [pdf]

Can biopolymer structures be sampled enumeratively? Atomic-accuracy RNA loop modeling by a stepwise ansatz

Authors: Parin Sripakdeevong, Wipapat Kladwang, Rhiju Das

Abstract: Atomic-accuracy structure prediction of macromolecules is a long-sought goal of computational biophysics. Accurate modeling should be achievable by optimizing a physically realistic energy function but is presently precluded by incomplete sampling of a biopolymer's many degrees of freedom. We present herein a working hypothesis, called the "stepwise ansatz", for recursively constructing well-packed atomic-detail models in small steps, enumerating several million conformations for each monomer and covering all build-up paths. By implementing the strategy in Rosetta and making use of high-performance computing, we provide first tests of this hypothesis on a benchmark of fifteen RNA loop modeling problems drawn from riboswitches, ribozymes, and the ribosome, including ten cases that were not solvable by prior knowledge-based modeling approaches. For each loop problem, this deterministic stepwise assembly (SWA) method either reaches atomic accuracy or exposes flaws in Rosetta's all-atom energy function, indicating the resolution of the conformational sampling bottleneck. To our knowledge, SWA is the first enumerative, ab initio build-up method to systematically outperform existing Monte Carlo and knowledge-based methods for 3D structure prediction. As a rigorous experimental test, we have applied SWA to a small RNA motif of previously unknown structure, the C7.2 tetraloop/tetraloop-receptor, and stringently tested this blind prediction with nucleotide-resolution structure mapping data.

Submitted 27 April, 2011; originally announced April 2011.

arXiv:1104.4337 [pdf, other]

doi 10.1093/bioinformatics/btr277

HiTRACE: High-throughput robust analysis for capillary electrophoresis

Authors: Sungroh Yoon, Jinkyu Kim, Justine Hum, Hanjoo Kim, Seunghyun Park, Wipapat Kladwang, Rhiju Das

Abstract: Motivation: Capillary electrophoresis (CE) of nucleic acids is a workhorse technology underlying high-throughput genome analysis and large-scale chemical mapping for nucleic acid structural inference. Despite the wide availability of CE-based instruments, there remain challenges in leveraging their full power for quantitative analysis of RNA and DNA structure, thermodynamics, and kinetics. In particular, the slow rate and poor automation of available analysis tools have bottlenecked a new generation of studies involving hundreds of CE profiles per experiment. Results: We propose a computational method called high-throughput robust analysis for capillary electrophoresis (HiTRACE) to automate the key tasks in large-scale nucleic acid CE analysis, including the profile alignment that has heretofore been a rate-limiting step in the highest throughput experiments. We illustrate the application of HiTRACE on thirteen data sets representing four different RNAs, three chemical modification strategies, and up to 480 single mutant variants; the largest data sets each include 87,360 bands. By applying a series of robust dynamic programming algorithms, HiTRACE outperforms prior tools in terms of alignment and fitting quality, as assessed by measures including the correlation of quantified band intensities between replicate data sets. Furthermore, while the smallest of these data sets required 7 to 10 hours of manual intervention using prior approaches, HiTRACE quantitation of even the largest data sets herein was achieved in 3 to 12 minutes. The HiTRACE method therefore resolves a critical barrier to the efficient and accurate analysis of nucleic acid structure in experiments involving tens of thousands of electrophoretic bands.

Submitted 12 May, 2011; v1 submitted 21 April, 2011; originally announced April 2011.

Comments: Revised to include Supplement. Availability: HiTRACE is freely available for download at http://hitrace.stanford.edu

arXiv:1104.0979 [pdf]

Two-dimensional chemical mapping for non-coding RNAs

Authors: Wipapat Kladwang, Christopher C. VanLang, Pablo Cordero, Rhiju Das

Abstract: Non-coding RNA molecules fold into precise base pairing patterns to carry out critical roles in genetic regulation and protein synthesis. We show here that coupling systematic mutagenesis with high-throughput SHAPE chemical mapping enables accurate base pair inference of domains from ribosomal RNA, ribozymes, and riboswitches. For a six-RNA benchmark that challenged prior chemical/computational methods, this mutate-and-map strategy gives secondary structures in agreement with crystallographic data (2% error rates), including a blind test on a double-glycine riboswitch. Through modeling of partially ordered RNA states, the method enables the first test of an 'interdomain helix-swap' hypothesis for ligand-binding cooperativity in a glycine riboswitch. Finally, the mutate-and-map data report on tertiary contacts within non-coding RNAs; coupled with the Rosetta/FARFAR algorithm, these data give nucleotide-resolution three-dimensional models (5.7 Å helix RMSD) of an adenine riboswitch. These results highlight the promise of a two-dimensional chemical strategy for inferring the secondary and tertiary structures that underlie non-coding RNA behavior.

Submitted 5 April, 2011; originally announced April 2011.

arXiv:1103.5458 [pdf]

doi 10.1021/bi200524n

Understanding the errors of SHAPE-directed RNA structure modeling

Authors: Wipapat Kladwang, Christopher C. VanLang, Pablo Cordero, Rhiju Das

Abstract: Single-nucleotide-resolution chemical mapping for structured RNA is being rapidly advanced by new chemistries, faster readouts, and coupling to computational algorithms. Recent tests have shown that selective 2'-hydroxyl acylation by primer extension (SHAPE) can give near-zero error rates (0-2%) in modeling the helices of RNA secondary structure. Here, we benchmark the method using six molecules for which crystallographic data are available: tRNA(phe) and 5S rRNA from Escherichia coli, the P4-P6 domain of the Tetrahymena group I ribozyme, and ligand-bound domains from riboswitches for adenine, cyclic di-GMP, and glycine. SHAPE-directed modeling of these highly structured RNAs gave an overall false negative rate (FNR) of 17% and a false discovery rate (FDR) of 21%, with at least one helix prediction error in five of the six cases. Extensive variations of data processing, normalization, and modeling parameters did not significantly mitigate modeling errors. Only one variation, filtering out data collected with deoxyinosine triphosphate during primer extension, gave a modest improvement (FNR = 12% and FDR = 14%). The residual structure modeling errors are explained by the insufficient information content of these RNAs' SHAPE data, as evaluated by a nonparametric bootstrapping analysis. Beyond these benchmark cases, bootstrapping suggests a low level of confidence (<50%) in the majority of helices in a previously proposed SHAPE-directed model for the HIV-1 RNA genome. Thus, SHAPE-directed RNA modeling is not always unambiguous, and helix-by-helix confidence estimates, as described herein, may be critical for interpreting results from this powerful methodology.

Submitted 7 September, 2011; v1 submitted 28 March, 2011; originally announced March 2011.

Comments: Biochemistry, Article ASAP (Aug. 15, 2011)

arXiv:1103.3042 [pdf]

doi 10.1371/journal.pone.0020044

Four small puzzles that Rosetta doesn't solve

Authors: Rhiju Das

Abstract: A complete macromolecule modeling package must be able to solve the simplest structure prediction problems. Despite recent successes in high resolution structure modeling and design, the Rosetta software suite fares poorly on deceptively small protein and RNA puzzles, some as small as four residues. To illustrate these problems, this manuscript presents extensive Rosetta results for four well-defined test cases: the 20-residue mini-protein Trp cage, an even smaller disulfide-stabilized conotoxin, the reactive loop of a serine protease inhibitor, and a UUCG RNA tetraloop. In contrast to previous Rosetta studies, several lines of evidence indicate that conformational sampling is not the major bottleneck in modeling these small systems. Instead, approximations and omissions in the Rosetta all-atom energy function currently preclude discriminating experimentally observed conformations from de novo models at atomic resolution. These molecular "puzzles" should serve as useful model systems for developers wishing to make foundational improvements to this powerful modeling suite.

Submitted 6 June, 2011; v1 submitted 15 March, 2011; originally announced March 2011.

Comments: Published in PLoS One as a manuscript for the RosettaCon 2010 Special Collection

Journal ref: Das R (2011) Four Small Puzzles That Rosetta Doesn't Solve. PLoS ONE 6(5): e20044

arXiv:1103.3032 [pdf]

Why Can't We Predict RNA Structure At Atomic Resolution?

Authors: Kyle Beauchamp, Parin Sripakdeevong, Rhiju Das

Abstract: No existing algorithm can start with arbitrary RNA sequences and return the precise, three-dimensional structures that ensure their biological function. This chapter outlines current algorithms for automated RNA structure prediction (including our own FARNA-FARFAR), highlights their successes, and dissects their limitations, using a tetraloop and the sarcin/ricin motif as examples. The barriers to future advances are considered in light of three particular challenges: improving computational sampling, reducing reliance on experimentally solved structures, and avoiding coarse-grained representations of atomic-level interactions. To help meet these challenges and better understand the current state of the field, we propose an ongoing community-wide CASP-style experiment for evaluating the performance of current structure prediction algorithms.

Submitted 15 March, 2011; originally announced March 2011.

Comments: K. Beauchamp & P. Sripakdeevong are equally contributing authors. Submission for book: RNA 3D Structure Analysis and Prediction, editors: N. Leontis & E. Westhof

arXiv:cond-mat/0703583 [pdf, ps, other]

doi 10.1103/PhysRevE.77.061912

Interaction Between Motor Domains Can Explain the Complex Dynamics of Heterodimeric Kinesins

Authors: Rahul Kumar Das, Anatoly B. Kolomeisky

Abstract: Motor proteins are active enzyme molecules that play a crucial role in many biological processes. They transform chemical energy into mechanical work and move unidirectionally along rigid cytoskeleton filaments. Single-molecule experiments suggest that motor proteins consisting of two motor domains move by a hand-over-hand mechanism, in which each subunit changes between trailing and leading positions in alternating steps, and that these subunits do not interact with each other. However, recent experiments on heterodimeric kinesins suggest that the motion of motor domains is not independent, but rather strongly coupled and coordinated, although the mechanism of these interactions is not known. We propose a simple discrete stochastic model to describe the dynamics of homodimeric and heterodimeric two-headed motor proteins. It is argued that interactions between motor domains modify the free energy landscapes of each motor subunit, and motor proteins still move via the hand-over-hand mechanism but with different transition rates. Our calculations of biophysical properties agree with experimental observations. Several ways to test the theoretical model are proposed.

Submitted 22 March, 2007; originally announced March 2007.

Comments: To appear in New J. Phys

Showing 1–32 of 32 results for author: Das, R