-
The Stanford RNA Mapping Database for sharing and visualizing RNA structure mapping experiments
Authors:
Pablo Cordero,
Julius Lucks,
Rhiju Das
Abstract:
We have established an RNA Mapping Database (RMDB) to enable a new generation of structural, thermodynamic, and kinetic studies from quantitative single-nucleotide-resolution RNA structure mapping (freely available at http://rmdb.stanford.edu). Chemical and enzymatic mapping is a rapid, robust, and widespread approach to RNA characterization. Since its recent coupling with high-throughput sequencing techniques, accelerated software pipelines, and large-scale mutagenesis, the volume of mapping data has greatly increased, and there is a critical need for a database to enable sharing, visualization, and meta-analyses of these data. Through its on-line front-end, the RMDB allows users to explore single-nucleotide-resolution chemical accessibility data in heat-map, bar-graph, and colored secondary structure graphics; to leverage these data to generate secondary structure hypotheses; and to download the data in standardized and computer-friendly files, including the RDAT and community-consensus SNRNASM formats. At the time of writing, the database houses 38 entries, describing 2659 RNA sequences and comprising 355,084 data points, and is growing rapidly.
Submitted 2 October, 2011;
originally announced October 2011.
-
RNA structure characterization from chemical map** experiments
Authors:
Sharon Aviran,
Julius B. Lucks,
Lior Pachter
Abstract:
Despite great interest in solving RNA secondary structures due to their impact on function, determining structure from sequence remains an open problem. Among experimental approaches, a promising candidate is the "chemical modification strategy", which applies structure-sensitive chemicals to RNA and assays the resulting modifications with sequencing technologies. One approach that can reveal paired nucleotides via chemical modification followed by sequencing is SHAPE, which has been used in conjunction with capillary electrophoresis (SHAPE-CE) and high-throughput sequencing (SHAPE-Seq). Relating the sequence data to the modified sites requires solving mathematical inverse problems, and a number of approaches have been suggested for SHAPE-CE and, separately, for SHAPE-Seq analysis. Here we introduce a new model for inference from chemical modification experiments whose formulation yields closed-form maximum likelihood (ML) estimates that can be easily applied to data. The model can be specialized to both SHAPE-CE and SHAPE-Seq, and therefore allows a direct comparison of the two technologies. We then show that the extra information obtained with SHAPE-Seq, but not with SHAPE-CE, is valuable for ML estimation.
Submitted 29 June, 2011; v1 submitted 24 June, 2011;
originally announced June 2011.
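The closed-form ML estimates above come from the paper's own model, which is not reproduced here. As a minimal sketch of the general idea, assuming reads are summarized as stop counts per site, one can estimate at each site the fraction of reads that reached it and stopped there, then correct the treated (SHAPE) stop rate for natural reverse-transcriptase drop-off measured in an untreated control. The function names `stop_rates` and `modification_rates`, the count layout, and the independence assumption are illustrative choices, not the paper's estimator.

```python
def stop_rates(counts):
    """Fraction of reads reaching each site that stop there (a hazard estimate).

    Assumed layout (hypothetical): counts[k] is the number of reads stopping at
    site k, indexed in the order the reverse transcriptase meets the sites
    (index 0 first); the last entry holds full-length reads that never stopped.
    """
    rates = []
    remaining = sum(counts)  # every read reaches the first site
    for c in counts[:-1]:  # the full-length entry has no stop site
        rates.append(c / remaining if remaining else 0.0)
        remaining -= c  # reads that stopped here never reach later sites
    return rates


def modification_rates(treated, untreated):
    """Background-corrected per-site modification probability beta_k.

    Assumes a treated read stops at site k if the site is modified (prob. beta)
    or the enzyme drops off naturally (prob. gamma, estimated from the untreated
    sample), independently: h_plus = 1 - (1 - beta) * (1 - gamma).
    Solving for beta gives beta = (h_plus - gamma) / (1 - gamma).
    """
    h_plus = stop_rates(treated)
    h_minus = stop_rates(untreated)
    betas = []
    for hp, hm in zip(h_plus, h_minus):
        beta = (hp - hm) / (1.0 - hm) if hm < 1.0 else 0.0
        betas.append(max(0.0, beta))  # clip negative (noise-driven) estimates
    return betas


if __name__ == "__main__":
    print(modification_rates([10, 30, 20, 40], [10, 10, 10, 70]))
```

For the toy counts in the usage line, the estimates come out as approximately [0.0, 0.25, 0.238]: site 0 stops equally often with and without the reagent, so no modification is inferred there. Using the number of reads still "at risk" at each site, rather than raw stop counts, is what keeps later sites from being systematically underestimated when many reads stop early.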
-
Python - All a Scientist Needs
Authors:
Julius B. Lucks
Abstract:
Any cutting-edge scientific research project requires a myriad of computational tools for data generation, management, analysis and visualization. Python is a flexible and extensible scientific programming platform that offered the perfect solution in our recent comparative genomics investigation (J. B. Lucks, D. R. Nelson, G. Kudla, J. B. Plotkin. Genome landscapes and bacteriophage codon usage, PLoS Computational Biology, 4, 1000001, 2008). In this paper, we discuss the challenges of this project and how Biopython, Matplotlib and SWIG were combined to handle the required computational tasks. We finish by discussing how Python goes beyond being a convenient programming language and promotes good scientific practice by enabling clean code, integration with professional programming techniques such as unit testing, and strong data provenance.
Submitted 12 March, 2008;
originally announced March 2008.
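The unit-testing practice mentioned above can be illustrated with a small helper in the spirit of the codon-usage project. `gc3_content` is a hypothetical name used for illustration, not code from the paper: it computes the fraction of codons ending in G or C, and a standard-library `unittest` test case pins down its behavior on hand-checked inputs.

```python
import unittest


def gc3_content(sequence):
    """Return the fraction of codons whose third base is G or C.

    Hypothetical helper for illustration; assumes an in-frame coding sequence.
    """
    sequence = sequence.upper()
    if len(sequence) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    third_positions = sequence[2::3]  # every third base, starting at index 2
    if not third_positions:
        return 0.0
    return sum(base in "GC" for base in third_positions) / len(third_positions)


class TestGC3Content(unittest.TestCase):
    def test_all_codons_end_in_gc(self):
        self.assertEqual(gc3_content("GCGAAC"), 1.0)  # GCG, AAC

    def test_half_of_codons_end_in_gc(self):
        self.assertAlmostEqual(gc3_content("atgaaa"), 0.5)  # ATG, AAA

    def test_empty_sequence(self):
        self.assertEqual(gc3_content(""), 0.0)

    def test_rejects_partial_codon(self):
        with self.assertRaises(ValueError):
            gc3_content("ATGA")


if __name__ == "__main__":
    unittest.main()
```

Keeping the tests next to the function means that any later refactor (for example, switching to a Biopython sequence object as input) is checked against the same hand-verified cases.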
-
Genome landscapes and bacteriophage codon usage
Authors:
Julius B. Lucks,
David R. Nelson,
Grzegorz Kudla,
Joshua B. Plotkin
Abstract:
Across all kingdoms of biological life, protein-coding genes exhibit unequal usage of synonymous codons. Although alternative theories abound, translational selection has been accepted as an important mechanism that shapes the patterns of codon usage in prokaryotes and simple eukaryotes. Here we analyze patterns of codon usage across 74 diverse bacteriophages that infect E. coli, P. aeruginosa and L. lactis as their primary host. We introduce the concept of a "genome landscape," which helps reveal non-trivial, long-range patterns in codon usage across a genome. We develop a series of randomization tests that allow us to interrogate the significance of one aspect of codon usage, such as GC content, while controlling for another aspect, such as adaptation to host-preferred codons. We find that 33 phage genomes exhibit highly non-random patterns in their GC3 content, use of host-preferred codons, or both. We show that the head and tail proteins of these phages exhibit significant bias towards host-preferred codons, relative to the non-structural phage proteins. Our results support the hypothesis of translational selection on viral genes for host-preferred codons, over a broad range of bacteriophages.
Submitted 14 August, 2007;
originally announced August 2007.
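A randomization test of the kind described above can be sketched as follows. This is a generic synonymous-shuffle permutation test, not necessarily the paper's exact scheme: codons are permuted genome-wide among all positions encoding the same amino acid, which preserves every protein sequence and the genome's overall codon counts, and the statistic is the excess of host-preferred codons in structural genes over the rest. The host-preferred codon set and the structural/non-structural labels are user-supplied inputs, and all function names here are hypothetical.

```python
import random
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(CODONS, AMINO))


def split_codons(seq):
    """Split an in-frame sequence into codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def preferred_fraction(codons, preferred):
    return sum(c in preferred for c in codons) / len(codons) if codons else 0.0


def statistic(genes, is_structural, preferred):
    """Preferred-codon fraction in structural genes minus that in all other genes."""
    struct = [c for g, s in zip(genes, is_structural) if s for c in g]
    other = [c for g, s in zip(genes, is_structural) if not s for c in g]
    return preferred_fraction(struct, preferred) - preferred_fraction(other, preferred)


def synonymous_shuffle(genes, rng):
    """Permute codons among all genome positions that encode the same amino acid."""
    slots = defaultdict(list)
    for gi, gene in enumerate(genes):
        for ci, codon in enumerate(gene):
            slots[CODE[codon]].append((gi, ci))  # KeyError if a codon is not ACGT
    shuffled = [list(g) for g in genes]
    for positions in slots.values():
        pool = [genes[gi][ci] for gi, ci in positions]
        rng.shuffle(pool)
        for (gi, ci), codon in zip(positions, pool):
            shuffled[gi][ci] = codon
    return shuffled


def randomization_test(seqs, is_structural, preferred, n_iter=1000, seed=0):
    """Return (observed statistic, one-sided permutation p-value)."""
    rng = random.Random(seed)
    genes = [split_codons(s.upper()) for s in seqs]
    observed = statistic(genes, is_structural, preferred)
    hits = sum(
        statistic(synonymous_shuffle(genes, rng), is_structural, preferred) >= observed
        for _ in range(n_iter)
    )
    return observed, (hits + 1) / (n_iter + 1)
```

Because the shuffle only moves codons within synonymous families, a small p-value says the structural genes are enriched in preferred codons beyond what the genome's amino-acid content and overall codon counts alone would produce. Controlling for other aspects (such as GC3) would require a more constrained shuffle.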
-
Dynamics of RNA Translocation through a Nanopore
Authors:
J. B. Lucks,
Y. Kafri
Abstract:
We present a simplified model of the dynamics of RNA translocation through a nanopore that allows only unbound nucleotides to pass. In particular, we consider the disorder-averaged translocation dynamics of random, two-component, single-stranded nucleotides by reducing the dynamics to the motion of a random walker on a one-dimensional free energy landscape of translocation. These translocation landscapes are calculated from the folds of the RNA sequences and the voltage bias applied across the nanopore. We compute these landscapes for 1500 randomly drawn two-letter sequences of length 4000. Simulations of the dynamics on these landscapes display anomalous characteristics, similar to random forcing energy landscapes: translocation proceeds sublinearly in time for sufficiently small voltage biases across the nanopore, but linearly in time at large voltage biases. We argue that our simplified model provides an upper bound to the more realistic translocation dynamics, and thus we expect that all RNA translocation models will exhibit anomalous regimes.
Submitted 12 March, 2007;
originally announced March 2007.
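The reduction to a random walker on a one-dimensional landscape can be sketched with a Metropolis walk. In this minimal sketch the landscape is a hypothetical stand-in, not one computed from RNA folds: a random forcing landscape in which each translocated nucleotide adds an independent Gaussian energy increment, tilted by a constant per-step bias playing the role of the voltage. Energies are in units of kT, and the function names and parameter values are illustrative assumptions.

```python
import math
import random


def random_forcing_landscape(length, bias, disorder, rng):
    """F(x) = U(x) - bias * x, where U is a random walk in energy (random forcing).

    Hypothetical stand-in for a fold-derived landscape: each step adds an
    independent Gaussian increment with standard deviation `disorder`.
    """
    energy = [0.0]
    for _ in range(length):
        energy.append(energy[-1] + rng.gauss(0.0, disorder) - bias)
    return energy


def translocate(energy, n_steps, rng):
    """Metropolis random walk on the landscape; both ends are reflecting."""
    x, last = 0, len(energy) - 1
    path = [x]
    for _ in range(n_steps):
        y = x + rng.choice((-1, 1))
        if 0 <= y <= last:
            d_e = energy[y] - energy[x]
            # Downhill moves are always accepted; uphill with prob exp(-dE).
            if d_e <= 0 or rng.random() < math.exp(-d_e):
                x = y
        path.append(x)
    return path


def mean_position(bias, disorder, n_landscapes=50, n_steps=500, length=2000, seed=0):
    """Disorder-averaged position after n_steps, over independent landscapes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_landscapes):
        energy = random_forcing_landscape(length, bias, disorder, rng)
        total += translocate(energy, n_steps, rng)[-1]
    return total / n_landscapes
```

Calling `mean_position` for several values of `n_steps` at a small and at a large `bias` (with `disorder` fixed) is one way to compare how the averaged displacement grows with time in the two regimes; the sketch makes no claim about where the crossover lies.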
-
Pause Point Spectra in DNA Constant-Force Unzipping
Authors:
J. D. Weeks,
J. B. Lucks,
Y. Kafri,
C. Danilowicz,
D. R. Nelson,
M. Prentiss
Abstract:
Under constant applied force, the separation of double-stranded DNA into two single strands is known to proceed through a series of pauses and jumps. Given experimental traces of constant-force unzipping, we present a method whereby the locations of pause points can be extracted in the form of a pause point spectrum. A simple theoretical model of DNA constant-force unzipping is shown to produce good agreement with the experimental pause point spectrum of lambda phage DNA. The locations of peaks in the experimental and theoretical pause point spectra are found to be nearly coincident below 6000 bp. The model requires only the sequence, the temperature and a set of empirical base-pair binding and stacking energy parameters, and the good agreement with experiment suggests that pause points are primarily determined by the DNA sequence. The model is also used to predict pause point spectra for the bacteriophage PhiX174 genome. The algorithm for extracting the pause point spectrum might also be useful for studying related systems that exhibit pausing behavior, such as molecular motors.
Submitted 10 June, 2004;
originally announced June 2004.
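To make the notion of a pause point spectrum concrete, here is a simple stand-in, not the paper's extraction algorithm: given an unzipping trace sampled at a fixed interval, accumulate the time spent in each position bin, then report bins whose dwell time is a local maximum above a threshold. The sampling interval `dt`, the bin width, the threshold, and the function names are illustrative assumptions.

```python
from collections import Counter


def pause_point_spectrum(positions, dt, bin_bp=10):
    """Total dwell time per position bin from an evenly sampled unzipping trace.

    positions: base pairs unzipped at each sample, taken every dt seconds.
    Returns {bin start in bp: time spent in that bin}, sorted by position.
    """
    spectrum = Counter()
    for p in positions:
        spectrum[int(p // bin_bp) * bin_bp] += dt
    return dict(sorted(spectrum.items()))


def pause_points(spectrum, threshold):
    """Bins whose dwell time reaches the threshold and is a local maximum.

    Neighbors are the adjacent visited bins; bins never visited are absent.
    """
    keys = sorted(spectrum)
    peaks = []
    for i, k in enumerate(keys):
        left = spectrum[keys[i - 1]] if i > 0 else 0.0
        right = spectrum[keys[i + 1]] if i + 1 < len(keys) else 0.0
        value = spectrum[k]
        if value >= threshold and value >= left and value >= right:
            peaks.append(k)
    return peaks


if __name__ == "__main__":
    # Synthetic trace: steady progress with a long pause at 50 bp.
    trace = [0] * 5 + list(range(1, 50)) + [50] * 20 + list(range(51, 100))
    print(pause_points(pause_point_spectrum(trace, dt=0.1), threshold=2.0))
```

On the synthetic trace in the usage block, the only reported pause point is the 50 bp bin. Binning trades positional resolution for robustness to measurement noise, so in practice the bin width would be chosen relative to the instrument's position uncertainty.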