Search | arXiv e-print repository

A Tribute to Phil Bourne -- Scientist and Human

Authors: Cameron Mura, Emma Candelier, Lei Xie

Abstract: This Special Issue of Biomolecules, commissioned in honor of Dr. Philip E. Bourne, focuses on a new field of biomolecular data science. In this brief retrospective, we consider the arc of Bourne's 40-year scientific and professional career, particularly as it relates to the origins of this new field.

Submitted 8 December, 2022; originally announced December 2022.

Comments: 5 pages, 1 figure

arXiv:2206.01092 [pdf]

doi 10.3389/fsysb.2022.959665

Innovations in Integrating Machine Learning and Agent-Based Modeling of Biomedical Systems

Authors: Nikita Sivakumar, Cameron Mura, Shayn M. Peirce

Abstract: Agent-based modeling (ABM) is a well-established paradigm for simulating complex systems via interactions between constituent entities. Machine learning (ML) refers to approaches whereby statistical algorithms 'learn' from data on their own, without imposing a priori theories of system behavior. Biological systems -- from molecules, to cells, to entire organisms -- consist of vast numbers of entities, governed by complex webs of interactions that span many spatiotemporal scales and exhibit nonlinearity, stochasticity and intricate coupling between entities. The macroscopic properties and collective dynamics of such systems are difficult to capture via continuum modelling and mean-field formalisms. ABM takes a 'bottom-up' approach that obviates these difficulties by enabling one to easily propose and test a set of well-defined 'rules' to be applied to the individual entities (agents) in a system. Evaluating a system and propagating its state over discrete time-steps effectively simulates the system, allowing observables to be computed and system properties to be analyzed. Because the rules that govern an ABM can be difficult to abstract and formulate from experimental data, there is an opportunity to use ML to help infer optimal, system-specific ABM rules. Once such rule-sets are devised, ABM calculations can generate a wealth of data, and ML can be applied there too -- e.g., to probe statistical measures that meaningfully describe a system's stochastic properties.
As an example of synergy in the other direction (from ABM to ML), ABM simulations can generate realistic datasets for training ML algorithms (e.g., for regularization, to mitigate overfitting). In these ways, one can envision various synergistic ABM$\rightleftharpoons$ML loops. This review summarizes how ABM and ML have been integrated in contexts that span spatiotemporal scales, from cellular to population-level epidemiology.
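The 'rules applied to agents over discrete time-steps' idea in the abstract above can be illustrated with a minimal sketch. This is not taken from the review: the susceptible/infected/recovered states, the ring of neighbors, the function names `step` and `simulate`, and the parameters `p_infect` and `p_recover` are all assumptions chosen for illustration.

```python
import random


def step(states, rng, p_infect=0.3, p_recover=0.1):
    """Apply the rules once to every agent on a ring; return the new states.

    Updates are synchronous: rules read the old list and write to a copy.
    """
    n = len(states)
    new_states = list(states)
    for i, state in enumerate(states):
        if state == "S":
            # Rule 1 (assumed): each infected neighbor may infect a susceptible agent.
            for neighbor in (states[(i - 1) % n], states[(i + 1) % n]):
                if neighbor == "I" and rng.random() < p_infect:
                    new_states[i] = "I"
                    break
        elif state == "I":
            # Rule 2 (assumed): an infected agent recovers with fixed probability.
            if rng.random() < p_recover:
                new_states[i] = "R"
    return new_states


def simulate(n_agents=50, n_steps=20, seed=0, **rule_params):
    """Propagate the system over discrete time-steps; record S/I/R counts."""
    rng = random.Random(seed)
    states = ["S"] * n_agents
    states[0] = "I"  # a single initially infected agent
    history = []
    for _ in range(n_steps):
        history.append({k: states.count(k) for k in "SIR"})
        states = step(states, rng, **rule_params)
    return history
```

The per-step counts returned by `simulate` are the kind of observable that could, in principle, be handed to an ML method, or used as synthetic training data.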

Submitted 9 November, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

Comments: 32 pages, 1 table, 8 figures

arXiv:2111.14283 [pdf, other]

Exploration of Dark Chemical Genomics Space via Portal Learning: Applied to Targeting the Undruggable Genome and COVID-19 Anti-Infective Polypharmacology

Authors: Tian Cai, Li Xie, Muge Chen, Yang Liu, Di He, Shuo Zhang, Cameron Mura, Philip E. Bourne, Lei Xie

Abstract: Advances in biomedicine are largely fueled by exploring uncharted territories of human biology. Machine learning can both enable and accelerate discovery, but faces a fundamental hurdle when applied to unseen data with distributions that differ from previously observed ones -- a common dilemma in scientific inquiry. We have developed a new deep learning framework, called Portal Learning, to explore dark chemical and biological space. Three key, novel components of our approach include: (i) end-to-end, step-wise transfer learning, in recognition of biology's sequence-structure-function paradigm, (ii) out-of-cluster meta-learning, and (iii) stress model selection. Portal Learning provides a practical solution to the out-of-distribution (OOD) problem in statistical machine learning. Here, we have implemented Portal Learning to predict chemical-protein interactions on a genome-wide scale. Systematic studies demonstrate that Portal Learning can effectively assign ligands to unexplored gene families (unknown functions), versus existing state-of-the-art methods, thereby allowing us to target previously "undruggable" proteins and design novel polypharmacological agents for disrupting interactions between SARS-CoV-2 and human proteins. Portal Learning is general-purpose and can be further applied to other areas of scientific inquiry.

Submitted 23 November, 2021; originally announced November 2021.

Comments: 18 pages, 6 figures

MSC Class: 68T07

arXiv:2104.12003 [pdf]

A Birds-eye (Re)View of Acid-suppression Drugs, COVID-19, and the Highly Variable Literature

Authors: Cameron Mura, Saskia Preissner, Robert Preissner, Philip E. Bourne

Abstract: We consider the recent surge of information on the potential benefits of acid-suppression drugs in the context of COVID-19, with an eye on the variability (and confusion) across the reported findings--at least as regards the popular antacid famotidine. The inconsistencies reflect contradictory conclusions from independent clinical-based studies that took roughly similar approaches, in terms of experimental design (retrospective, cohort-based, etc.) and statistical analyses (propensity-score matching and stratification, etc.). The confusion has significant ramifications in choosing therapeutic interventions: e.g., do potential benefits of famotidine indicate its use in a particular COVID-19 case? Beyond this pressing therapeutic issue, conflicting information on famotidine must be resolved before its integration in ontological and knowledge graph-based frameworks, which in turn are useful in drug repurposing efforts. To begin systematically structuring the rapidly accumulating information, in the hopes of clarifying and reconciling the discrepancies, we consider the contradictory information along three proposed 'axes': (1) a context-of-disease axis, (2) a degree-of-[therapeutic]-benefit axis, and (3) a mechanism-of-action axis. We suspect that incongruencies in how these axes have been (implicitly) treated in past studies have led to the contradictory indications for famotidine and COVID-19.
We also trace the evolution of information on acid-suppression agents as regards the transmission, severity, and mortality of COVID-19, given the many literature reports that have accumulated. By grouping the studies conceptually and thematically, we identify three eras in the progression of our understanding of famotidine and COVID-19. Harmonizing these findings is a key goal for both clinical standards-of-care (COVID and beyond) as well as ontological and knowledge graph-based approaches.

Submitted 24 April, 2021; originally announced April 2021.

Comments: 10 pages, 1 figure

arXiv:2005.08443 [pdf, other]

Deep Learning of Protein Structural Classes: Any Evidence for an 'Urfold'?

Authors: Menuka Jaiswal, Saad Saleem, Yonghyeon Kweon, Eli J Draizen, Stella Veretnik, Cameron Mura, Philip E. Bourne

Abstract: Recent computational advances in the accurate prediction of protein three-dimensional (3D) structures from amino acid sequences now present a unique opportunity to decipher the interrelationships between proteins. This task entails--but is not equivalent to--a problem of 3D structure comparison and classification. Historically, protein domain classification has been a largely manual and subjective activity, relying upon various heuristics. Databases such as CATH represent significant steps towards a more systematic (and automatable) approach, yet there still remains much room for the development of more scalable and quantitative classification methods, grounded in machine learning. We suspect that re-examining these relationships via a Deep Learning (DL) approach may entail a large-scale restructuring of classification schemes, improved with respect to the interpretability of distant relationships between proteins. Here, we describe our training of DL models on protein domain structures (and their associated physicochemical properties) in order to evaluate classification properties at CATH's "homologous superfamily" (SF) level. To achieve this, we have devised and applied an extension of image-classification methods and image segmentation techniques, utilizing a convolutional autoencoder model architecture. Our DL architecture allows models to learn structural features that, in a sense, 'define' different homologous SFs. We evaluate and quantify pairwise 'distances' between SFs by building one model per SF and comparing the loss functions of the models.
Hierarchical clustering on these distance matrices provides a new view of protein interrelationships--a view that extends beyond simple structural/geometric similarity, and towards the realm of structure/function properties.

Submitted 17 May, 2020; originally announced May 2020.

Comments: 6 pages, 3 figures, 1 table; IEEE SIEDS conference submission

arXiv:1905.00455 [pdf]

Machine Learning for Classification of Protein Helix Capping Motifs

Authors: Sean Mullane, Ruoyan Chen, Sri Vaishnavi Vemulapalli, Eli J. Draizen, Ke Wang, Cameron Mura, Philip E. Bourne

Abstract: The biological function of a protein stems from its 3-dimensional structure, which is thermodynamically determined by the energetics of interatomic forces between its amino acid building blocks (the order of amino acids, known as the sequence, defines a protein). Given the costs (time, money, human resources) of determining protein structures via experimental means such as X-ray crystallography, can we better describe and compare protein 3D structures in a robust and efficient manner, so as to gain meaningful biological insights? We begin by considering a relatively simple problem, limiting ourselves to just protein secondary structural elements. Historically, many computational methods have been devised to classify amino acid residues in a protein chain into one of several discrete secondary structures, of which the most well-characterized are the geometrically regular $α$-helix and $β$-sheet; irregular structural patterns, such as 'turns' and 'loops', are less understood. Here, we present a study of Deep Learning techniques to classify the loop-like end cap structures which delimit $α$-helices. Previous work used highly empirical and heuristic methods to manually classify helix capping motifs. Instead, we use structural data directly--including (i) backbone torsion angles computed from 3D structures, (ii) macromolecular feature sets (e.g., physicochemical properties), and (iii) helix cap classification data (from CAPS-DB)--as the ground truth to train a bidirectional long short-term memory (BiLSTM) model to classify helix cap residues.
We tried different network architectures and scanned hyperparameters in order to train and assess several models; we also trained a Support Vector Classifier (SVC) to use as a baseline. Ultimately, we achieved 85% class-balanced accuracy with a deep BiLSTM model.

Submitted 1 May, 2019; originally announced May 2019.

Comments: 6 pages, 3 figures, 4 tables

arXiv:1807.09247 [pdf]

doi 10.1016/j.sbi.2018.09.003

Structural biology meets data science: Does anything change?

Authors: Cameron Mura, Eli J. Draizen, Philip E. Bourne

Abstract: Data science has emerged from the proliferation of digital data, coupled with advances in algorithms, software and hardware (e.g., GPU computing). Innovations in structural biology have been driven by similar factors, spurring us to ask: can these two fields impact one another in deep and hitherto unforeseen ways? We posit that the answer is yes. New biological knowledge lies in the relationships between sequence, structure, function and disease, all of which play out on the stage of evolution, and data science enables us to elucidate these relationships at scale. Here, we consider the above question from the five key pillars of data science: acquisition, engineering, analytics, visualization and policy, with an emphasis on machine learning as the premier analytics approach.

Submitted 1 October, 2018; v1 submitted 24 July, 2018; originally announced July 2018.

Comments: 20 pages total, 2 figures, 1 item of supplementary material

Journal ref: vol 52, 2018, pp 95-102

arXiv:1610.00216 [pdf]

doi 10.1021/acs.biomac.6b00951

Toward a Designable Extracellular Matrix: Molecular Dynamics Simulations of an Engineered Laminin-mimetic, Elastin-like Fusion Protein

Authors: James D. Tang, Charles E. McAnany, Cameron Mura, Kyle J. Lampe

Abstract: Native extracellular matrices (ECMs), such as those of the human brain and other neural tissues, exhibit networks of molecular interactions between specific matrix proteins and other tissue components. Guided by these naturally self-assembling supramolecular systems, we have designed a matrix-derived protein chimera that contains a laminin globular-like (LG) domain fused to an elastin-like polypeptide (ELP). All-atom, classical molecular dynamics simulations of our designed laminin-elastin fusion protein reveal temperature-dependent conformational changes, in terms of secondary structure composition, solvent accessible surface area, hydrogen bonding, and surface hydration. These properties illuminate the phase behavior of this fusion protein, via the emergence of $β$-sheet character in physiologically-relevant temperature ranges.

Submitted 1 October, 2016; originally announced October 2016.

Comments: 53 pages, 7 figures in the main text; Supporting Information contains 1 table, 12 figures, 4 trajectory animations (videos)

Journal ref: Biomacromolecules, 2016

arXiv:1606.03797 [pdf, other]

Understanding the Results of Electrostatics Calculations: Visualizing Molecular 'Isopotential' Surfaces

Authors: Cameron Mura

Abstract: This document attempts to clarify potential confusion regarding electrostatics calculations, specifically in the context of biomolecular structure and specifically as regards the units typically used to contour/visualize isopotential surfaces, potentials mapped onto molecular solvent-accessible surfaces, etc.

Submitted 12 June, 2016; originally announced June 2016.

Comments: 4 pages, a couple items of molecular graphics illustrations

arXiv:1606.03630 [pdf, other]

The Structures, Functions, and Evolution of Sm-like Archaeal Proteins (SmAPs)

Authors: Cameron Mura

Abstract: Sm proteins were discovered nearly 20 years ago as a group of small antigenic proteins ($\approx$ 90-120 residues). Since then, an extensive amount of biochemical and genetic data have illuminated the crucial roles of these proteins in forming ribonucleoprotein (RNP) complexes that are used in RNA processing, e.g., spliceosomal removal of introns from pre-mRNAs. Spliceosomes are large macromolecular machines that are comparable to ribosomes in size and complexity, and are composed of uridine-rich small nuclear RNPs (U snRNPs). Various sets of seven different Sm proteins form the cores of most snRNPs. Despite their importance, very little is known about the atomic-resolution structure of snRNPs or their Sm cores. As a first step towards a high-resolution image of snRNPs and their hierarchic assembly, we have determined the crystal structures of archaeal homologs of Sm proteins, which we term Sm-like archaeal proteins (SmAPs).

Submitted 11 June, 2016; originally announced June 2016.

Comments: 215 pages, distributed across an Abstract, Synopsis, five Chapters (the main body) and an Appendix; the work in this dissertation was performed in the Eisenberg lab at UCLA from ca. 1999 to 2002

arXiv:1606.02737 [pdf, other]

doi 10.1021/acs.jpcb.6b03261

Claws, Disorder, and Conformational Dynamics of the C-terminal Region of Human Desmoplakin

Authors: Charles E. McAnany, Cameron Mura

Abstract: Multicellular organisms consist of cells that interact via elaborate adhesion complexes. Desmosomes are membrane-associated adhesion complexes that mechanically tether the cytoskeletal intermediate filaments (IFs) between two adjacent cells, creating a network of tough connections in tissues such as skin and heart. Desmoplakin (DP) is the key desmosomal protein that binds IFs, and the DP-IF association poses a quandary: desmoplakin must stably and tightly bind IFs to maintain the structural integrity of the desmosome. Yet, newly synthesized DP must traffic along the cytoskeleton to the site of nascent desmosome assembly without 'sticking' to the IF network, implying weak or transient DP--IF contacts. Recent work reveals that these contacts are modulated by post-translational modifications (PTMs) in DP's C-terminal tail. Using molecular dynamics simulations, we have elucidated the structural basis of these PTM-induced effects. Our simulations, nearing 2 microseconds in aggregate, indicate that phosphorylation of S2849 induces an 'arginine claw' in desmoplakin's C-terminal tail (DPCTT). If a key arginine, R2834, is methylated, the DPCTT preferentially samples conformations that are geometrically well-suited as substrates for processive phosphorylation by the cognate kinase GSK3. We suggest that DPCTT is a molecular switch that modulates, via its conformational dynamics, DP's efficacy as a substrate for GSK3.
Finally, we show that the fluctuating DPCTT can contact other parts of DP, suggesting a competitive binding mechanism for the modulation of DP--IF interactions.

Submitted 8 June, 2016; originally announced June 2016.

Comments: 68 pages (47 pp main text + 21 pp of Supporting Information ); 6 figures and 1 table in the main text; in press

Journal ref: The Journal of Physical Chemistry B (2016)

arXiv:1605.05419 [pdf, other]

doi 10.1371/journal.pcbi.1004867

An Introduction to Programming for Bioscientists: A Python-based Primer

Authors: Berk Ekmekci, Charles E. McAnany, Cameron Mura

Abstract: Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.).
This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences.
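For readers unfamiliar with the Hamming distance mentioned above, it is the number of positions at which two equal-length sequences differ. The sketch below is an illustration only, not code from the primer; the function name `hamming_distance` and the choice to compare case-insensitively are assumptions.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance requires sequences of equal length")
    # Compare case-insensitively so that 'acgt' and 'ACGT' are treated alike.
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


# Positions 3 and 6 differ (T vs C, C vs T), so the distance is 2.
print(hamming_distance("GATTACA", "GACTATA"))
```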

Submitted 17 May, 2016; originally announced May 2016.

Comments: 65 pages total, including 45 pages text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biology

arXiv:1507.08262 [pdf]

doi 10.1002/bmb.20873

Known Structure, Unknown Function: An Inquiry-based Undergraduate Biochemistry Laboratory Course

Authors: Cynthia Gray, Carol W. Price, Christopher T. Lee, Alison H. Dewald, Matthew A. Cline, Charles E. McAnany, Linda Columbus, Cameron Mura

Abstract: Undergraduate biochemistry laboratory courses often do not provide students with an authentic research experience, particularly when the express purpose of the laboratory is purely instructional. However, an instructional laboratory course that is inquiry- and research-based could simultaneously impart scientific knowledge and foster a student's research expertise and confidence. We have developed a year-long undergraduate biochemistry laboratory curriculum wherein students determine, via experiment and computation, the function of a protein of known three-dimensional structure. The first half of the course is inquiry-based and modular in design; students learn general biochemical techniques while gaining preparation for research experiments in the second semester. Having learned standard biochemical methods in the first semester, students independently pursue their own (original) research projects in the second semester. This new curriculum has yielded an improvement in student performance and confidence as assessed by various metrics. To disseminate teaching resources to students and instructors alike, a freely accessible Biochemistry Laboratory Education resource is available at http://biochemlab.org.

Submitted 8 July, 2015; originally announced July 2015.

Comments: 75 pages, 8 figures, 2 tables, to appear in the journal "Biochemistry & Molecular Biology Education" (open-access license)

arXiv:1407.5218 [pdf, other]

doi 10.1107/S0021889811004481

Abstractions, Algorithms and Data Structures for Structural Bioinformatics in PyCogent

Authors: Marcin Cieslik, Zygmunt Derewenda, Cameron Mura

Abstract: To facilitate flexible and efficient structural bioinformatics analyses, new functionality for three-dimensional structure processing and analysis has been introduced into PyCogent -- a popular feature-rich framework for sequence-based bioinformatics, but one which has lacked equally powerful tools for handling structural/coordinate-based data. Extensible Python modules have been developed, which provide object-oriented abstractions (based on a hierarchical representation of macromolecules), efficient data structures (e.g. kD-trees), fast implementations of common algorithms (e.g. surface-area calculations), read/write support for Protein Data Bank-related file formats and wrappers for external command-line applications (e.g. Stride). Integration of this code into PyCogent is symbiotic, allowing sequence-based work to benefit from structure-derived data and, reciprocally, enabling structural studies to leverage PyCogent's versatile tools for phylogenetic and evolutionary analyses.

Submitted 19 July, 2014; originally announced July 2014.

Comments: 36 pages, 4 figures (including supplemental information)

Journal ref: Journal of Applied Crystallography (2011), 44(2), 424-428

arXiv:1407.5211 [pdf, other]

Development & Implementation of a PyMOL 'putty' Representation

Authors: Cameron Mura

Abstract: The PyMOL molecular graphics program has been modified to introduce a new 'putty' cartoon representation, akin to the 'sausage'-style representation of the MOLMOL molecular visualization (MolVis) software package. This document outlines the development and implementation of the putty representation.

Submitted 19 July, 2014; originally announced July 2014.

Comments: 3 pages, 4 figures

arXiv:1407.4378 [pdf]

PaPy: Parallel and Distributed Data-processing Pipelines in Python

Authors: Marcin Cieslik, Cameron Mura

Abstract: PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for either generating or processing voluminous datasets. A workflow is created from user-written Python functions (nodes) connected by 'pipes' (edges) into a directed acyclic graph. These functions are arbitrarily definable, and can make use of any Python modules or external binaries. Given a user-defined topology and collection of input data, functions are composed into nested higher-order maps, which are transparently and robustly evaluated in parallel on a single computer or on remote hosts. Local and remote computational resources can be flexibly pooled and assigned to functional nodes, thereby allowing facile load-balancing and pipeline optimization to maximize computational throughput. Input items are processed by nodes in parallel, and traverse the graph in batches of adjustable size -- a trade-off between lazy-evaluation, parallelism, and memory consumption. The processing of a single item can be parallelized in a scatter/gather scheme. The simplicity and flexibility of distributed workflows using PaPy bridge the gap between desktop and grid, enabling this new computing paradigm to be leveraged in the processing of large scientific datasets.
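The general idea of functions (nodes) wired into a directed acyclic graph and mapped over input items in parallel can be sketched with the Python standard library alone. The sketch below does not use or reproduce PaPy's API; the function `run_pipeline`, its arguments, and the example node names are hypothetical, and it covers only local thread-based parallelism across items (no remote hosts, batching, or scatter/gather).

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+


def run_pipeline(nodes, edges, items, output, max_workers=4):
    """Evaluate a DAG of functions on every input item.

    nodes:  maps node name -> function
    edges:  maps node name -> list of upstream node names
    output: name of the node whose result is returned for each item

    A node without upstream nodes receives the raw input item; any other
    node receives its upstream results as positional arguments.
    """
    # Upstream nodes always precede the nodes that depend on them.
    order = list(TopologicalSorter(edges).static_order())

    def process(item):
        results = {}
        for name in order:
            upstream = edges.get(name, [])
            args = [results[u] for u in upstream] if upstream else [item]
            results[name] = nodes[name](*args)
        return results[output]

    # Items are processed in parallel; Executor.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process, items))


# Hypothetical three-node workflow: two independent nodes feed a third.
nodes = {
    "upper": lambda seq: seq.upper(),
    "length": len,
    "report": lambda seq, n: f"{seq}:{n}",
}
edges = {"upper": [], "length": [], "report": ["upper", "length"]}
print(run_pipeline(nodes, edges, ["acgt", "gg"], output="report"))
```

Threads are used here only to keep the example short and free of pickling constraints; a process pool or remote workers would be the closer analogue for CPU-bound work.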

Submitted 14 July, 2014; originally announced July 2014.

Comments: 7 pages, 5 figures, 2 tables, some use-cases; more at http://muralab.org/PaPy

arXiv:1407.4071 [pdf]

doi 10.1371/journal.pcbi.1000918

Ten Simple Rules for Creating Biomolecular Graphics

Authors: Cameron Mura

Abstract: One need only compare the number of three-dimensional molecular illustrations in the first (1990) and third (2004) editions of Voet & Voet's "Biochemistry" in order to appreciate this field's profound communicative value in modern biological sciences -- ranging from medicine, physiology, and cell biology, to pharmaceutical chemistry and drug design, to structural and computational biology. The cliché about a picture being worth a thousand words is quite poignant here: The information 'content' of an effectively-constructed piece of molecular graphics can be immense. Because biological function arises from structure, it is difficult to overemphasize the utility of visualization and graphics in molding our current understanding of the molecular nature of biological systems. Nevertheless, creating effective molecular graphics is not easy -- neither conceptually, nor in terms of effort required. The present collection of Rules is meant as a guide for those embarking upon their first molecular illustrations.

Submitted 15 July, 2014; originally announced July 2014.

Comments: 3 pages, 0 figures; see also the full-length article in PLoS Computational Biology (cited below)

Journal ref: PLoS Computational Biology (2010), 6(8): e1000918

arXiv:1407.3752 [pdf]

doi 10.1080/08927022.2014.935372

An Introduction to Biomolecular Simulations and Docking

Authors: Cameron Mura, Charles E. McAnany

Abstract: The biomolecules in and around a living cell -- proteins, nucleic acids, lipids, carbohydrates -- continuously sample myriad conformational states that are thermally accessible at physiological temperatures. Simultaneously, a given biomolecule also samples (and is sampled by) a rapidly fluctuating local environment comprised of other biopolymers, small molecules, water, ions, etc. that diffuse to within a few nanometers, leading to inter-molecular contacts that stitch together large supramolecular assemblies. Indeed, all biological systems can be viewed as dynamic networks of molecular interactions. As a complement to experimentation, molecular simulation offers a uniquely powerful approach to analyze biomolecular structure, mechanism, and dynamics; this is possible because the molecular contacts that define a complicated biomolecular system are governed by the same physical principles (forces, energetics) that characterize individual small molecules, and these simpler systems are relatively well-understood. With modern algorithms and computing capabilities, simulations are now an indispensable tool for examining biomolecular assemblies in atomic detail, from the conformational motion in an individual protein to the diffusional dynamics and inter-molecular collisions in the early stages of formation of cellular-scale assemblies such as the ribosome. This text introduces the physicochemical foundations of molecular simulations and docking, largely from the perspective of biomolecular interactions.

Submitted 14 July, 2014; originally announced July 2014.

Comments: The text is accompanied by 6 figures and 5 boxes. The text is in press at Molecular Simulation (2014), as part of a special issue on simulations in molecular biology

arXiv:0807.1375 [pdf]

doi 10.1093/nar/gkn473

Molecular Dynamics of a kB DNA Element: Base Flipping via Cross-strand Intercalative Stacking in a Microsecond-scale Simulation

Authors: Cameron Mura, J. Andrew McCammon

Abstract: The sequence-dependent structural variability and conformational dynamics of DNA play pivotal roles in many biological milieus, such as in the site-specific binding of transcription factors to target regulatory elements. To better understand DNA structure, function, and dynamics in general, and protein-DNA recognition in the 'kB' family of genetic regulatory elements in particular, we performed molecular dynamics simulations of a 20-base pair DNA encompassing a cognate kB site recognized by the proto-oncogenic 'c-Rel' subfamily of NF-kB transcription factors. Simulations of the kB DNA in explicit water were extended to microsecond duration, providing a broad, atomically-detailed glimpse into the structural and dynamical behavior of double helical DNA over many timescales. Of particular note, novel (and structurally plausible) conformations of DNA developed only at the long times sampled in this simulation -- including a peculiar state arising at ~ 0.7 us and characterized by cross-strand intercalative stacking of nucleotides within a longitudinally-sheared base pair, followed (at ~ 1 us) by spontaneous base flipping of a neighboring thymine within the A-rich duplex. Results and predictions from the us-scale simulation include implications for a dynamical NF-kB recognition motif, and are amenable to testing and further exploration via specific experimental approaches that are suggested herein.

Submitted 17 July, 2014; v1 submitted 9 July, 2008; originally announced July 2008.

Comments: 21 pages, 9 figures; revised version has been updated to include figures

Journal ref: Nucleic Acids Research (2008), 36(15), 4941-4955

Showing 1–19 of 19 results for author: Mura, C