nD-PDPA: nDimensional Probability Density Profile Analysis

Authors: Arjang Fahim, Stephanie Irausquin, Homayoun Valafar

Abstract: Despite the recent advances in various Structural Genomics Projects, a large gap remains between the number of sequenced and structurally characterized proteins. Some reasons for this discrepancy include technical difficulties, labor, and the cost related to determining a structure by experimental methods such as NMR spectroscopy. Several computational methods have been developed to expand the applicability of NMR spectroscopy by addressing temporal and economical problems more efficiently. While these methods demonstrate success in solving more challenging and structurally novel proteins, the cost has not been reduced significantly. Probability Density Profile Analysis (PDPA) has been previously introduced by our lab to directly address the economics of structure determination of routine proteins and the identification of novel structures from a minimal set of unassigned NMR data. 2D-PDPA (in which 2D denotes incorporation of data from two alignment media) has been successful in identifying the structural homolog of an unknown protein within a library of ~1000 decoy structures. In order to further expand the selectivity and sensitivity of PDPA, the incorporation of additional data was necessary. However, the expansion of the original PDPA approach was limited by its computational requirements, where the inclusion of additional data would render it computationally intractable. Here we present the most recent developments of the PDPA method (nD-PDPA: n Dimensional Probability Density Profile Analysis) that eliminate 2D-PDPA's computational limitations and allow inclusion of RDC data from multiple vector types in multiple alignment media.

Submitted 5 April, 2023; originally announced April 2023.

Comments: Published in 2016

arXiv:2302.01337 [pdf]

doi 10.1109/CSCI58124.2022.00289

Comprehensive and user-analytics-friendly cancer patient database for physicians and researchers

Authors: Ali Firooz, Avery T. Funkhouser, Julie C. Martin, W. Jeffery Edenfield, Homayoun Valafar, Anna V. Blenda

Abstract: Nuanced cancer patient care is needed, as the development and clinical course of cancer is multifactorial with influences from the general health status of the patient, germline and neoplastic mutations, co-morbidities, and environment. To effectively tailor an individualized treatment to each patient, such multifactorial data must be presented to providers in an easy-to-access and easy-to-analyze fashion. To address the need, a relational database has been developed integrating status of cancer-critical gene mutations, serum galectin profiles, serum and tumor glycomic profiles, with clinical, demographic, and lifestyle data points of individual cancer patients. The database, as a backend, provides physicians and researchers with a single, easily accessible repository of cancer profiling data to aid and enhance individualized treatment. Our interactive database allows care providers to amalgamate cohorts from these groups to find correlations between different data types with the possibility of finding "molecular signatures" based upon a combination of genetic mutations, galectin serum levels, glycan compositions, and patient clinical data and lifestyle choices. Our project provides a framework for an integrated, interactive, and growing database to analyze molecular and clinical patterns across cancer stages and subtypes and provides opportunities for increased diagnostic and prognostic power.

Submitted 1 February, 2023; originally announced February 2023.

Comments: 7 pages, 12 figures, peer reviewed and accepted in "International Conference on Computational Science and Computational Intelligence (CSCI 22)"

Journal ref: Proceedings of the 2022 International Conference on Computational Science and Computational Intelligence (CSCI)

arXiv:2012.09267 [pdf]

Reduction in the complexity of 1D 1H-NMR spectra by the use of Frequency to Information Transformation

Authors: Homayoun Valafar, Faramarz Valafar

Abstract: Analysis of 1H-NMR spectra is often hindered by large variations that occur during the collection of these spectra. Large solvent and standard peaks, baseline drift, and negative peaks (due to improper phasing) are among these variations. Furthermore, some instrument-dependent alterations, such as incorrect shimming, are also embedded in the recorded spectrum. The unpredictable nature of these alterations of the signal has rendered the automated and instrument-independent computer analysis of these spectra unreliable. In this paper, a novel method of extracting the information content of a signal (in this paper, frequency domain 1H-NMR spectrum), called the frequency-information transformation (FIT), is presented and compared to a previously used method (SPUTNIK). FIT can successfully extract the information in a signal that is relevant to a pattern matching task, while discarding the remainder of the signal, by transforming a Fourier transformed signal into an information spectrum (IS). This technique exhibits the ability to decrease the inter-class correlation coefficients while increasing the intra-class correlation coefficients. In other words, different spectra of the same molecule will resemble each other more closely, while the spectra of different molecules will look more different from each other. This feature allows easier automated identification and analysis of molecules based on their spectral signatures using computer algorithms.

Submitted 16 December, 2020; originally announced December 2020.

Comments: 21 pages

arXiv:2012.06697 [pdf]

TALI: Protein Structure Alignment Using Backbone Torsion Angles

Authors: Xijiang Miao, Michael G. Bryson, Homayoun Valafar

Abstract: This article introduces a novel protein structure alignment method (named TALI) based on the protein backbone torsion angle instead of the more traditional distance matrix. Because the structural alignment of the two proteins is based on the comparison of two sequences of numbers (backbone torsion angles), we can take advantage of a large number of well-developed methods such as Smith-Waterman or Needleman-Wunsch. Here we report the result of TALI in comparison to other structure alignment methods such as DALI, CE, and SSM as well as sequence alignment based on PSI-BLAST. TALI demonstrated great success over all other methods in application to challenging proteins. TALI was more successful in recognizing remote structural homology. TALI also demonstrated an ability to identify structural homology between two proteins where the structural difference was due to a rotation of internal domains by nearly 180°.
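
The idea of treating a structure as a sequence of backbone torsion angles can be sketched as a Smith-Waterman local alignment over (φ, ψ) pairs. This is only an illustrative sketch and not the TALI implementation: the scoring function `residue_score`, its 60° threshold, and the linear gap penalty are assumptions made for this example.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def residue_score(r1, r2, threshold=60.0):
    """Hypothetical similarity of two (phi, psi) pairs.

    Positive when the mean angular deviation is below `threshold`,
    negative otherwise. The threshold is an assumption for illustration.
    """
    dev = (angle_diff(r1[0], r2[0]) + angle_diff(r1[1], r2[1])) / 2.0
    return (threshold - dev) / threshold

def smith_waterman(seq_a, seq_b, gap=1.0):
    """Return the best local alignment score and the aligned index pairs."""
    n, m = len(seq_a), len(seq_b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,
                H[i - 1][j - 1] + residue_score(seq_a[i - 1], seq_b[j - 1]),
                H[i - 1][j] - gap,   # gap in seq_b
                H[i][j - 1] - gap,   # gap in seq_a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0.0:
        diag = H[i - 1][j - 1] + residue_score(seq_a[i - 1], seq_b[j - 1])
        if math.isclose(H[i][j], diag):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(H[i][j], H[i - 1][j] - gap):
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs

# Toy input: a strand-like segment followed by a helix-like segment,
# aligned against a helix-like segment alone.
helix = [(-60.0, -45.0)] * 5
strand = [(-120.0, 130.0)] * 3
score, pairs = smith_waterman(strand + helix, helix)
```

Because the inputs are plain numeric sequences, swapping in a global Needleman-Wunsch recurrence would only require removing the zero floor and changing the initialization and traceback.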

Submitted 11 December, 2020; originally announced December 2020.

Comments: Seven pages

Journal ref: Published in BIOCOMP 2006: 3-9

arXiv:2008.02072 [pdf]

A Comparative study of Artificial Neural Networks Using Reinforcement learning and Multidimensional Bayesian Classification Using Parzen Density Estimation for Identification of GC-EIMS Spectra of Partially Methylated Alditol Acetates

Authors: Faramarz Valafar, Homayoun Valafar

Abstract: This study reports the development of a pattern recognition search engine for a World Wide Web-based database of gas chromatography-electron impact mass spectra (GC-EIMS) of partially methylated Alditol Acetates (PMAAs). Here, we also report comparative results for two pattern recognition techniques that were employed for this study. The first technique is a statistical technique using Bayesian classifiers and Parzen density estimators. The second technique involves an artificial neural network module trained with reinforcement learning. We demonstrate here that both systems perform well in identifying spectra with small amounts of noise. Both systems' performance degrades with degrading signal-to-noise ratio (SNR). When dealing with partial spectra (missing data), the artificial neural network system performs better. The developed system is implemented on the World Wide Web, and is intended to identify PMAAs using submitted spectra of these molecules recorded on any GC-EIMS instrument. The system, therefore, is insensitive to instrument and column dependent variations in GC-EIMS spectra.
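
The statistical technique named above, Bayesian classification with Parzen density estimates, can be illustrated with a minimal sketch. The Gaussian kernel, the bandwidth `h`, the class priors taken from training frequencies, and the toy two-dimensional "spectra" are all assumptions for this example; the abstract does not specify the authors' kernel or parameters.

```python
import math

def parzen_density(x, samples, h):
    """Parzen-window estimate of p(x) with an isotropic Gaussian kernel of width h."""
    d = len(x)
    norm = (2.0 * math.pi * h * h) ** (d / 2.0)
    total = 0.0
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / (len(samples) * norm)

def classify(x, training, h=0.5):
    """Return the label that maximizes prior * Parzen likelihood.

    `training` maps each class label to a list of feature vectors.
    """
    n_total = sum(len(v) for v in training.values())
    best_label, best_post = None, -1.0
    for label, samples in training.items():
        prior = len(samples) / n_total
        post = prior * parzen_density(x, samples, h)
        if post > best_post:
            best_label, best_post = label, post
    return best_label

# Toy training set: two classes of hypothetical 2-bin spectra.
training = {
    "A": [[1.0, 0.0], [0.9, 0.1]],
    "B": [[0.0, 1.0], [0.1, 0.9]],
}
label = classify([0.8, 0.2], training)
```

The bandwidth `h` controls how much each training spectrum smooths the estimated density; in practice it would need to be tuned to the noise level of the data.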

Submitted 31 July, 2020; originally announced August 2020.

Comments: 5 pages

Report number: Published in IEEE-ICAI 1999 554-558

arXiv:2008.01004 [pdf]

Identification of 1H-NMR Spectra of Xyloglucan Oligosaccharides: A Comparative Study of Artificial Neural Networks and Bayesian Classification Using Nonparametric Density Estimation

Authors: Faramarz Valafar, Homayoun Valafar, William S. York

Abstract: Proton nuclear magnetic resonance (1H-NMR) is a widely used tool for chemical structural analysis. However, 1H-NMR spectra suffer from natural aberrations that render computer-assisted automated identification of these spectra difficult, and at times impossible. Previous efforts have successfully implemented instrument dependent or conditional identification of these spectra. In this paper, we report the first instrument independent computer-assisted automated identification system for a group of complex carbohydrates known as the xyloglucan oligosaccharides. The developed system is also implemented on the World Wide Web (http://www.ccrc.uga.edu) as part of an identification package called the CCRC-Net and is intended to recognize any submitted 1H-NMR spectrum of these structures with reasonable signal-to-noise ratio, recorded on any 500 MHz NMR instrument. The system uses Artificial Neural Networks (ANNs) technology and is insensitive to the instrument and environment-dependent variations in 1H-NMR spectroscopy. In this paper, comparative results of the ANN engine versus a multidimensional Bayes' classifier are also presented.

Submitted 30 July, 2020; originally announced August 2020.

Comments: 6 pages. Published in IEEE ICAI99

Journal ref: Published in IEEE ICAI 1999 549-553

arXiv:2008.00539 [pdf]

An Investigation in Optimal Encoding of Protein Primary Sequence for Structure Prediction by Artificial Neural Networks

Authors: Aaron Hein, Casey Cole, Homayoun Valafar

Abstract: Machine learning and the use of neural networks have increased precipitously over the past few years, primarily due to the ever-increasing accessibility to data and the growth of computation power. It has become increasingly easy to harness the power of machine learning for predictive tasks. Protein structure prediction is one area where neural networks are becoming increasingly popular and successful. Although very powerful, the use of ANNs requires selection of the most appropriate input/output encoding, architecture, and class to produce the optimal results. In this investigation we have explored and evaluated the effect of several conventional and newly proposed input encodings and selected an optimal architecture. We considered 11 variations of input encoding, 11 alternative window sizes, and 7 different architectures. In total, we evaluated 2,541 permutations in application to the training and testing of more than 10,000 protein structures over the course of 3 months. Our investigations concluded that one-hot encoding, the use of LSTMs, and window sizes of 9, 11, and 15 produce the optimal outcome. Through this optimization, we were able to improve the quality of protein structure prediction by predicting the φ dihedrals to within 14°-16° and ψ dihedrals to within 23°-25°. This is a notable improvement compared to previous similar investigations.
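
As a rough illustration of one-hot encoding with a sliding window, the sketch below builds one feature vector per residue from a centered window over the primary sequence. The 20-letter alphabet ordering, the all-zero padding vector at the sequence ends, and the function names are assumptions for this example; the window size of 9 is one of the sizes reported above.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(residue):
    """20-element one-hot vector; all zeros for padding or unknown residues."""
    vec = [0] * len(AMINO_ACIDS)
    if residue in INDEX:
        vec[INDEX[residue]] = 1
    return vec

def windowed_encoding(sequence, window=9, pad="-"):
    """Encode each residue together with its neighbors in a centered, odd-sized window."""
    if window % 2 == 0:
        raise ValueError("window size must be odd")
    half = window // 2
    padded = pad * half + sequence + pad * half
    features = []
    for i in range(len(sequence)):
        flat = []
        for residue in padded[i:i + window]:
            flat.extend(one_hot(residue))
        features.append(flat)
    return features

# Each residue of a short hypothetical sequence becomes a 9 * 20 = 180-element vector.
features = windowed_encoding("ACDK", window=9)
```

Each row could then be fed to a network that predicts the φ and ψ dihedrals of the central residue; a recurrent model such as an LSTM would typically consume the window as 9 steps of 20 features instead of one flattened vector.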

Submitted 2 August, 2020; originally announced August 2020.

arXiv:2008.00018 [pdf, ps, other]

Process of Efficiently Parallelizing a Protein Structure Determination Algorithm

Authors: Michael Bryson, Xijiang Miao, Homayoun Valafar

Abstract: Computational protein structure determination involves optimization in a problem space much too large to exhaustively search. Existing approaches include optimization algorithms such as gradient descent and simulated annealing, but these typically only find local minima. One novel approach implemented in REDcRAFT is to fold a protein residue by residue instead of folding it all at the same time. This simulates a protein folding as each residue exits from the generating ribosome. While REDcRAFT exponentially reduces the problem space so it can be explored in polynomial time, it is still extremely computationally demanding. This algorithm does have the advantage that most of the execution time is spent in inherently parallelizable code. However, preliminary results from parallel execution indicate that approximately two-thirds of execution time is dedicated to system overhead. Additionally, by carefully analyzing and timing the structure of the program, the major bottlenecks can be identified. After addressing these issues, REDcRAFT becomes a scalable parallel application with nearly two orders of magnitude improvement.

Submitted 31 July, 2020; originally announced August 2020.

Comments: 7 pages, published in PDPTA 2006

Journal ref: PDPTA 2006: 320-326

arXiv:2007.13469 [pdf]

A Preliminary Investigation in the Molecular Basis of Host Shutoff Mechanism in SARS-CoV

Authors: Niharika Pandala, Casey A. Cole, Devaun McFarland, Anita Nag, Homayoun Valafar

Abstract: Recent events leading to the worldwide pandemic of COVID-19 have demonstrated the effective use of genomic sequencing technologies to establish the genetic sequence of this virus. In contrast, the COVID-19 pandemic has demonstrated the absence of computational approaches to rapidly understand the molecular basis of this infection. Here we present an integrated approach to the study of the nsp1 protein in SARS-CoV-1, which plays an essential role in maintaining the expression of viral proteins and further disabling the host protein expression, also known as the host shutoff mechanism. We present three independent methods of evaluating two potential binding sites speculated to participate in host shutoff by nsp1. We have combined results from computed models of nsp1, with deep mining of all existing protein structures (using PDBMine), and binding site recognition (using msTALI) to examine the two sites consisting of residues 55-59 and 73-80. Based on our preliminary results, we conclude that the residues 73-80 appear as the regions that facilitate the critical initial steps in the function of nsp1. Given the 90% sequence identity between nsp1 from SARS-CoV-1 and SARS-CoV-2, we conjecture the same critical initiation step in the function of COVID-19 nsp1.

Submitted 23 July, 2020; originally announced July 2020.

Comments: Consists of 9 pages, 8 figures and 7 tables. 11th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics 2020

arXiv:2005.10739 [pdf]

Assessing the Precision and Recall of msTALI as Applied to an Active-Site Study on Fold Families

Authors: Devaun McFarland, Homayoun Valafar

Abstract: Proteins execute various activities required by biological cells. Further, they structurally support and promote important biochemical reactions which functionally are sparked by active-sites. Active-sites are regions where reactions and binding events take place directly; they foster protein purpose. Describing functional relationships depends on factors that incorporate sequence, structure, and the biochemical properties of amino acids that form proteins. Our approach to active-site description is computational, and many other approaches utilizing available protein data fall short of ideal. Successful recognition of functional interactions is crucial to advancements in protein annotation and the bioinformatics field at large. This research outlines our Multiple Structure Torsion Angle Alignment (msTALI) as a suitable strategy for addressing active-site identification by comparing results to other existing methods. Specifically, we address the precision of msTALI across three protein families. Our target proteins are PDBIDs 1A2B, 1B4V, 1B8S, 1COY, 1CXZ, 3COX, 1D7E, 1DPF, 1F9I, 1FTN, 1IJH, 1KOU, 1NWZ, 2PHY, and 1SIC.

Submitted 7 May, 2020; originally announced May 2020.

Comments: 8 pages, 3 figures, 5 tables. This is an extended version of a similar abridged or short version used in conference

arXiv:2003.05406 [pdf]

doi 10.1109/BIBM.2011.53

An Artificial Neural Network Based Approach for Identification of Native Protein Structures using an Extended ForceField

Authors: Timothy Matthew Fawcett, Stephanie Irausquin, Mikhail Simin, Homayoun Valafar

Abstract: Current protein forcefields like the ones seen in CHARMM or Xplor-NIH have many terms that include bonded and non-bonded terms. Yet the forcefields do not take into account the use of hydrogen bonds which are important for secondary structure creation and stabilization of proteins. SCOPE is an open-source program that generates proteins from rotamer space. It then creates a forcefield that uses only non-bonded and hydrogen bond energy terms to create a profile for a given protein. The profiles can then be used in an artificial neural network to create a linear model that is funneled to the true protein conformation.

Submitted 5 March, 2020; originally announced March 2020.

Journal ref: 2011 IEEE International Conference on Bioinformatics and Biomedicine, 500-505

arXiv:2001.03092 [pdf]

De Novo Assembly of Uca minax Transcriptome from Next Generation Sequencing

Authors: Hanin Omar, Casey A. Cole, Arjang Fahim, Giuliana Gusmaroli, Stephen Borgianini, Homayoun Valafar

Abstract: High-throughput cDNA sequencing (RNA-seq) is a very powerful technique to quantify gene expression in an unbiased way. The Crustacean family is among the groups of organisms sparsely represented in current genomic databases. Here we present transcriptome data from Uca minax (red-jointed fiddler crab) as an opportunity to extend our knowledge. Next generation sequencing was performed on six tissue samples from Uca minax using the Illumina HiSeq system. Six transcriptome libraries were created using Trinity, a free, open-source software tool for de novo transcriptome assembly of high-throughput mRNA sequencing (RNA-seq) data in the absence of a reference genome. In addition, several tools that aid in management of data were used, such as RSEM, Bowtie, Blast, and IGV, a tool for visualizing RNA-seq analysis results. Fast quality control (FastQC) analysis of the raw sequenced files revealed that both adapter and PCR primer sequences were prevalently present, which may require a preprocessing step.

Submitted 9 January, 2020; originally announced January 2020.

Comments: 8 pages. BioComp 2015

arXiv:2001.03088 [pdf]

An Investigation of Minimum Data Requirement for Successful Structure Determination of Pf2048.1 with REDCRAFT

Authors: Casey A. Cole, Daniela Ishimaru, Mirko Hennig, Homayoun Valafar

Abstract: Traditional approaches to elucidation of protein structures by NMR spectroscopy rely on distance restraints also known as nuclear Overhauser effects (NOEs). The use of NOEs as the primary source of structure determination by NMR spectroscopy is time consuming and expensive. Residual Dipolar Couplings (RDCs) have become an alternate approach for structure calculation by NMR spectroscopy. In this work we report our results for structure calculation of the novel protein PF2048.1 from RDC data and establish the minimum data requirement for successful structure calculation using the software package REDCRAFT. Our investigations start with utilizing four sets of synthetic RDC data in two alignment media and proceed by reducing the RDC data to the final limit of {CN, NH} and {NH} from two alignment media respectively. Our results indicate that structure elucidation of this protein is possible with as little as {CN, NH} and {NH} to within 0.533Å of the target structure.

Submitted 9 January, 2020; originally announced January 2020.

Comments: 8 pages. BioComp 2015

arXiv:1911.10978 [pdf]

Modelling of Sickle Cell Anemia Patients Response to Hydroxyurea using Artificial Neural Networks

Authors: Brendan E. Odigwe, Jesuloluwa S. Eyitayo, Celestine I. Odigwe, Homayoun Valafar

Abstract: Hydroxyurea (HU) has been shown to be effective in alleviating the symptoms of Sickle Cell Anemia disease. While Hydroxyurea reduces the complications associated with Sickle Cell Anemia in some patients, others do not benefit from this drug and experience deleterious effects since it is also a chemotherapeutic agent. Therefore, the main question asked by the responsible physician is for whom the administration of HU should be considered a viable option. We address this question by developing modeling techniques that can predict a patient's response to HU and therefore spare the non-responsive patients from the unnecessary effects of HU on the values of 22 parameters that can be obtained from blood samples in 122 patients. Using this data, we developed Deep Artificial Neural Network models that can predict with 92.6% accuracy, the final HbF value of a subject after undergoing HU therapy. Our current studies are focussing on forecasting a patient's HbF response, 30 days ahead of time.

Submitted 25 November, 2019; originally announced November 2019.

Comments: 7 Pages, 9 figures, Int'l Conf. Health Informatics and Medical Systems | HIMS'19 |, Las Vegas, NV, July 2019

arXiv:1911.08614 [pdf]

PDBMine: A Reformulation of the Protein Data Bank to Facilitate Structural Data Mining

Authors: Casey A Cole, Christopher Ott, Diego Valdes, Homayoun Valafar

Abstract: Large scale initiatives such as the Human Genome Project, Structural Genomics, and individual research teams have provided large deposits of genomic and proteomic data. The transfer of data to knowledge has become one of the existing challenges, which is a consequence of capturing data in databases that are optimally designed for archiving and not mining. In this research, we have targeted the Protein Databank (PDB) and demonstrated a transformation of its content, named PDBMine, that reduces storage space by an order of magnitude, and allows for powerful mining in relation to the topic of protein structure determination. We have demonstrated the utility of PDBMine in exploring the prevalence of dimeric and trimeric amino acid sequences and provided a mechanism of predicting protein structure.

Submitted 19 November, 2019; originally announced November 2019.

Comments: 6 pages, 8 figures, IEEE Annual Conf. on Computational Science & Computational Intelligence (CSCI), December 2019

arXiv:1911.08612 [pdf]

Improvements of the REDCRAFT Software Package

Authors: Casey A Cole, Caleb Parks, Julian Rachele, Homayoun Valafar

Abstract: Traditional approaches to elucidation of protein structures by NMR spectroscopy rely on distance restraints also known as nuclear Overhauser effects (NOEs). The use of NOEs as the primary source of structure determination by NMR spectroscopy is time consuming and expensive. Residual Dipolar Couplings (RDCs) have become an alternate approach for structure calculation by NMR spectroscopy. In previous works, the software package REDCRAFT has been presented as a means of harnessing the information contained in RDCs for structure calculation of proteins. In this work, we present significant improvements to the REDCRAFT package including: refinement of the decimation procedure, the inclusion of a graphical user interface, adoption of NEF standards, and addition of scripts for enhanced protein modeling options. The improvements to REDCRAFT have resulted in the ability to fold proteins that the previous versions were unable to fold. For instance, we report the results of folding of the protein 1A1Z in the presence of highly erroneous data.

Submitted 19 November, 2019; originally announced November 2019.

Comments: 7 pages, 5 figures, Int'l Conf. Bioinformatics and Computational Biology (BIOCOMP'19), Las Vegas, NV, August 2019

arXiv:1911.02406 [pdf]

Aligning Multiple Protein Structures using Biochemical and Biophysical Properties

Authors: Paul Shealy, Homayoun Valafar

Abstract: Aligning multiple protein structures can yield valuable information about structural similarities among related proteins, as well as provide insight into evolutionary relationships between proteins in a family. We have developed an algorithm (msTALI) for aligning multiple protein structures using biochemical and biophysical properties, including torsion angles, secondary structure, hydrophobicity, and surface accessibility. The algorithm is a progressive alignment algorithm motivated by popular techniques from multiple sequence alignment. It has demonstrated success in aligning the major structural regions of a set of proteins from the s/r kinase family. The algorithm was also successful at aligning functional residues of these proteins. In addition, the algorithm was successful in aligning seven members of the acyl carrier protein family, including both experimentally derived and computationally modeled structures.

Submitted 6 November, 2019; originally announced November 2019.

Comments: BioComp 2009, 7 pages

arXiv:1911.02396 [pdf]

Using Residual Dipolar Couplings from Two Alignment Media to Detect Structural Homology

Authors: Ryan Yandle, Rishi Mukhopadhyay, Homayoun Valafar

Abstract: The method of Probability Density Profile Analysis has been introduced previously as a tool to find the best match between a set of experimentally generated Residual Dipolar Couplings and a set of known protein structures. While it proved effective on small databases in identifying protein fold families, and for picking the best result from computational protein folding tool ROBETTA, for larger data sets, more data is required. Here, the method of 2-D Probability Density Profile Analysis is presented which incorporates paired RDC data from 2 alignment media for N-H vectors. The method was tested using synthetic RDC data generated with +/-1 Hz error. The results show that the addition of information from a second alignment medium makes 2-D PDPA a much more effective tool that is able to identify a structure from a database of 600 protein fold family representatives.

Submitted 6 November, 2019; originally announced November 2019.

Comments: BioComp 2009, 6 pages

arXiv:1911.00526 [pdf]

Automated Assignment of Backbone Resonances Using Residual Dipolar Couplings Acquired from a Protein with Known Structure

Authors: P. Shealy, R. Mukhopadhyay, S. Smith, H. Valafar

Abstract: Resonance assignment is a critical first step in the investigation of protein structures using NMR spectroscopy. The development of assignment methods that require less experimental data is possible with prior knowledge of the macromolecular structure. Automated methods of performing the task of resonance assignment can significantly reduce the financial cost and time requirement for protein structure determination. Such methods can also be beneficial in validating a protein's solution state structure. Here we present a new approach to the assignment problem. Our approach uses only RDC data to assign backbone resonances. It provides simultaneous order tensor estimation and assignment. Our approach compares independent order tensor estimates to determine when the correct order tensor has been found. We demonstrate the algorithm's viability using simulated data from the protein domain 1A1Z.

Submitted 1 November, 2019; originally announced November 2019.

Comments: BioComp 2008, 7 pages

arXiv:1911.00383 [pdf]

Protein Fold Family Recognition From Unassigned Residual Dipolar Coupling Data

Authors: Rishi Mukhopadhyay, Paul Shealy, Homayoun Valafar

Abstract: Despite many advances in computational modeling of protein structures, these methods have not been widely utilized by experimental structural biologists. Two major obstacles are preventing the transition from a purely-experimental to a purely-computational mode of protein structure determination. The first problem is that most computational methods need a large library of computed structures that span a large variety of protein fold families, while structural genomics initiatives have slowed in their ability to provide novel protein folds in recent years. The second problem is an unwillingness to trust computational models that have no experimental backing. In this paper we test a potential solution to these problems that we have called Probability Density Profile Analysis (PDPA) that utilizes unassigned residual dipolar coupling data that are relatively cheap to acquire from NMR experiments.

Submitted 1 November, 2019; originally announced November 2019.

Comments: BioComp 2008, 7 pages

arXiv:1910.14469 [pdf]

Minimum Data Requirements and Supplemental Angle Constraints for Protein Structure Prediction with REDCRAFT

Authors: E. Timko, P. Shealy, M. Bryson, H. Valafar

Abstract: One algorithm to predict protein structure is the residual dipolar coupling based residue assembly and filter tool (REDCRAFT). This algorithm exploits an exponential reduction of the search space of all possible structures to find a structure that best fits a set of experimental residual dipolar couplings. However, the minimum amount of data required to successfully determine a protein's structure using REDCRAFT has not been previously investigated. Here we explore the effect of reducing the amount of data used to fold proteins. Our goal is to reduce experimental data collection times while retaining the accuracy levels previously achieved with larger amounts of data. We also investigate incorporating a priori secondary structure information into REDCRAFT to improve its structure prediction ability.

Submitted 6 November, 2019; v1 submitted 31 October, 2019; originally announced October 2019.

Comments: 7 pages, BioComp 2008
