-
Improved prediction of ligand-protein binding affinities by meta-modeling
Authors:
Ho-Joon Lee,
Prashant S. Emani,
Mark B. Gerstein
Abstract:
The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Many computational models for binding affinity prediction have been developed, but with varying results across targets. Given that ensembling or meta-modeling methods have shown great promise in reducing model-specific biases, we develop a framework to integrate published force-field-based empirical docking and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual base models, training databases, and several meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over base models. Our best meta-models achieve comparable performance to state-of-the-art deep learning tools exclusively based on structures, while allowing for improved database scalability and flexibility through the explicit inclusion of features such as physicochemical properties or molecular descriptors. Overall, we demonstrate that diverse modeling approaches can be ensembled together to gain improvement in binding affinity prediction.
Submitted 18 May, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Quantum Computing at the Frontiers of Biological Sciences
Authors:
Prashant S. Emani,
Jonathan Warrell,
Alan Anticevic,
Stefan Bekiranov,
Michael Gandal,
Michael J. McConnell,
Guillermo Sapiro,
Alán Aspuru-Guzik,
Justin Baker,
Matteo Bastiani,
Patrick McClure,
John Murray,
Stamatios N Sotiropoulos,
Jacob Taylor,
Geetha Senthil,
Thomas Lehner,
Mark B. Gerstein,
Aram W. Harrow
Abstract:
The search for meaningful structure in biological data has relied on cutting-edge advances in computational technology and data science methods. However, challenges arise as we push the limits of scale and complexity in biological problems. Innovation in massively parallel, classical computing hardware and algorithms continues to address many of these challenges, but there is a need to simultaneously consider new paradigms to circumvent current barriers to processing speed. Accordingly, we articulate a view towards quantum computation and quantum information science, where algorithms have demonstrated potential polynomial and exponential computational speedups in certain applications, such as machine learning. The maturation of the field of quantum computing, in hardware and algorithm development, also coincides with the growth of several collaborative efforts to address questions across length and time scales, and scientific disciplines. We use this coincidence to explore the potential for quantum computing to aid in one such endeavor: the merging of insights from genetics, genomics, neuroimaging and behavioral phenotyping. By examining joint opportunities for computational innovation across fields, we highlight the need for a common language between biological data analysis and quantum computing. Ultimately, we consider current and future prospects for the employment of quantum computing algorithms in the biological sciences.
Submitted 16 November, 2019;
originally announced November 2019.
-
Copy Number Variants and Segmental Duplications Show Different Formation Signatures
Authors:
Philip M. Kim,
Jan O. Korbel,
Xueying Chen,
Mark B. Gerstein
Abstract:
In addition to variation in terms of single nucleotide polymorphisms (SNPs), whole regions ranging from several kilobases up to a megabase in length differ in copy number among individuals. These differences are referred to as Copy Number Variants (CNVs) and extensive mapping of these is underway. Recent studies have highlighted their great prevalence in the human genome. Segmental Duplications (SDs) are long (>1kb) stretches of duplicated DNA with high sequence identity. First, we analyze the co-localization of SDs and find that SDs are significantly co-localized with each other, resulting in a power-law distribution, which suggests a preferential attachment mechanism, i.e. existing SDs are likely to be involved in creating new ones nearby. Second, we look at the relationship of CNVs/SDs with various types of repeats. We find that the previously recognized association of SDs with Alu elements is significantly stronger for older SDs and is sharply decreasing for younger ones. While it might be expected that the patterns should be similar for SDs and CNVs, we find, surprisingly, no association of CNVs with Alu elements. This trend is consistent with the decreasing correlation between Alu elements and younger SDs: the activity of Alu elements has been decreasing, and by now they seem no longer active. Furthermore, we find a striking association of SDs with processed pseudogenes, suggesting that they may also have mediated SD formation. Moreover, we find a strong association with microsatellites for both SDs and CNVs, which suggests a role for satellites in the formation of both.
Submitted 26 September, 2007;
originally announced September 2007.
-
Comparing Classical Pathways and Modern Networks: Towards the Development of an Edge Ontology
Authors:
Long J. Lu,
Andrea Sboner,
Yuanpeng J. Huang,
Hao Xin Lu,
Tara A. Gianoulis,
Kevin Y. Yip,
Philip M. Kim,
Gaetano T. Montelione,
Mark B. Gerstein
Abstract:
Pathways are integral to systems biology. Their classical representation has proven useful but is inconsistent in the meaning assigned to each arrow (or edge) and inadvertently implies the isolation of one pathway from another. Conversely, modern high-throughput experiments give rise to standardized networks facilitating topological calculations. Combining these perspectives, we can embed classical pathways within large-scale networks and thus demonstrate the crosstalk between them. As more diverse types of high-throughput data become available, we can effectively merge both perspectives, embedding pathways simultaneously in multiple networks. However, the original problem still remains - the current edge representation is inadequate to accurately convey all the information in pathways. Therefore, we suggest that a standardized, well-defined, edge ontology is necessary and propose a prototype here, as a starting point for reaching this goal.
Submitted 1 June, 2007;
originally announced June 2007.