Search | arXiv e-print repository

Improved prediction of ligand-protein binding affinities by meta-modeling

Authors: Ho-Joon Lee, Prashant S. Emani, Mark B. Gerstein

Abstract: The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Many computational models for binding affinity prediction have been developed, but with varying results across targets. Given that ensembling or meta-modeling methods have shown great promise in reducing model-specific biases, we develop a framework to integrate published force-field-based empirical docking and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual base models, training databases, and several meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over base models. Our best meta-models achieve comparable performance to state-of-the-art deep learning tools exclusively based on structures, while allowing for improved database scalability and flexibility through the explicit inclusion of features such as physicochemical properties or molecular descriptors. Overall, we demonstrate that diverse modeling approaches can be ensembled together to gain improvement in binding affinity prediction.

Submitted 18 May, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: 52 pages, 5 main tables, 6 main figures, 7 supplementary figures, and supporting information. For 11 supplementary tables and code, see https://github.com/Lee1701/Lee2023a
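Meta-modeling in its simplest form can be read as linear stacking: each base model (for example a docking score or a deep-learning affinity prediction) becomes one input feature, and a second-level model learns how to weight them against measured affinities. The sketch below is a minimal illustration of that idea only, assuming a plain least-squares meta-model. It is not the authors' framework, which evaluates several meta-modeling approaches (see the linked repository). The function names `fit_linear_stacker` and `predict` are hypothetical.

```python
"""Minimal linear-stacking sketch (hypothetical; not the paper's framework)."""
from typing import List, Sequence


def fit_linear_stacker(base_preds: Sequence[Sequence[float]],
                       targets: Sequence[float]) -> List[float]:
    """Fit an intercept plus one weight per base model by ordinary least squares.

    base_preds[i][j] is base model j's predicted affinity for complex i;
    targets[i] is the measured affinity. Returns [intercept, w_1, ..., w_m].
    """
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0, *row] for row in base_preds]
    k = len(X[0])
    # Normal equations: (X^T X) w = X^T y.
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    rhs = [sum(r[a] * y for r, y in zip(X, targets)) for a in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution on the resulting upper-triangular system.
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (rhs[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))) / A[r][r]
    return w


def predict(weights: Sequence[float], base_row: Sequence[float]) -> float:
    """Combine one complex's base-model predictions with the fitted weights."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], base_row))
```

In practice a stacker like this is usually fit on out-of-fold (cross-validated) base-model predictions, so that the meta-model does not simply reward base models that overfit the training set.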

arXiv:1911.07127 [pdf]

doi 10.1038/s41592-020-01004-3

Quantum Computing at the Frontiers of Biological Sciences

Authors: Prashant S. Emani, Jonathan Warrell, Alan Anticevic, Stefan Bekiranov, Michael Gandal, Michael J. McConnell, Guillermo Sapiro, Alán Aspuru-Guzik, Justin Baker, Matteo Bastiani, Patrick McClure, John Murray, Stamatios N Sotiropoulos, Jacob Taylor, Geetha Senthil, Thomas Lehner, Mark B. Gerstein, Aram W. Harrow

Abstract: The search for meaningful structure in biological data has relied on cutting-edge advances in computational technology and data science methods. However, challenges arise as we push the limits of scale and complexity in biological problems. Innovation in massively parallel, classical computing hardware and algorithms continues to address many of these challenges, but there is a need to simultaneously consider new paradigms to circumvent current barriers to processing speed. Accordingly, we articulate a view towards quantum computation and quantum information science, where algorithms have demonstrated potential polynomial and exponential computational speedups in certain applications, such as machine learning. The maturation of the field of quantum computing, in hardware and algorithm development, also coincides with the growth of several collaborative efforts to address questions across length and time scales, and scientific disciplines. We use this coincidence to explore the potential for quantum computing to aid in one such endeavor: the merging of insights from genetics, genomics, neuroimaging and behavioral phenotyping. By examining joint opportunities for computational innovation across fields, we highlight the need for a common language between biological data analysis and quantum computing. Ultimately, we consider current and future prospects for the employment of quantum computing algorithms in the biological sciences.

Submitted 16 November, 2019; originally announced November 2019.

Comments: 22 pages, 3 figures, Perspective

Journal ref: Nature Methods (2021)

arXiv:0709.4200 [pdf]

Copy Number Variants and Segmental Duplications Show Different Formation Signatures

Authors: Philip M. Kim, Jan O. Korbel, Xueying Chen, Mark B. Gerstein

Abstract: In addition to variation in terms of single nucleotide polymorphisms (SNPs), whole regions ranging from several kilobases up to a megabase in length differ in copy number among individuals. These differences are referred to as Copy Number Variants (CNVs), and extensive mapping of these is underway. Recent studies have highlighted their great prevalence in the human genome. Segmental Duplications (SDs) are long (>1 kb) stretches of duplicated DNA with high sequence identity. First, we analyze the co-localization of SDs and find that SDs are significantly co-localized with each other, resulting in a power-law distribution, which suggests a preferential attachment mechanism, i.e. existing SDs are likely to be involved in creating new ones nearby. Second, we look at the relationship of CNVs/SDs with various types of repeats. We find that the previously recognized association of SDs with Alu elements is significantly stronger for older SDs and decreases sharply for younger ones. While it might be expected that the patterns should be similar for SDs and CNVs, we find, surprisingly, no association of CNVs with Alu elements. This trend is consistent with the decreasing correlation between Alu elements and younger SDs: the activity of Alu elements has been decreasing, and by now they seem to be no longer active. Furthermore, we find a striking association of SDs with processed pseudogenes, suggesting that pseudogenes may also have mediated SD formation. Moreover, we find a strong association with microsatellites for both SDs and CNVs, which suggests a role for satellites in the formation of both.

Submitted 26 September, 2007; originally announced September 2007.

Comments: 13 pages
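The preferential-attachment argument above (existing SDs seed new SDs nearby, yielding a power-law distribution) can be illustrated with a Simon-type growth process. The toy model below is an assumption for illustration, not the authors' analysis: the genome is abstracted into bins, and each new SD either lands in an empty bin with probability `p_new` or joins an existing bin with probability proportional to that bin's current SD count. For Simon's model, the bin-size distribution asymptotically follows a power law with exponent 1 + 1/(1 - `p_new`). The names `simulate_sd_clusters`, `size_distribution`, and `p_new` are hypothetical.

```python
"""Toy Simon-type preferential-attachment model of SD clustering (hypothetical)."""
import random
from collections import Counter
from typing import List


def simulate_sd_clusters(n_sds: int, p_new: float, seed: int = 0) -> List[int]:
    """Place n_sds duplications one at a time and return the SD count per occupied bin.

    With probability p_new a new SD lands in a previously empty bin; otherwise
    it joins an existing bin chosen with probability proportional to the number
    of SDs already there (preferential attachment).
    """
    if n_sds < 1:
        raise ValueError("n_sds must be at least 1")
    rng = random.Random(seed)
    counts: List[int] = [1]  # start with one SD in one bin
    owners: List[int] = [0]  # one entry per SD, recording the bin it sits in
    for _ in range(n_sds - 1):
        if rng.random() < p_new:
            counts.append(1)
            owners.append(len(counts) - 1)
        else:
            # A uniformly random existing SD selects its bin with probability
            # proportional to that bin's current count.
            b = rng.choice(owners)
            counts[b] += 1
            owners.append(b)
    return counts


def size_distribution(counts: List[int]) -> Counter:
    """Map each cluster size k to the number of bins containing exactly k SDs."""
    return Counter(counts)
```

Plotting `size_distribution(simulate_sd_clusters(100_000, 0.2))` on log-log axes should show an approximately straight tail, the signature the abstract associates with preferential attachment.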

arXiv:0706.0194 [pdf]

Comparing Classical Pathways and Modern Networks: Towards the Development of an Edge Ontology

Authors: Long J. Lu, Andrea Sboner, Yuanpeng J. Huang, Hao Xin Lu, Tara A. Gianoulis, Kevin Y. Yip, Philip M. Kim, Gaetano T. Montelione, Mark B. Gerstein

Abstract: Pathways are integral to systems biology. Their classical representation has proven useful but is inconsistent in the meaning assigned to each arrow (or edge) and inadvertently implies the isolation of one pathway from another. Conversely, modern high-throughput experiments give rise to standardized networks facilitating topological calculations. Combining these perspectives, we can embed classical pathways within large-scale networks and thus demonstrate the crosstalk between them. As more diverse types of high-throughput data become available, we can effectively merge both perspectives, embedding pathways simultaneously in multiple networks. However, the original problem still remains: the current edge representation is inadequate to accurately convey all the information in pathways. Therefore, we suggest that a standardized, well-defined edge ontology is necessary and propose a prototype here, as a starting point for reaching this goal.

Submitted 1 June, 2007; originally announced June 2007.

Comments: 30 pages including 5 figures and supplemental material

Showing 1–4 of 4 results for author: Gerstein, M B