Scoreformer: A Surrogate Model For Large-Scale Prediction of Docking Scores

Authors: Álvaro Ciudad, Adrián Morales-Pastor, Laura Malo, Isaac Filella-Mercè, Victor Guallar, Alexis Molina

Abstract: In this study, we present ScoreFormer, a novel graph transformer model designed to accurately predict molecular docking scores, thereby optimizing high-throughput virtual screening (HTVS) in drug discovery. The architecture integrates Principal Neighborhood Aggregation (PNA) and Learnable Random Walk Positional Encodings (LRWPE), enhancing the model's ability to understand complex molecular structures and their relationship with their respective docking scores. This approach significantly surpasses traditional HTVS methods and recent Graph Neural Network (GNN) models in both recovery and efficiency due to a wider coverage of the chemical space and enhanced performance. Our results demonstrate that ScoreFormer achieves competitive performance in docking score prediction and offers a substantial 1.65-fold reduction in inference time compared to existing models. We evaluated ScoreFormer across multiple datasets under various conditions, confirming its robustness and reliability in identifying potential drug candidates rapidly.

Submitted 25 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted at the 1st Machine Learning for Life and Material Sciences Workshop at ICML 2024

arXiv:2406.07249 [pdf, other]

Are Protein Language Models Compute Optimal?

Authors: Yaiza Serrano, Álvaro Ciudad, Alexis Molina

Abstract: While protein language models (pLMs) have transformed biological research, the scaling laws governing their improvement remain underexplored. By adapting methodologies from NLP scaling laws, we investigated the optimal ratio between model parameters and training tokens within a fixed compute budget. Our study reveals that pLM sizes scale sublinearly with compute budget, showing diminishing returns in performance as model size increases, and we identify a performance plateau in training loss comparable to the one found in relevant works in the field. Our findings suggest that widely-used pLMs might not be compute-optimal, indicating that larger models could achieve convergence more efficiently. Training a 35M model on a reduced token set, we attained perplexity results comparable to larger models like ESM-2 (15B) and xTrimoPGLM (100B) with a single dataset pass. This work paves the way towards more compute-efficient pLMs, democratizing their training and practical application in computational biology.

Submitted 26 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: Proceedings of the ICML 2024 Workshop on Accessible and Efficient Foundation Models for Biological Discovery, Vienna, Austria. 2024

arXiv:2404.06481 [pdf, other]

GeoDirDock: Guiding Docking Along Geodesic Paths

Authors: Raúl Miñán, Javier Gallardo, Álvaro Ciudad, Alexis Molina

Abstract: This work introduces GeoDirDock (GDD), a novel approach to molecular docking that enhances the accuracy and physical plausibility of ligand docking predictions. GDD guides the denoising process of a diffusion model along geodesic paths within multiple spaces representing translational, rotational, and torsional degrees of freedom. Our method leverages expert knowledge to direct the generative modeling process, specifically targeting desired protein-ligand interaction regions. We demonstrate that GDD significantly outperforms existing blind docking methods in terms of RMSD accuracy and physicochemical pose realism. Our results indicate that incorporating domain expertise into the diffusion process leads to more biologically relevant docking predictions. Additionally, we explore the potential of GDD for lead optimization in drug discovery through angle transfer in maximal common substructure (MCS) docking, showcasing its capability to predict ligand orientations for chemically similar compounds accurately.

Submitted 9 April, 2024; originally announced April 2024.

Comments: Generative and Experimental Perspectives for Biomolecular Design Workshop at ICLR 2024

arXiv:q-bio/0610045 [pdf, ps, other]

doi 10.1007/s10867-006-9028-6

Kinesin as an electrostatic machine

Authors: A. Ciudad, J. M. Sancho, G. P. Tsironis

Abstract: Kinesin and related motor proteins utilize ATP fuel to propel themselves along the external surface of microtubules in a processive and directional fashion. We show that the observed step-like motion is possible through time varying charge distributions furnished by the ATP hydrolysis cycle while the static charge configuration on the microtubule provides the guide for motion. Thus, while the chemical hydrolysis energy induces appropriate local conformational changes, the motor translational energy is fundamentally electrostatic. Numerical simulations of the mechanical equations of motion show that processivity and directionality are direct consequences of the ATP-dependent electrostatic interaction between the different charge distributions of kinesin and microtubule.

Submitted 24 October, 2006; originally announced October 2006.

Comments: 6 pages, 3 figures. To appear in the Journal of Biological Physics

arXiv:q-bio/0602011 [pdf, ps, other]

doi 10.1016/j.physa.2006.04.099

Dynamics of an inchworm nano-walker

Authors: A. Ciudad, J. M. Sancho, A. M. Lacasta

Abstract: An inchworm processive mechanism is proposed to explain the motion of dimeric molecular motors such as kinesin. We present here preliminary results for this mechanism focusing on observables like mean velocity, coupling ratio and efficiency versus ATP concentration and the external load F.

Submitted 9 February, 2006; originally announced February 2006.

Comments: 6 pages, 2 figures

Journal ref: Physica A, Volume 371, Issue 1, 1 November 2006, Pages 25-28

Showing 1–5 of 5 results for author: Ciudad, Á