Search | arXiv e-print repository

doi 10.1016/j.drudis.2022.103439

Docking-based generative approaches in the search for new drug candidates

Authors: Tomasz Danel, Jan Łęski, Sabina Podlewska, Igor T. Podolak

Abstract: Despite the great popularity of virtual screening of existing compound libraries, the search for new potential drug candidates also takes advantage of generative protocols, where new compound suggestions are enumerated using various algorithms. To increase the activity potency of generative approaches, they have recently been coupled with molecular docking, a leading methodology of structure-based… ▽ More Despite the great popularity of virtual screening of existing compound libraries, the search for new potential drug candidates also takes advantage of generative protocols, where new compound suggestions are enumerated using various algorithms. To increase the activity potency of generative approaches, they have recently been coupled with molecular docking, a leading methodology of structure-based drug design. In this review, we summarize progress since docking-based generative models emerged. We propose a new taxonomy for these methods and discuss their importance for the field of computer-aided drug design. In addition, we discuss the most promising directions for further development of generative protocols coupled with docking. △ Less

Submitted 22 November, 2023; originally announced December 2023.

Journal ref: Drug Discovery Today 28.2 (2023)

arXiv:2210.03745 [pdf, other]

ProGReST: Prototypical Graph Regression Soft Trees for Molecular Property Prediction

Authors: Dawid Rymarczyk, Daniel Dobrowolski, Tomasz Danel

Abstract: In this work, we propose the novel Prototypical Graph Regression Self-explainable Trees (ProGReST) model, which combines prototype learning, soft decision trees, and Graph Neural Networks. In contrast to other works, our model can be used to address various challenging tasks, including compound property prediction. In ProGReST, the rationale is obtained along with prediction due to the model's bui… ▽ More In this work, we propose the novel Prototypical Graph Regression Self-explainable Trees (ProGReST) model, which combines prototype learning, soft decision trees, and Graph Neural Networks. In contrast to other works, our model can be used to address various challenging tasks, including compound property prediction. In ProGReST, the rationale is obtained along with prediction due to the model's built-in interpretability. Additionally, we introduce a new graph prototype projection to accelerate model training. Finally, we evaluate PRoGReST on a wide range of chemical datasets for molecular property prediction and perform in-depth analysis with chemical experts to evaluate obtained interpretations. Our method achieves competitive results against state-of-the-art methods. △ Less

Submitted 27 December, 2022; v1 submitted 7 October, 2022; originally announced October 2022.

Comments: Accepted to SDM2023

arXiv:2110.05841 [pdf, other]

Relative Molecule Self-Attention Transformer

Authors: Łukasz Maziarka, Dawid Majchrowski, Tomasz Danel, Piotr Gaiński, Jacek Tabor, Igor Podolak, Paweł Morkisz, Stanisław Jastrzębski

Abstract: Self-supervised learning holds promise to revolutionize molecule property prediction - a central task to drug discovery and many more industries - by enabling data efficient learning from scarce experimental data. Despite significant progress, non-pretrained methods can be still competitive in certain settings. We reason that architecture might be a key bottleneck. In particular, enriching the bac… ▽ More Self-supervised learning holds promise to revolutionize molecule property prediction - a central task to drug discovery and many more industries - by enabling data efficient learning from scarce experimental data. Despite significant progress, non-pretrained methods can be still competitive in certain settings. We reason that architecture might be a key bottleneck. In particular, enriching the backbone architecture with domain-specific inductive biases has been key for the success of self-supervised learning in other domains. In this spirit, we methodologically explore the design space of the self-attention mechanism tailored to molecular data. We identify a novel variant of self-attention adapted to processing molecules, inspired by the relative self-attention layer, which involves fusing embedded graph and distance relationships between atoms. Our main contribution is Relative Molecule Attention Transformer (R-MAT): a novel Transformer-based model based on the developed self-attention layer that achieves state-of-the-art or very competitive results across a~wide range of molecule property prediction tasks. △ Less

Submitted 12 October, 2021; originally announced October 2021.

arXiv:2107.13214 [pdf, other]

SONG: Self-Organizing Neural Graphs

Authors: Łukasz Struski, Tomasz Danel, Marek Śmieja, Jacek Tabor, Bartosz Zieliński

Abstract: Recent years have seen a surge in research on deep interpretable neural networks with decision trees as one of the most commonly incorporated tools. There are at least three advantages of using decision trees over logistic regression classification models: they are easy to interpret since they are based on binary decisions, they can make decisions faster, and they provide a hierarchy of classes. H… ▽ More Recent years have seen a surge in research on deep interpretable neural networks with decision trees as one of the most commonly incorporated tools. There are at least three advantages of using decision trees over logistic regression classification models: they are easy to interpret since they are based on binary decisions, they can make decisions faster, and they provide a hierarchy of classes. However, one of the well-known drawbacks of decision trees, as compared to decision graphs, is that decision trees cannot reuse the decision nodes. Nevertheless, decision graphs were not commonly used in deep learning due to the lack of efficient gradient-based training techniques. In this paper, we fill this gap and provide a general paradigm based on Markov processes, which allows for efficient training of the special type of decision graphs, which we call Self-Organizing Neural Graphs (SONG). We provide an extensive theoretical study of SONG, complemented by experiments conducted on Letter, Connect4, MNIST, CIFAR, and TinyImageNet datasets, showing that our method performs on par or better than existing decision models. △ Less

Submitted 28 July, 2021; originally announced July 2021.

arXiv:2012.04444 [pdf, other]

Comparison of Atom Representations in Graph Neural Networks for Molecular Property Prediction

Authors: Agnieszka Pocha, Tomasz Danel, Łukasz Maziarka

Abstract: Graph neural networks have recently become a standard method for analysing chemical compounds. In the field of molecular property prediction, the emphasis is now put on designing new model architectures, and the importance of atom featurisation is oftentimes belittled. When contrasting two graph neural networks, the use of different atom features possibly leads to the incorrect attribution of the… ▽ More Graph neural networks have recently become a standard method for analysing chemical compounds. In the field of molecular property prediction, the emphasis is now put on designing new model architectures, and the importance of atom featurisation is oftentimes belittled. When contrasting two graph neural networks, the use of different atom features possibly leads to the incorrect attribution of the results to the network architecture. To provide a better understanding of this issue, we compare multiple atom representations for graph models and evaluate them on the prediction of free energy, solubility, and metabolic stability. To the best of our knowledge, this is the first methodological study that focuses on the relevance of atom representation to the predictive performance of graph neural networks. △ Less

Submitted 12 February, 2021; v1 submitted 23 November, 2020; originally announced December 2020.

Comments: Machine Learning for Molecules Workshop at NeurIPS 2020 (spotlight talk)

arXiv:2010.13914 [pdf, other]

Processing of incomplete images by (graph) convolutional neural networks

Authors: Tomasz Danel, Marek Śmieja, Łukasz Struski, Przemysław Spurek, Łukasz Maziarka

Abstract: We investigate the problem of training neural networks from incomplete images without replacing missing values. For this purpose, we first represent an image as a graph, in which missing pixels are entirely ignored. The graph image representation is processed using a spatial graph convolutional network (SGCN) -- a type of graph convolutional networks, which is a proper generalization of classical… ▽ More We investigate the problem of training neural networks from incomplete images without replacing missing values. For this purpose, we first represent an image as a graph, in which missing pixels are entirely ignored. The graph image representation is processed using a spatial graph convolutional network (SGCN) -- a type of graph convolutional networks, which is a proper generalization of classical CNNs operating on images. On one hand, our approach avoids the problem of missing data imputation while, on the other hand, there is a natural correspondence between CNNs and SGCN. Experiments confirm that our approach performs better than analogical CNNs with the imputation of missing values on typical classification and reconstruction tasks. △ Less

Submitted 26 October, 2020; originally announced October 2020.

arXiv:2006.16955 [pdf, other]

doi 10.1021/acs.jcim.2c01355

We Should at Least Be Able to Design Molecules That Dock Well

Authors: Tobiasz Cieplinski, Tomasz Danel, Sabina Podlewska, Stanislaw Jastrzebski

Abstract: Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks, and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a popular computational method for assessing molecule binding to a protein. Concretel… ▽ More Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks, and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a popular computational method for assessing molecule binding to a protein. Concretely, the goal is to generate drug-like molecules that are scored highly by SMINA, a popular docking software. We observe that popular graph-based generative models fail to generate molecules with a high docking score when trained using a realistically sized training set. This suggests a limitation of the current incarnation of models for de novo drug design. Finally, we propose a simplified version of the benchmark based on a simpler scoring function, and show that the tested models are able to partially solve it. We release the benchmark as an easy to use package available at https://github.com/cieplinski-tobiasz/smina-docking-benchmark. We hope that our benchmark will serve as a step** stone towards the goal of automatically generating promising drug candidates. △ Less

Submitted 13 June, 2023; v1 submitted 20 June, 2020; originally announced June 2020.

Comments: Published in Journal of Chemical Information and Modeling

arXiv:2002.08264 [pdf, other]

Molecule Attention Transformer

Authors: Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, Stanisław Jastrzębski

Abstract: Designing a single neural network architecture that performs competitively across a range of molecule property prediction tasks remains largely an open challenge, and its solution may unlock a widespread use of deep learning in the drug discovery industry. To move towards this goal, we propose Molecule Attention Transformer (MAT). Our key innovation is to augment the attention mechanism in Transfo… ▽ More Designing a single neural network architecture that performs competitively across a range of molecule property prediction tasks remains largely an open challenge, and its solution may unlock a widespread use of deep learning in the drug discovery industry. To move towards this goal, we propose Molecule Attention Transformer (MAT). Our key innovation is to augment the attention mechanism in Transformer using inter-atomic distances and the molecular graph structure. Experiments show that MAT performs competitively on a diverse set of molecular prediction tasks. Most importantly, with a simple self-supervised pretraining, MAT requires tuning of only a few hyperparameter values to achieve state-of-the-art performance on downstream tasks. Finally, we show that attention weights learned by MAT are interpretable from the chemical point of view. △ Less

Submitted 19 February, 2020; originally announced February 2020.

Journal ref: Graph Representation Learning workshop and Machine Learning and the Physical Sciences workshop at NeurIPS 2019

arXiv:1909.05310 [pdf, other]

Spatial Graph Convolutional Networks

Authors: Tomasz Danel, Przemysław Spurek, Jacek Tabor, Marek Śmieja, Łukasz Struski, Agnieszka Słowik, Łukasz Maziarka

Abstract: Graph Convolutional Networks (GCNs) have recently become the primary choice for learning from graph-structured data, superseding hash fingerprints in representing chemical compounds. However, GCNs lack the ability to take into account the ordering of node neighbors, even when there is a geometric interpretation of the graph vertices that provides an order based on their spatial positions. To remed… ▽ More Graph Convolutional Networks (GCNs) have recently become the primary choice for learning from graph-structured data, superseding hash fingerprints in representing chemical compounds. However, GCNs lack the ability to take into account the ordering of node neighbors, even when there is a geometric interpretation of the graph vertices that provides an order based on their spatial positions. To remedy this issue, we propose Spatial Graph Convolutional Network (SGCN) which uses spatial features to efficiently learn from graphs that can be naturally located in space. Our contribution is threefold: we propose a GCN-inspired architecture which (i) leverages node positions, (ii) is a proper generalization of both GCNs and Convolutional Neural Networks (CNNs), (iii) benefits from augmentation which further improves the performance and assures invariance with respect to the desired properties. Empirically, SGCN outperforms state-of-the-art graph-based methods on image classification and chemical tasks. △ Less

Submitted 2 July, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

arXiv:1904.03445 [pdf, other]

doi 10.1109/TNNLS.2023.3251848

Feature-Based Interpolation and Geodesics in the Latent Spaces of Generative Models

Authors: Łukasz Struski, Michał Sadowski, Tomasz Danel, Jacek Tabor, Igor T. Podolak

Abstract: Interpolating between points is a problem connected simultaneously with finding geodesics and study of generative models. In the case of geodesics, we search for the curves with the shortest length, while in the case of generative models we typically apply linear interpolation in the latent space. However, this interpolation uses implicitly the fact that Gaussian is unimodal. Thus the problem of i… ▽ More Interpolating between points is a problem connected simultaneously with finding geodesics and study of generative models. In the case of geodesics, we search for the curves with the shortest length, while in the case of generative models we typically apply linear interpolation in the latent space. However, this interpolation uses implicitly the fact that Gaussian is unimodal. Thus the problem of interpolating in the case when the latent density is non-Gaussian is an open problem. In this paper, we present a general and unified approach to interpolation, which simultaneously allows us to search for geodesics and interpolating curves in latent space in the case of arbitrary density. Our results have a strong theoretical background based on the introduced quality measure of an interpolating curve. In particular, we show that maximising the quality measure of the curve can be equivalently understood as a search of geodesic for a certain redefinition of the Riemannian metric on the space. We provide examples in three important cases. First, we show that our approach can be easily applied to finding geodesics on manifolds. Next, we focus our attention in finding interpolations in pre-trained generative models. We show that our model effectively works in the case of arbitrary density. Moreover, we can interpolate in the subset of the space consisting of data possessing a given feature. The last case is focused on finding interpolation in the space of chemical compounds. △ Less

Submitted 13 March, 2023; v1 submitted 6 April, 2019; originally announced April 2019.

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2023

Showing 1–10 of 10 results for author: Danel, T