-
HydraScreen: A Generalizable Structure-Based Deep Learning Approach to Drug Discovery
Authors:
Alvaro Prat,
Hisham Abdel Aty,
Gintautas Kamuntavičius,
Tanya Paquet,
Povilas Norvaišas,
Piero Gasparotto,
Roy Tal
Abstract:
We propose HydraScreen, a deep-learning approach that aims to provide a framework for more robust machine-learning-accelerated drug discovery. HydraScreen utilizes a state-of-the-art 3D convolutional neural network, designed for the effective representation of molecular structures and interactions in protein-ligand binding. We design an end-to-end pipeline for high-throughput screening and lead optimization, targeting applications in structure-based drug design. We assess our approach using established public benchmarks based on the CASF 2016 core set, achieving top-tier results in affinity and pose prediction (Pearson's r = 0.86, RMSE = 1.15, Top-1 = 0.95). Furthermore, we utilize a novel interaction profiling approach to identify potential biases in the model and dataset to boost interpretability and support the unbiased nature of our method. Finally, we showcase HydraScreen's capacity to generalize across unseen proteins and ligands, offering directions for future development of robust machine learning scoring functions. HydraScreen (accessible at https://hydrascreen.ro5.ai) provides a user-friendly GUI and a public API, facilitating easy assessment of individual protein-ligand complexes.
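The affinity and pose metrics quoted above can be made concrete with a short sketch. The Python below is an illustrative assumption, not HydraScreen's evaluation code. Pearson's r and RMSE compare predicted with experimental affinities, and the Top-1 success rate is computed as the fraction of targets whose best-scored pose lies within an RMSD cutoff of the native pose, assuming the commonly used 2 Å threshold. The function names and the (score, rmsd) input layout are hypothetical.

```python
import math


def pearson_r(pred, true):
    """Pearson correlation between predicted and experimental affinities."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    return cov / (sp * st)


def rmse(pred, true):
    """Root-mean-square error, in the units of the affinity labels (e.g. pK units)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def top1_success(poses_per_target, rmsd_cutoff=2.0):
    """Fraction of targets whose highest-scoring pose is within rmsd_cutoff (Å) of the native pose.

    poses_per_target: one list of (score, rmsd) tuples per target; a higher score
    means the model ranks that pose as more likely (hypothetical convention).
    """
    hits = 0
    for poses in poses_per_target:
        _, best_rmsd = max(poses, key=lambda sr: sr[0])  # pose ranked first by the model
        if best_rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(poses_per_target)
```

Reporting all three numbers together matters: r measures ranking quality across complexes, RMSE penalises systematic offsets that r ignores, and Top-1 probes pose discrimination rather than affinity.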
Submitted 22 September, 2023;
originally announced November 2023.
-
How do interfaces alter the dynamics of supercooled water?
Authors:
Piero Gasparotto,
Martin Fitzner,
Stephen J. Cox,
Gabriele Cesare Sosso,
Angelos Michaelides
Abstract:
The structure of liquid water in the proximity of an interface can deviate significantly from that of bulk water, with surface-induced structural perturbations typically converging to bulk values at about 1 nm from the interface. While these structural changes are well established, it is, in contrast, less clear how an interface perturbs the dynamics of water molecules within the liquid. Here, through an extensive set of molecular dynamics simulations of supercooled bulk and interfacial water films and nano-droplets, we observe the formation of persistent, spatially extended dynamical domains in which the average mobility varies as a function of the distance from the interface. This is in stark contrast with the dynamical heterogeneity observed in bulk water, where these domains average out spatially over time. We also find that the dynamical response of water to an interface depends critically on the nature of the interface and on the choice of interface definition. Overall, these results reveal a richness in the dynamics of interfacial water that opens up the prospect of tuning the dynamical response of water through specific modifications of the interface structure or confining material.
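A mobility profile as a function of distance from the interface can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' analysis: the interface is taken to be a flat plane at a hypothetical height interface_z, mobility is measured as the squared displacement of each molecule over a single lag time, and molecules are binned by their initial distance from that plane. As the abstract stresses, a different interface definition (for example an instantaneous, molecule-resolved surface) could change the resulting profile.

```python
from collections import defaultdict


def mobility_profile(pos_t0, pos_t1, interface_z=0.0, bin_width=0.5):
    """Mean squared displacement of molecules, binned by initial distance from a flat interface.

    pos_t0, pos_t1: lists of (x, y, z) positions (e.g. water oxygens) at a time origin
    and after a lag time, with molecules in the same order in both lists.
    Returns {bin_index: mean squared displacement}; bin k covers distances
    [k * bin_width, (k + 1) * bin_width) from the plane z = interface_z.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r0, r1 in zip(pos_t0, pos_t1):
        d = abs(r0[2] - interface_z)  # distance from the assumed planar interface
        k = int(d // bin_width)
        sq_disp = sum((a - b) ** 2 for a, b in zip(r0, r1))
        sums[k] += sq_disp
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}
```

A single time origin is used only for brevity; detecting persistent domains would require tracking how such profiles, or per-molecule mobilities, evolve across many time origins.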
Submitted 28 February, 2022;
originally announced February 2022.
-
Mapping the Structure of Oxygen-Doped Wurtzite Aluminum Nitride Coatings From Ab Initio Random Structure Search and Experiments
Authors:
Piero Gasparotto,
Maria Fischer,
Daniele Scopece,
Maciej Oskar Liedke,
Maik Butterling,
Andreas Wagner,
Oguz Yildirim,
Mathis Trant,
Daniele Passerone,
Hans J. Hug,
Carlo Antonio Pignedoli
Abstract:
Machine learning is changing how we design and interpret experiments in materials science. In this work, we show how unsupervised learning, combined with ab initio modeling, improves our understanding of structural metastability in multicomponent alloys. We use the example case of Al-O-N alloys, where the formation of aluminum vacancies in wurtzite AlN upon the incorporation of substitutional oxygen can be seen as a general mechanism of solids in which crystal symmetry is reduced to stabilize defects. The ideal occupation of the AlN wurtzite crystal structure cannot be matched because of the presence of an aliovalent hetero-element in the structure. The traditional interpretation of the c-lattice shrinkage in sputter-deposited Al-O-N films from X-ray diffraction (XRD) experiments suggests the existence of a solubility limit at 8 at.% oxygen content. Here we show that such a naive interpretation is misleading. We support the XRD data with a machine learning analysis of ab initio simulations and positron annihilation lifetime spectroscopy data, revealing no signs of a possible solubility limit. Instead, the presence of a wide range of non-equilibrium oxygen-rich defective structures emerging at increasing oxygen contents suggests that the formation of grain boundaries is the most plausible mechanism responsible for the lattice shrinkage measured in Al-O-N sputtered films.
Submitted 12 January, 2021; v1 submitted 28 September, 2020;
originally announced September 2020.
-
An Accurate and Transferable Machine Learning Potential for Carbon
Authors:
Patrick Rowe,
Volker L Deringer,
Piero Gasparotto,
Gábor Csányi,
Angelos Michaelides
Abstract:
We present an accurate machine learning (ML) model for atomistic simulations of carbon, constructed using the Gaussian approximation potential (GAP) methodology. The potential, named GAP-20, describes the properties of the bulk crystalline and amorphous phases, crystal surfaces and defect structures with an accuracy approaching that of direct ab initio simulation, but at a significantly reduced cost. We combine structural databases for amorphous carbon and graphene, which we extend substantially by adding suitable configurations, for example, for defects in graphene and other nanostructures. The final potential is fitted to reference data computed using the optB88-vdW density functional theory (DFT) functional. Dispersion interactions, which are crucial to describe multilayer carbonaceous materials, are therefore implicitly included. We additionally account for long-range dispersion interactions using a semianalytical two-body term and show that an improved model can be obtained through an optimisation of the many-body smooth overlap of atomic positions (SOAP) descriptor. We rigorously test the potential on lattice parameters, bond lengths, formation energies and phonon dispersions of numerous carbon allotropes. We compare the formation energies of an extensive set of defect structures, surfaces and surface reconstructions to DFT reference calculations. The present work demonstrates the ability to combine, in the same ML model, the previously attained flexibility required for amorphous carbon [Phys. Rev. B, 95, 094203, (2017)] with the high numerical accuracy necessary for crystalline graphene [Phys. Rev. B, 97, 054303, (2018)], thereby providing an interatomic potential that will be applicable to a wide range of applications concerning diverse forms of bulk and nanostructured carbon.
Submitted 24 June, 2020;
originally announced June 2020.
-
Using data-reduction techniques to analyse biomolecular trajectories
Authors:
Gareth A. Tribello,
Piero Gasparotto
Abstract:
This chapter discusses how dimensionality reduction algorithms such as diffusion maps and sketch-map can be used to analyze molecular dynamics trajectories. The first part discusses how these algorithms function, as well as practical issues such as landmark selection and how the algorithms can be used when the data to be analyzed come from enhanced sampling trajectories. In the later parts, the results obtained by applying various algorithms to two sets of sample data are compared and discussed. This section is then followed by a summary of how one algorithm in particular, sketch-map, has been applied to a range of problems. The chapter concludes with a discussion of the directions in which we believe this field is currently moving.
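Landmark selection, mentioned above as a practical issue, is often done by farthest-point sampling. The sketch below shows that greedy min-max scheme as one possible choice; the chapter may discuss other schemes, and the function name and the user-supplied dist callable are assumptions for illustration.

```python
def farthest_point_sampling(points, n_landmarks, dist, start=0):
    """Greedy farthest-point (min-max) landmark selection.

    points: list of configurations (any objects that dist can compare).
    dist: callable returning the distance between two configurations.
    Returns the indices of the selected landmarks, beginning with `start`.
    """
    selected = [start]
    # distance from every point to its closest already-selected landmark
    min_d = [dist(p, points[start]) for p in points]
    while len(selected) < min(n_landmarks, len(points)):
        # pick the point that is worst represented by the current landmarks
        nxt = max(range(len(points)), key=lambda i: min_d[i])
        if min_d[nxt] == 0:
            break  # every point already coincides with a landmark
        selected.append(nxt)
        for i, p in enumerate(points):
            min_d[i] = min(min_d[i], dist(p, points[nxt]))
    return selected
```

In practice dist would be a structural metric such as the RMSD between frames. The greedy choice spreads landmarks evenly over the sampled region instead of following the sampling density, which is one reason it pairs well with data from enhanced sampling, where the density is biased.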
Submitted 9 July, 2019;
originally announced July 2019.
-
Recognizing Local and Global Structural Motifs at the Atomic Scale
Authors:
Piero Gasparotto,
Robert Horst Meißner,
Michele Ceriotti
Abstract:
Most of the current understanding of structure-property relations at the molecular and the supramolecular scales can be formulated in terms of the stability of and the interactions between a limited number of recurring structural motifs (e.g., H-bonds, coordination polyhedra, and protein secondary structure). Here we demonstrate an algorithm to automatically recognize such patterns, based on the identification of local maxima in the probability distributions observed in atomistic computer simulations, which is robust to the dimensionality and the sparsity of the reference atomistic data. We first discuss its main features, demonstrating some on artificial data sets, and then show how it can be applied to identify coordination environments in Lennard-Jones clusters and to recognize secondary-structure patterns in the simulation of an oligopeptide. To assess the applicability of this algorithm for motifs that involve several interdependent degrees of freedom, we also employ it to identify groups of conformers of the cluster and the polypeptide, considered in their entirety. The motifs identified by analyzing atomistic simulations can be used to interpret and rationalize the stability and behavior of the system at hand, and also as a tool to accelerate sampling, in association with biased molecular dynamics schemes.
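Identifying recurring motifs as local maxima of a probability distribution can be illustrated in one dimension. The sketch below is not the authors' algorithm: it swaps in a plain Gaussian kernel density estimate and quick-shift mode seeking, where each sample links to its nearest neighbour of higher density and the chains of links end at density maxima. The bandwidth and link_cutoff parameters are hypothetical and would need tuning for real, higher-dimensional data.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a 1D Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def quick_shift_clusters(samples, bandwidth, link_cutoff):
    """Assign each sample to a local maximum of the estimated probability density.

    Each point links to its nearest neighbour with strictly higher density, if that
    neighbour is closer than link_cutoff; points with no such neighbour are maxima
    (motif centres). Returns, for every sample, the index of its maximum.
    """
    dens = gaussian_kde(samples, bandwidth)
    p = [dens(x) for x in samples]
    parent = list(range(len(samples)))
    for i, xi in enumerate(samples):
        best, best_d = i, link_cutoff
        for j, xj in enumerate(samples):
            d = abs(xi - xj)
            if p[j] > p[i] and d < best_d:
                best, best_d = j, d
        parent[i] = best

    def root(i):
        # links always go uphill in density, so this walk cannot cycle
        while parent[i] != i:
            i = parent[i]
        return i

    return [root(i) for i in range(len(samples))]
```

For example, quick_shift_clusters([0.0, 0.1, -0.1, 5.0, 5.1, 4.9], 0.3, 1.0) groups the two well-separated clumps around the samples at 0.0 and 5.0.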
Submitted 25 January, 2018;
originally announced January 2018.
-
Anharmonic and Quantum Fluctuations in Molecular Crystals: A First-Principles Study of the Stability of Paracetamol
Authors:
Mariana Rossi,
Piero Gasparotto,
Michele Ceriotti
Abstract:
Molecular crystals often exist in multiple competing polymorphs, showing significantly different physico-chemical properties. Computational crystal structure prediction is key to interpreting and guiding the search for the most stable or useful form, a real challenge due to the combinatorial search space and the complex interplay of subtle effects that work together to determine the relative stability of different structures. Here we take a comprehensive approach based on different flavors of thermodynamic integration in order to estimate all contributions to the free energies of these systems with density-functional theory, including the oft-neglected anharmonic contributions and nuclear quantum effects. We take the two main stable forms of paracetamol as a paradigmatic example. We find that anharmonic contributions, different descriptions of van der Waals interactions, and nuclear quantum effects all matter for quantitatively determining the stability of different phases. Our analysis highlights the many challenges inherent in the development of a quantitative and predictive framework to model molecular crystals. However, it also indicates which components of the free energy can benefit from a cancellation of errors that can redeem the predictive power of approximate models, and suggests simple steps that could be taken to improve the reliability of ab initio crystal structure prediction.
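Thermodynamic integration estimates a free-energy difference as ΔF = ∫₀¹ ⟨∂U/∂λ⟩_λ dλ, where λ switches the potential from one Hamiltonian to another. The sketch below is a classical toy, not the DFT workflow of the paper: it switches a 1D harmonic oscillator between two spring constants, for which the ensemble averages are known analytically, so the trapezoidal estimate can be checked against the exact result (kT/2) ln(k1/k0). Anharmonic and nuclear quantum (path-integral) contributions are not modelled, and all names are hypothetical.

```python
import math


def trapezoid(xs, ys):
    """Trapezoidal-rule integral of ys sampled at xs."""
    return sum(0.5 * (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) for i in range(len(xs) - 1))


def ti_free_energy(dudl_mean, n_lambda=101):
    """Free-energy difference dF = integral over lambda in [0, 1] of <dU/dlambda>_lambda.

    dudl_mean(lam) returns the ensemble average <dU/dlambda> at coupling lam;
    in a real calculation each value comes from a separate simulation.
    """
    lams = [i / (n_lambda - 1) for i in range(n_lambda)]
    return trapezoid(lams, [dudl_mean(lam) for lam in lams])


# Toy check: U_lam(x) = 0.5 * ((1 - lam) * k0 + lam * k1) * x**2 (classical, 1D).
# Equipartition gives <x^2>_lam = kT / k_lam, hence <dU/dlam> = 0.5 * (k1 - k0) * kT / k_lam.
kT, k0, k1 = 1.0, 1.0, 4.0


def harmonic_dudl(lam):
    k_lam = (1 - lam) * k0 + lam * k1
    return 0.5 * (k1 - k0) * kT / k_lam


dF = ti_free_energy(harmonic_dudl)
exact = 0.5 * kT * math.log(k1 / k0)  # analytic free-energy difference
```

With 101 λ points the trapezoidal estimate agrees with the exact value to well within 10⁻³ kT; in production work the number of λ windows is limited by simulation cost, so smooth integrands and well-chosen λ spacing matter.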
Submitted 14 September, 2016;
originally announced September 2016.
-
Probing defects and correlations in the hydrogen-bond network of ab initio water
Authors:
Piero Gasparotto,
Ali A. Hassanali,
Michele Ceriotti
Abstract:
The hydrogen-bond network of water is characterized by the presence of coordination defects relative to the ideal tetrahedral network of ice, whose fluctuations determine the static and time-dependent properties of the liquid. Because of topological constraints, such defects do not occur in isolation but are highly correlated, appearing in a plethora of different pairs. Here we discuss such correlations in detail in the case of ab initio water models and show that they have interesting similarities to regular and defective solid phases of water. Although defect correlations involve deviations from idealized tetrahedrality, they can still be regarded as weaker hydrogen bonds that retain a high degree of directionality. We also investigate how the structure and population of coordination defects are affected by approximations to the inter-atomic potential, finding that in most cases the qualitative features of the hydrogen-bond network are remarkably robust.
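Coordination defects of the kind discussed above are commonly labelled by how many hydrogen bonds a molecule donates and accepts. The sketch below assumes a list of (donor, acceptor) index pairs produced by some H-bond criterion, labels each molecule (for example "2D2A" for the ideal tetrahedral case), and tallies the defect types found at the two ends of each H-bond as a crude probe of pair correlations. It is a hypothetical illustration, not the analysis used in the paper.

```python
from collections import Counter


def coordination_labels(n_molecules, hbonds):
    """Label each water molecule by the number of H-bonds it donates (D) and accepts (A).

    hbonds: iterable of (donor_index, acceptor_index) pairs from any H-bond criterion.
    '2D2A' is the ideal tetrahedral environment of ice; other labels mark defects.
    """
    hbonds = list(hbonds)  # allow generators, since the pairs are traversed twice
    donated = Counter(d for d, _ in hbonds)
    accepted = Counter(a for _, a in hbonds)
    return [f"{donated[i]}D{accepted[i]}A" for i in range(n_molecules)]


def defect_pairs(labels, hbonds):
    """Count (donor label, acceptor label) combinations for H-bonds touching a defect."""
    pairs = Counter()
    for d, a in hbonds:
        if labels[d] != "2D2A" or labels[a] != "2D2A":
            pairs[(labels[d], labels[a])] += 1
    return pairs
```

Comparing such pair counts with what random placement of the same defects would give is one simple way to quantify how strongly defects are correlated.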
Submitted 10 February, 2016;
originally announced February 2016.
-
Recognizing molecular patterns by machine learning: an agnostic structural definition of the hydrogen bond
Authors:
Piero Gasparotto,
Michele Ceriotti
Abstract:
The concept of chemical bonding can ultimately be seen as a rationalization of the recurring structural patterns observed in molecules and solids. Chemical intuition is nothing but the ability to recognize and predict such patterns, and how they transform into one another. Here we discuss how to use a computer to identify atomic patterns automatically, so as to provide an algorithmic definition of a bond based solely on structural information. We concentrate in particular on hydrogen bonding, a concept central to our understanding of the physical chemistry of water, biological systems and many technologically important materials. Since the hydrogen bond is a somewhat fuzzy entity that covers a broad range of energies and distances, many different criteria have been proposed and used over the years, based either on sophisticated electronic structure calculations followed by an energy decomposition analysis, or on somewhat arbitrary choices of a range of structural parameters that is deemed to correspond to a hydrogen-bonded configuration. We introduce here a definition that is univocal, unbiased, and adaptive, based on our machine-learning analysis of an atomistic simulation. The strategy we propose could easily be adapted to similar scenarios, where one has to recognize or classify structural patterns in a material or chemical compound.
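For contrast with the adaptive definition proposed above, the kind of cutoff-based structural criterion the abstract describes can be sketched in a few lines. The thresholds below (O···O distance under 3.5 Å and H-O···O angle under 30°) are one common choice from the literature, used here as assumed defaults; the function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _angle_deg(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    # clamp guards against rounding pushing the cosine slightly outside [-1, 1]
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (nu * nv)))))


def is_hbond_geometric(o_donor, h, o_acceptor, r_cut=3.5, angle_cut=30.0):
    """Cutoff-based H-bond test: O...O distance below r_cut (Å) and
    angle between the O_donor-H and O_donor...O_acceptor vectors below angle_cut (degrees)."""
    oo = _sub(o_acceptor, o_donor)
    oh = _sub(h, o_donor)
    r_oo = math.sqrt(sum(x * x for x in oo))
    return r_oo < r_cut and _angle_deg(oh, oo) < angle_cut
```

Every configuration just outside either threshold is classified as non-bonded, however close it is to the boundary; that sharp, hand-picked edge is the arbitrariness a data-driven definition aims to remove.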
Submitted 12 November, 2014; v1 submitted 16 October, 2014;
originally announced October 2014.