-
Pareto Optimization to Accelerate Multi-Objective Virtual Screening
Authors:
Jenna C. Fromer,
David E. Graff,
Connor W. Coley
Abstract:
The discovery of therapeutic molecules is fundamentally a multi-objective optimization problem. One formulation of the problem is to identify molecules that simultaneously exhibit strong binding affinity for a target protein, minimal off-target interactions, and suitable pharmacokinetic properties. Inspired by prior work that uses active learning to accelerate the identification of strong binders, we implement multi-objective Bayesian optimization to reduce the computational cost of multi-property virtual screening and apply it to the identification of ligands predicted to be selective based on docking scores to on- and off-targets. We demonstrate the superiority of Pareto optimization over scalarization across three case studies. Further, we use the developed optimization tool to search a virtual library of over 4M molecules for those predicted to be selective dual inhibitors of EGFR and IGF1R, acquiring 100% of the molecules that form the library's Pareto front after exploring only 8% of the library. This workflow and the associated open-source software can reduce the screening burden of molecular design projects and are complementary to research aiming to improve the accuracy of binding predictions and other molecular properties.
Submitted 16 October, 2023;
originally announced October 2023.
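The contrast between Pareto optimization and scalarization drawn in this abstract can be illustrated with a minimal sketch, assuming every objective is to be maximized (for example, a negated on-target docking score and a raw off-target docking score, so that stronger on-target binding and weaker off-target binding both count as "better"). The helper names and toy objective values are hypothetical; the sketch shows only Pareto dominance and a weighted-sum scalarization, not the authors' multi-objective Bayesian optimization or their acquisition function.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if point a Pareto-dominates point b (all objectives maximized):
    a is at least as good in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the non-dominated points (simple O(n^2) scan for clarity)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def scalarized_best(points: List[Sequence[float]], weights: Sequence[float]) -> int:
    """Index of the single point with the highest weighted sum of objectives."""
    scores = [sum(w * x for w, x in zip(weights, p)) for p in points]
    return max(range(len(points)), key=scores.__getitem__)


# Hypothetical (negated on-target score, off-target score) pairs.
points = [(9.0, 5.0), (7.0, 8.0), (6.0, 6.0), (8.0, 6.5)]
print(pareto_front(points))                 # [0, 1, 3]
print(scalarized_best(points, (0.5, 0.5)))  # 1
```

A weighted sum returns one compromise point for a fixed choice of weights, whereas the non-dominated set keeps every trade-off between on-target and off-target scores. The quadratic scan is chosen only for readability; a library of millions of molecules would call for a more efficient non-dominated sort.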
-
Self-focusing virtual screening with active design space pruning
Authors:
David E. Graff,
Matteo Aldeghi,
Joseph A. Morrone,
Kirk E. Jordan,
Edward O. Pyzer-Knapp,
Connor W. Coley
Abstract:
High-throughput virtual screening is an indispensable technique utilized in the discovery of small molecules. In cases where the library of molecules is exceedingly large, the cost of an exhaustive virtual screen may be prohibitive. Model-guided optimization has been employed to lower these costs through dramatic increases in sample efficiency compared to random selection. However, these techniques introduce new costs to the workflow through the surrogate model training and inference steps. In this study, we propose an extension to the framework of model-guided optimization that mitigates inference costs using a technique we refer to as design space pruning (DSP), which irreversibly removes poor-performing candidates from consideration. We study the application of DSP to a variety of optimization tasks and observe significant reductions in overhead costs while maintaining performance similar to that of the baseline optimization. DSP represents an attractive extension of model-guided optimization that can limit overhead costs in optimization settings where these costs are non-negligible relative to objective costs, such as docking.
Submitted 3 May, 2022;
originally announced May 2022.
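A single pruning step can be sketched as follows, assuming a surrogate that returns a predictive mean and standard deviation for each remaining candidate, and assuming candidates are dropped when even an optimistic (upper-confidence-bound) estimate falls below the k-th best score observed so far. The abstract does not state the exact pruning criterion, so this rule, the `beta` parameter, and the function name are illustrative assumptions.

```python
from typing import Dict, List


def prune_design_space(
    pool: List[str],
    mean: Dict[str, float],
    std: Dict[str, float],
    kth_best_observed: float,
    beta: float = 2.0,
) -> List[str]:
    """Hypothetical DSP-style pruning step (scores are maximized).

    ASSUMPTION: a candidate survives only if its upper confidence bound,
    mean + beta * std, reaches the k-th best score observed so far.
    Candidates removed here are never passed to the surrogate again, so
    inference cost shrinks in every later iteration.
    """
    return [c for c in pool if mean[c] + beta * std[c] >= kth_best_observed]


pool = ["a", "b", "c", "d"]
mean = {"a": 5.0, "b": 2.0, "c": 4.0, "d": 1.0}
std = {"a": 0.5, "b": 1.0, "c": 0.25, "d": 0.5}
pool = prune_design_space(pool, mean, std, kth_best_observed=4.0)
print(pool)  # ['a', 'b', 'c']
```

Reassigning `pool` to the pruned list is what makes the removal irreversible in this sketch: candidate "d" is gone from every subsequent round of prediction, even if later training data would have raised its predicted score.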
-
Accelerating high-throughput virtual screening through molecular pool-based active learning
Authors:
David E. Graff,
Eugene I. Shakhnovich,
Connor W. Coley
Abstract:
Structure-based virtual screening is an important tool in early stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of $10^8$ molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques can aid in their exploration: a surrogate structure-property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we assess various surrogate model architectures, acquisition functions, and acquisition batch sizes as applied to several protein-ligand docking datasets and observe significant reductions in computational costs, even when using a greedy acquisition strategy; for example, 87.9% of the top-50000 ligands can be found after testing only 2.4% of a 100M member library. Such model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.
Submitted 13 December, 2020;
originally announced December 2020.
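The pool-based loop described in this abstract (train a surrogate on the scored subset, predict the remaining library members, and acquire the top-predicted batch greedily) can be sketched as follows. This is a toy under stated assumptions: a k-nearest-neighbour regressor on precomputed feature vectors stands in for the surrogate architectures the study actually compares, `objective` stands in for a docking calculation (higher is better, e.g. a negated docking score), and all names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Vector = Tuple[float, ...]


def knn_predict(train: List[Tuple[Vector, float]], x: Vector, k: int = 3) -> float:
    """Toy surrogate: mean score of the k scored molecules nearest to x."""
    nearest = sorted(
        train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x))
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def greedy_pool_active_learning(
    pool: Dict[str, Vector],
    objective: Callable[[str], float],
    init_size: int,
    batch_size: int,
    n_iters: int,
    seed: int = 0,
) -> Dict[str, float]:
    """Score a random initial batch, then repeatedly retrain the surrogate
    and acquire the batch_size unscored molecules with the highest
    predicted score (greedy acquisition). Returns every score computed."""
    rng = random.Random(seed)
    scored: Dict[str, float] = {}
    for mol in rng.sample(sorted(pool), init_size):
        scored[mol] = objective(mol)
    for _ in range(n_iters):
        train = [(pool[m], s) for m, s in scored.items()]
        candidates = [m for m in pool if m not in scored]
        if not candidates:
            break
        # Surrogate inference over the whole remaining pool.
        preds = {m: knn_predict(train, pool[m]) for m in candidates}
        batch = sorted(candidates, key=preds.__getitem__, reverse=True)[:batch_size]
        for mol in batch:
            scored[mol] = objective(mol)  # the expensive step, e.g. docking
    return scored


# Hypothetical 1-D library where the feature itself is the objective.
pool = {f"m{i:03d}": (i / 100,) for i in range(100)}
objective = lambda m: pool[m][0]
scored = greedy_pool_active_learning(pool, objective, init_size=10, batch_size=10, n_iters=3)
print(len(scored))  # 40
```

Only `init_size + n_iters * batch_size` molecules are ever passed to `objective`; the rest of the library is seen only by the cheap surrogate, which is where the computational savings of model-guided screening come from.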
-
Continuous Mental Effort Evaluation during 3D Object Manipulation Tasks based on Brain and Physiological Signals
Authors:
Dennis Wobrock,
Jérémy Frey,
Delphine Graeff,
Jean-Baptiste De La Rivière,
Julien Castet,
Fabien Lotte
Abstract:
Designing 3D User Interfaces (UI) requires adequate evaluation tools to ensure good usability and user experience. While many evaluation tools are already available and widely used, existing approaches generally cannot provide continuous and objective measures of usability qualities during interaction without interrupting the user. In this paper, we propose to use brain (with ElectroEncephaloGraphy) and physiological (ElectroCardioGraphy, Galvanic Skin Response) signals to continuously assess the mental effort made by the user to perform 3D object manipulation tasks. We first show how this mental effort (a.k.a., mental workload) can be estimated from such signals, and then measure it on 8 participants during an actual 3D object manipulation task with an input device known as the CubTile. Our results suggest that monitoring workload enables us to continuously assess the 3DUI and/or interaction technique ease-of-use. Overall, this suggests that this new measure could become a useful addition to the repertoire of available evaluation tools, enabling a finer-grained assessment of the ergonomic qualities of a given 3D user interface.
Submitted 29 May, 2015;
originally announced May 2015.
-
Many uses, many annotations for large speech corpora: Switchboard and TDT as case studies
Authors:
David Graff,
Steven Bird
Abstract:
This paper discusses the challenges that arise when large speech corpora receive an ever-broadening range of diverse and distinct annotations. Two case studies of this process are presented: the Switchboard Corpus of telephone conversations and the TDT2 corpus of broadcast news. Switchboard has undergone two independent transcriptions and various types of additional annotation, all carried out as separate projects that were dispersed both geographically and chronologically. The TDT2 corpus has also received a variety of annotations, but all directly created or managed by a core group. In both cases, issues arise involving the propagation of repairs, consistency of references, and the ability to integrate annotations having different formats and levels of detail. We describe a general framework whereby these issues can be addressed successfully.
Submitted 13 July, 2000;
originally announced July 2000.