Molecule-Edit Templates for Efficient and Accurate Retrosynthesis Prediction
Authors:
Mikołaj Sacha,
Michał Sadowski,
Piotr Kozakowski,
Ruard van Workum,
Stanisław Jastrzębski
Abstract:
Retrosynthesis involves determining a sequence of reactions to synthesize complex molecules from simpler precursors. As this poses a challenge in organic chemistry, machine learning has offered solutions, particularly for predicting possible reaction substrates for a given target molecule. These solutions mainly fall into template-based and template-free categories. The former is efficient but relies on a vast set of predefined reaction patterns, while the latter, though more flexible, can be computationally intensive and less interpretable. To address these issues, we introduce METRO (Molecule-Edit Templates for RetrOsynthesis), a machine-learning model that predicts reactions using minimal templates - simplified reaction patterns capturing only essential molecular changes - reducing computational overhead and achieving state-of-the-art results on standard benchmarks.
Submitted 11 October, 2023;
originally announced October 2023.
Feature-Based Interpolation and Geodesics in the Latent Spaces of Generative Models
Authors:
Łukasz Struski,
Michał Sadowski,
Tomasz Danel,
Jacek Tabor,
Igor T. Podolak
Abstract:
Interpolating between points is a problem connected both with finding geodesics and with the study of generative models. In the case of geodesics, we search for the curves of shortest length, while in the case of generative models we typically apply linear interpolation in the latent space. However, this interpolation implicitly relies on the fact that the Gaussian is unimodal. Thus, interpolating when the latent density is non-Gaussian remains an open problem.
In this paper, we present a general and unified approach to interpolation, which simultaneously allows us to search for geodesics and for interpolating curves in latent space in the case of an arbitrary density. Our results have a strong theoretical background based on an introduced quality measure of an interpolating curve. In particular, we show that maximising the quality measure of the curve can be equivalently understood as a search for a geodesic under a certain redefinition of the Riemannian metric on the space.
We provide examples in three important cases. First, we show that our approach can be easily applied to finding geodesics on manifolds. Next, we focus our attention on finding interpolations in pre-trained generative models. We show that our model works effectively in the case of an arbitrary density. Moreover, we can interpolate in the subset of the space consisting of data possessing a given feature. The last case focuses on finding interpolations in the space of chemical compounds.
Submitted 13 March, 2023; v1 submitted 6 April, 2019;
originally announced April 2019.