Search | arXiv e-print repository

RetroGFN: Diverse and Feasible Retrosynthesis using GFlowNets

Authors: Piotr Gaiński, Michał Koziarski, Krzysztof Maziarz, Marwin Segler, Jacek Tabor, Marek Śmieja

Abstract: Single-step retrosynthesis aims to predict a set of reactions that lead to the creation of a target molecule, which is a crucial task in molecular discovery. Although a target molecule can often be synthesized with multiple different reactions, it is not clear how to verify the feasibility of a reaction, because the available datasets cover only a tiny fraction of the possible solutions. Consequen… ▽ More Single-step retrosynthesis aims to predict a set of reactions that lead to the creation of a target molecule, which is a crucial task in molecular discovery. Although a target molecule can often be synthesized with multiple different reactions, it is not clear how to verify the feasibility of a reaction, because the available datasets cover only a tiny fraction of the possible solutions. Consequently, the existing models are not encouraged to explore the space of possible reactions sufficiently. In this paper, we propose a novel single-step retrosynthesis model, RetroGFN, that can explore outside the limited dataset and return a diverse set of feasible reactions by leveraging a feasibility proxy model during the training. We show that RetroGFN achieves competitive results on standard top-k accuracy while outperforming existing methods on round-trip accuracy. Moreover, we provide empirical arguments in favor of using round-trip accuracy which expands the notion of feasibility with respect to the standard top-k accuracy metric. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2310.19796 [pdf, other]

Re-evaluating Retrosynthesis Algorithms with Syntheseus

Authors: Krzysztof Maziarz, Austin Tripp, Guoqing Liu, Megan Stanley, Shufang Xie, Piotr Gaiński, Philipp Seidl, Marwin Segler

Abstract: The planning of how to synthesize molecules, also known as retrosynthesis, has been a growing focus of the machine learning and chemistry communities in recent years. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques. To remedy this, we present a benchmarking library called syntheseus which… ▽ More The planning of how to synthesize molecules, also known as retrosynthesis, has been a growing focus of the machine learning and chemistry communities in recent years. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques. To remedy this, we present a benchmarking library called syntheseus which promotes best practice by default, enabling consistent meaningful evaluation of single-step and multi-step retrosynthesis algorithms. We use syntheseus to re-evaluate a number of previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes when evaluated carefully. We end with guidance for future works in this area. △ Less

Submitted 19 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.09270 [pdf, other]

Retro-fallback: retrosynthetic planning in an uncertain world

Authors: Austin Tripp, Krzysztof Maziarz, Sarah Lewis, Marwin Segler, José Miguel Hernández-Lobato

Abstract: Retrosynthesis is the task of planning a series of chemical reactions to create a desired molecule from simpler, buyable molecules. While previous works have proposed algorithms to find optimal solutions for a range of metrics (e.g. shortest, lowest-cost), these works generally overlook the fact that we have imperfect knowledge of the space of possible reactions, meaning plans created by algorithm… ▽ More Retrosynthesis is the task of planning a series of chemical reactions to create a desired molecule from simpler, buyable molecules. While previous works have proposed algorithms to find optimal solutions for a range of metrics (e.g. shortest, lowest-cost), these works generally overlook the fact that we have imperfect knowledge of the space of possible reactions, meaning plans created by algorithms may not work in a laboratory. In this paper we propose a novel formulation of retrosynthesis in terms of stochastic processes to account for this uncertainty. We then propose a novel greedy algorithm called retro-fallback which maximizes the probability that at least one synthesis plan can be executed in the lab. Using in-silico benchmarks we demonstrate that retro-fallback generally produces better sets of synthesis plans than the popular MCTS and retro* algorithms. △ Less

Submitted 13 April, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: ICLR 2024 camera ready version (https://openreview.net/forum?id=dl0u4ODCuW). 58 pages total. Code available at: https://github.com/AustinT/retro-fallback-iclr24. This version has 1) updated writing 2) updated figures 3) additional experimental results 4) more complete explanation of AND/OR graphs in the appendices 5) correct typos + error in fig G.5 caption

arXiv:2305.03041 [pdf, other]

Are VAEs Bad at Reconstructing Molecular Graphs?

Authors: Hagen Muenkler, Hubert Misztela, Michal Pikusa, Marwin Segler, Nadine Schneider, Krzysztof Maziarz

Abstract: Many contemporary generative models of molecules are variational auto-encoders of molecular graphs. One term in their training loss pertains to reconstructing the input, yet reconstruction capabilities of state-of-the-art models have not yet been thoroughly compared on a large and chemically diverse dataset. In this work, we show that when several state-of-the-art generative models are evaluated u… ▽ More Many contemporary generative models of molecules are variational auto-encoders of molecular graphs. One term in their training loss pertains to reconstructing the input, yet reconstruction capabilities of state-of-the-art models have not yet been thoroughly compared on a large and chemically diverse dataset. In this work, we show that when several state-of-the-art generative models are evaluated under the same conditions, their reconstruction accuracy is surprisingly low, worse than what was previously reported on seemingly harder datasets. However, we show that improving reconstruction does not directly lead to better sampling or optimization performance. Failed reconstructions from the MoLeR model are usually similar to the inputs, assembling the same motifs in a different way, and possess similar chemical properties such as solubility. Finally, we show that the input molecule and its failed reconstruction are usually mapped by the different encoders to statistically distinguishable posterior distributions, hinting that posterior collapse may not fully explain why VAEs are bad at reconstructing molecular graphs. △ Less

Submitted 4 May, 2023; originally announced May 2023.

Comments: Published at the ELLIS Workshop on Machine Learning for Molecules (ML4Molecules 2022)

arXiv:2301.13755 [pdf, other]

Retrosynthetic Planning with Dual Value Networks

Authors: Guoqing Liu, Di Xue, Shufang Xie, Yingce Xia, Austin Tripp, Krzysztof Maziarz, Marwin Segler, Tao Qin, Zongzhang Zhang, Tie-Yan Liu

Abstract: Retrosynthesis, which aims to find a route to synthesize a target molecule from commercially available starting materials, is a critical task in drug discovery and materials design. Recently, the combination of ML-based single-step reaction predictors with multi-step planners has led to promising results. However, the single-step predictors are mostly trained offline to optimize the single-step ac… ▽ More Retrosynthesis, which aims to find a route to synthesize a target molecule from commercially available starting materials, is a critical task in drug discovery and materials design. Recently, the combination of ML-based single-step reaction predictors with multi-step planners has led to promising results. However, the single-step predictors are mostly trained offline to optimize the single-step accuracy, without considering complete routes. Here, we leverage reinforcement learning (RL) to improve the single-step predictor, by using a tree-shaped MDP to optimize complete routes. Specifically, we propose a novel online training algorithm, called Planning with Dual Value Networks (PDVN), which alternates between the planning phase and updating phase. In PDVN, we construct two separate value networks to predict the synthesizability and cost of molecules, respectively. To maintain the single-step accuracy, we design a two-branch network structure for the single-step predictor. On the widely-used USPTO dataset, our PDVN algorithm improves the search success rate of existing multi-step planners (e.g., increasing the success rate from 85.79% to 98.95% for Retro*, and reducing the number of model calls by half while solving 99.47% molecules for RetroGraph). Additionally, PDVN helps find shorter synthesis routes (e.g., reducing the average route length from 5.76 to 4.83 for Retro*, and from 5.63 to 4.78 for RetroGraph). Our code is available at \url{https://github.com/DiXue98/PDVN}. △ Less

Submitted 3 March, 2024; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: Accepted to ICML 2023

arXiv:2105.02856 [pdf, other]

Hashing Modulo Alpha-Equivalence

Authors: Krzysztof Maziarz, Tom Ellis, Alan Lawrence, Andrew Fitzgibbon, Simon Peyton Jones

Abstract: In many applications one wants to identify identical subtrees of a program syntax tree. This identification should ideally be robust to alpha-renaming of the program, but no existing technique has been shown to achieve this with good efficiency (better than $\mathcal{O}(n^2)$ in expression size). We present a new, asymptotically efficient way to hash modulo alpha-equivalence. A key insight of our… ▽ More In many applications one wants to identify identical subtrees of a program syntax tree. This identification should ideally be robust to alpha-renaming of the program, but no existing technique has been shown to achieve this with good efficiency (better than $\mathcal{O}(n^2)$ in expression size). We present a new, asymptotically efficient way to hash modulo alpha-equivalence. A key insight of our method is to use a weak (commutative) hash combiner at exactly one point in the construction, which admits an algorithm with $\mathcal{O}(n (\log n)^2)$ time complexity. We prove that the use of the commutative combiner nevertheless yields a strong hash with low collision probability. Numerical benchmarks attest to the asymptotic behaviour of the method. △ Less

Submitted 6 May, 2021; originally announced May 2021.

Comments: Accepted for publication at the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2021)

arXiv:2104.13915 [pdf, other]

Deep Learning for Rheumatoid Arthritis: Joint Detection and Damage Scoring in X-rays

Authors: Krzysztof Maziarz, Anna Krason, Zbigniew Wojna

Abstract: Recent advancements in computer vision promise to automate medical image analysis. Rheumatoid arthritis is an autoimmune disease that would profit from computer-based diagnosis, as there are no direct markers known, and doctors have to rely on manual inspection of X-ray images. In this work, we present a multi-task deep learning model that simultaneously learns to localize joints on X-ray images a… ▽ More Recent advancements in computer vision promise to automate medical image analysis. Rheumatoid arthritis is an autoimmune disease that would profit from computer-based diagnosis, as there are no direct markers known, and doctors have to rely on manual inspection of X-ray images. In this work, we present a multi-task deep learning model that simultaneously learns to localize joints on X-ray images and diagnose two kinds of joint damage: narrowing and erosion. Additionally, we propose a modification of label smoothing, which combines classification and regression cues into a single loss and achieves 5% relative error reduction compared to standard loss functions. Our final model obtained 4th place in joint space narrowing and 5th place in joint erosion in the global RA2 DREAM challenge. △ Less

Submitted 4 November, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: Presented at the Workshop on AI for Public Health at ICLR 2021

arXiv:2103.03864 [pdf, other]

Learning to Extend Molecular Scaffolds with Structural Motifs

Authors: Krzysztof Maziarz, Henry Jackson-Flux, Pashmina Cameron, Finton Sirockin, Nadine Schneider, Nikolaus Stiefl, Marwin Segler, Marc Brockschmidt

Abstract: Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. However, many drug discovery projects require a fixed scaffold to be present in the generated molecule, and incorporating that constraint has only recently been… ▽ More Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. However, many drug discovery projects require a fixed scaffold to be present in the generated molecule, and incorporating that constraint has only recently been explored. Here, we propose MoLeR, a graph-based model that naturally supports scaffolds as initial seed of the generative procedure, which is possible because it is not conditioned on the generation history. Our experiments show that MoLeR performs comparably to state-of-the-art methods on unconstrained molecular optimization tasks, and outperforms them on scaffold-based tasks, while being an order of magnitude faster to train and sample from than existing approaches. Furthermore, we show the influence of a number of seemingly minor design choices on the overall performance. △ Less

Submitted 12 May, 2024; v1 submitted 5 March, 2021; originally announced March 2021.

Comments: Published at the 10th International Conference on Learning Representations (ICLR 2022)

arXiv:2008.10041 [pdf, other]

Holistic Multi-View Building Analysis in the Wild with Projection Pooling

Authors: Zbigniew Wojna, Krzysztof Maziarz, Łukasz Jocz, Robert Pałuba, Robert Kozikowski, Iasonas Kokkinos

Abstract: We address six different classification tasks related to fine-grained building attributes: construction type, number of floors, pitch and geometry of the roof, facade material, and occupancy class. Tackling such a remote building analysis problem became possible only recently due to growing large-scale datasets of urban scenes. To this end, we introduce a new benchmarking dataset, consisting of 49… ▽ More We address six different classification tasks related to fine-grained building attributes: construction type, number of floors, pitch and geometry of the roof, facade material, and occupancy class. Tackling such a remote building analysis problem became possible only recently due to growing large-scale datasets of urban scenes. To this end, we introduce a new benchmarking dataset, consisting of 49426 images (top-view and street-view) of 9674 buildings. These photos are further assembled, together with the geometric metadata. The dataset showcases various real-world challenges, such as occlusions, blur, partially visible objects, and a broad spectrum of buildings. We propose a new projection pooling layer, creating a unified, top-view representation of the top-view and the side views in a high-dimensional space. It allows us to utilize the building and imagery metadata seamlessly. Introducing this layer improves classification accuracy -- compared to highly tuned baseline models -- indicating its suitability for building analysis. △ Less

Submitted 19 December, 2020; v1 submitted 23 August, 2020; originally announced August 2020.

Comments: Accepted for publication at the 35th AAAI Conference on Artificial Intelligence (AAAI 2021)

arXiv:1910.04915 [pdf, other]

Flexible Multi-task Networks by Learning Parameter Allocation

Authors: Krzysztof Maziarz, Efi Kokiopoulou, Andrea Gesmundo, Luciano Sbaiz, Gabor Bartok, Jesse Berent

Abstract: This paper proposes a novel learning method for multi-task applications. Multi-task neural networks can learn to transfer knowledge across different tasks by using parameter sharing. However, sharing parameters between unrelated tasks can hurt performance. To address this issue, we propose a framework to learn fine-grained patterns of parameter sharing. Assuming that the network is composed of sev… ▽ More This paper proposes a novel learning method for multi-task applications. Multi-task neural networks can learn to transfer knowledge across different tasks by using parameter sharing. However, sharing parameters between unrelated tasks can hurt performance. To address this issue, we propose a framework to learn fine-grained patterns of parameter sharing. Assuming that the network is composed of several components across layers, our framework uses learned binary variables to allocate components to tasks in order to encourage more parameter sharing between related tasks, and discourage parameter sharing otherwise. The binary allocation variables are learned jointly with the model parameters by standard back-propagation thanks to the Gumbel-Softmax reparametrization method. When applied to the Omniglot benchmark, the proposed method achieves a 17% relative reduction of the error rate compared to state-of-the-art. △ Less

Submitted 18 July, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

arXiv:1811.09828 [pdf, other]

Evolutionary-Neural Hybrid Agents for Architecture Search

Authors: Krzysztof Maziarz, Mingxing Tan, Andrey Khorlin, Marin Georgiev, Andrea Gesmundo

Abstract: Neural Architecture Search has shown potential to automate the design of neural networks. Deep Reinforcement Learning based agents can learn complex architectural patterns, as well as explore a vast and compositional search space. On the other hand, evolutionary algorithms offer higher sample efficiency, which is critical for such a resource intensive application. In order to capture the best of b… ▽ More Neural Architecture Search has shown potential to automate the design of neural networks. Deep Reinforcement Learning based agents can learn complex architectural patterns, as well as explore a vast and compositional search space. On the other hand, evolutionary algorithms offer higher sample efficiency, which is critical for such a resource intensive application. In order to capture the best of both worlds, we propose a class of Evolutionary-Neural hybrid agents (Evo-NAS). We show that the Evo-NAS agent outperforms both neural and evolutionary agents when applied to architecture search for a suite of text and image classification benchmarks. On a high-complexity architecture search space for image classification, the Evo-NAS agent surpasses the accuracy achieved by commonly used agents with only 1/3 of the search cost. △ Less

Submitted 15 February, 2020; v1 submitted 24 November, 2018; originally announced November 2018.

arXiv:1801.06754 [pdf, ps, other]

doi 10.4310/JOC.2021.v12.n2.a6

The Slow-coloring Game on Sparse Graphs: $k$-Degenerate, Planar, and Outerplanar

Authors: Grzegorz Gutowski, Tomasz Krawczyk, Krzysztof Maziarz, Douglas B. West, Michał Zając, Xuding Zhu

Abstract: The \emph{slow-coloring game} is played by Lister and Painter on a graph $G$. Initially, all vertices of $G$ are uncolored. In each round, Lister marks a nonempty set $M$ of uncolored vertices, and Painter colors a subset of $M$ that is independent in $G$. The game ends when all vertices are colored. The score of the game is the sum of the sizes of all sets marked by Lister. The goal of Painter is… ▽ More The \emph{slow-coloring game} is played by Lister and Painter on a graph $G$. Initially, all vertices of $G$ are uncolored. In each round, Lister marks a nonempty set $M$ of uncolored vertices, and Painter colors a subset of $M$ that is independent in $G$. The game ends when all vertices are colored. The score of the game is the sum of the sizes of all sets marked by Lister. The goal of Painter is to minimize the score, while Lister tries to maximize it. We provide strategies for Painter on various classes of graphs whose vertices can be partitioned into a bounded number of sets inducing forests, including $k$-degenerate, acyclically $k$-colorable, planar, and outerplanar graphs. For example, we show that on an $n$-vertex graph $G$, Painter can keep the score to at most $\frac{3k+4}4n$ when $G$ is $k$-degenerate, $3.9857n$ when $G$ is acyclically $5$-colorable, $3n$ when $G$ is planar with a Hamiltonian dual, $\frac{8n+3m}5$ when $G$ is $4$-colorable with $m$ edges (hence $3.4n$ when $G$ is planar), and $\frac73n$ when $G$ is outerplanar. △ Less

Submitted 15 September, 2018; v1 submitted 20 January, 2018; originally announced January 2018.

Comments: 15 pages, 3 figures

arXiv:1701.06538 [pdf, other]

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Authors: Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean

Abstract: The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this… ▽ More The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this work, we address these challenges and finally realize the promise of conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters. We introduce a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks. A trainable gating network determines a sparse combination of these experts to use for each example. We apply the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora. We present model architectures in which a MoE with up to 137 billion parameters is applied convolutionally between stacked LSTM layers. On large language modeling and machine translation benchmarks, these models achieve significantly better results than state-of-the-art at lower computational cost. △ Less

Submitted 23 January, 2017; originally announced January 2017.

Showing 1–13 of 13 results for author: Maziarz, K