Showing 1–21 of 21 results for author: Schwaller, P

Searching in archive cs.
  1. arXiv:2405.19210  [pdf]

    cs.LG cs.AI

    Gradient Guided Hypotheses: A unified solution to enable machine learning models on scarce and noisy data regimes

    Authors: Paulo Neves, Joerg K. Wegner, Philippe Schwaller

    Abstract: Ensuring high-quality data is paramount for maximizing the performance of machine learning models and business intelligence systems. However, challenges in data quality, including noise in data capture, missing records, limited data production, and confounding variables, significantly constrain the potential performance of these systems. In this study, we propose an architecture-agnostic algorithm…

    Submitted 29 May, 2024; originally announced May 2024.

  2. arXiv:2405.17066  [pdf, other]

    q-bio.BM cs.LG

    Saturn: Sample-efficient Generative Molecular Design using Memory Manipulation

    Authors: Jeff Guo, Philippe Schwaller

    Abstract: Generative molecular design for drug discovery has very recently achieved a wave of experimental validation, with language-based backbones being the most common architectures employed. The most important factor for downstream success is whether an in silico oracle is well correlated with the desired end-point. To this end, current methods use cheaper proxy oracles with higher throughput before eva…

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2404.01475  [pdf, other]

    cs.LG cond-mat.mtrl-sci cs.AI physics.chem-ph

    Are large language models superhuman chemists?

    Authors: Adrian Mirza, Nawaf Alampara, Sreekanth Kunchapu, Benedict Emoekabu, Aswanth Krishnan, Mara Wilhelmi, Macjonathan Okereke, Juliane Eberhardt, Amir Mohammad Elahi, Maximilian Greiner, Caroline T. Holick, Tanya Gupta, Mehrdad Asgari, Christina Glaubitz, Lea C. Klepsch, Yannik Köster, Jakob Meyer, Santiago Miret, Tim Hoffmann, Fabian Alexander Kreth, Michael Ringleb, Nicole Roesner, Ulrich S. Schubert, Leanne M. Stafast, Dinga Wonanke, et al. (3 additional authors not shown)

    Abstract: Large language models (LLMs) have gained widespread interest due to their ability to process human language and perform tasks on which they have not been explicitly trained. This is relevant for the chemical sciences, which face the problem of small and diverse datasets that are frequently in the form of text. LLMs have shown promise in addressing these issues and are increasingly being harnessed…

    Submitted 1 April, 2024; originally announced April 2024.

  4. arXiv:2312.13136  [pdf, other]

    physics.chem-ph cs.LG

    Molecular Hypergraph Neural Networks

    Authors: Junwu Chen, Philippe Schwaller

    Abstract: Graph neural networks (GNNs) have demonstrated promising performance across various chemistry-related tasks. However, conventional graphs only model the pairwise connectivity in molecules, failing to adequately represent higher-order connections like multi-center bonds and conjugated structures. To tackle this challenge, we introduce molecular hypergraphs and propose Molecular Hypergraph Neural Ne…

    Submitted 21 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

  5. arXiv:2312.12737  [pdf, other]

    cs.LG q-bio.BM

    FSscore: A Machine Learning-based Synthetic Feasibility Score Leveraging Human Expertise

    Authors: Rebecca M. Neeser, Bruno Correia, Philippe Schwaller

    Abstract: Determining whether a molecule can be synthesized is crucial for many aspects of chemistry and drug discovery, allowing prioritization of experimental work and ranking molecules in de novo design tasks. Existing scoring approaches to assess synthetic feasibility struggle to extrapolate to out-of-distribution chemical spaces or fail to discriminate based on minor differences such as chirality that…

    Submitted 19 December, 2023; originally announced December 2023.

  6. arXiv:2312.09004  [pdf, other]

    physics.chem-ph cs.LG

    Holistic chemical evaluation reveals pitfalls in reaction prediction models

    Authors: Victor Sabanza Gil, Andres M. Bran, Malte Franke, Remi Schlama, Jeremy S. Luterbacher, Philippe Schwaller

    Abstract: The prediction of chemical reactions has gained significant interest within the machine learning community in recent years, owing to its complexity and crucial applications in chemistry. However, model evaluation for this task has been mostly limited to simple metrics like top-k accuracy, which obfuscates fine details of a model's limitations. Inspired by progress in other fields, we propose a new…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 17 pages, 6 figures

  7. arXiv:2311.04047  [pdf, other]

    physics.chem-ph cs.LG

    Extracting human interpretable structure-property relationships in chemistry using XAI and large language models

    Authors: Geemi P. Wellawatte, Philippe Schwaller

    Abstract: Explainable Artificial Intelligence (XAI) is an emerging field in AI that aims to address the opaque nature of machine learning models. Furthermore, it has been shown that XAI can be used to extract input-output relationships, making them a useful tool in chemistry to understand structure-property relationships. However, one of the main limitations of XAI methods is that they are developed for tec…

    Submitted 7 November, 2023; originally announced November 2023.

  8. arXiv:2310.06083  [pdf, other]

    cs.LG physics.chem-ph

    Transformers and Large Language Models for Chemistry and Drug Discovery

    Authors: Andres M Bran, Philippe Schwaller

    Abstract: Language modeling has seen impressive progress over the last years, mainly prompted by the invention of the Transformer architecture, sparking a revolution in many fields of machine learning, with breakthroughs in chemistry and biology. In this chapter, we explore how analogies between chemical and natural language have inspired the use of Transformers to tackle important bottlenecks in the drug d…

    Submitted 9 October, 2023; originally announced October 2023.

  9. arXiv:2310.05573  [pdf, other]

    cs.LG

    ODEFormer: Symbolic Regression of Dynamical Systems with Transformers

    Authors: Stéphane d'Ascoli, Sören Becker, Alexander Mathis, Philippe Schwaller, Niki Kilbertus

    Abstract: We introduce ODEFormer, the first transformer able to infer multidimensional ordinary differential equation (ODE) systems in symbolic form from the observation of a single solution trajectory. We perform extensive evaluations on two datasets: (i) the existing "Strogatz" dataset featuring two-dimensional systems; (ii) ODEBench, a collection of one- to four-dimensional systems that we carefully cura…

    Submitted 9 October, 2023; originally announced October 2023.

  10. arXiv:2309.13957  [pdf, other]

    q-bio.BM cs.LG

    Beam Enumeration: Probabilistic Explainability For Sample Efficient Self-conditioned Molecular Design

    Authors: Jeff Guo, Philippe Schwaller

    Abstract: Generative molecular design has moved from proof-of-concept to real-world applicability, as marked by the surge in very recent papers reporting experimental validation. Key challenges in explainability and sample efficiency present opportunities to enhance generative design to directly optimize expensive high-fidelity oracles and provide actionable insights to domain experts. Here, we propose Beam…

    Submitted 3 March, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

  11. arXiv:2306.06283  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.chem-ph

    14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

    Authors: Kevin Maik Jablonka, Qianxiang Ai, Alexander Al-Feghali, Shruti Badhwar, Joshua D. Bocarsly, Andres M Bran, Stefan Bringuier, L. Catherine Brinson, Kamal Choudhary, Defne Circi, Sam Cox, Wibe A. de Jong, Matthew L. Evans, Nicolas Gastellu, Jerome Genzling, María Victoria Gil, Ankur K. Gupta, Zhi Hong, Alishba Imran, Sabine Kruschwitz, Anne Labarre, Jakub Lála, Tao Liu, Steven Ma, Sauradeep Majumdar, et al. (28 additional authors not shown)

    Abstract: Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of mole…

    Submitted 14 July, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  12. arXiv:2305.16160  [pdf, other]

    q-bio.BM cs.LG q-bio.QM

    Augmented Memory: Capitalizing on Experience Replay to Accelerate De Novo Molecular Design

    Authors: Jeff Guo, Philippe Schwaller

    Abstract: Sample efficiency is a fundamental challenge in de novo molecular design. Ideally, molecular generative models should learn to satisfy a desired objective under minimal oracle evaluations (computational prediction or wet-lab experiment). This problem becomes more apparent when using oracles that can provide increased predictive accuracy but impose a significant cost. Consequently, these oracles ca…

    Submitted 10 May, 2023; originally announced May 2023.

  13. arXiv:2212.04450  [pdf, other]

    physics.chem-ph cond-mat.mtrl-sci cs.LG

    GAUCHE: A Library for Gaussian Processes in Chemistry

    Authors: Ryan-Rhys Griffiths, Leo Klarner, Henry B. Moss, Aditya Ravuri, Sang Truong, Samuel Stanton, Gary Tom, Bojana Rankovic, Yuanqi Du, Arian Jamasb, Aryan Deshwal, Julius Schwartz, Austin Tripp, Gregory Kell, Simon Frieder, Anthony Bourached, Alex Chan, Jacob Moss, Chengzhi Guo, Johannes Durholt, Saudamini Chaurasia, Felix Strieth-Kalthoff, Alpha A. Lee, Bingqing Cheng, Alán Aspuru-Guzik, et al. (2 additional authors not shown)

    Abstract: We introduce GAUCHE, a library for GAUssian processes in CHEmistry. Gaussian processes have long been a cornerstone of probabilistic machine learning, affording particular advantages for uncertainty quantification and Bayesian optimisation. Extending Gaussian processes to chemical representations, however, is nontrivial, necessitating kernels defined over structured inputs such as graphs, strings…

    Submitted 21 February, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

  14. arXiv:2204.00056  [pdf, other]

    physics.chem-ph cs.LG

    SELFIES and the future of molecular string representations

    Authors: Mario Krenn, Qianxiang Ai, Senja Barthel, Nessa Carson, Angelo Frei, Nathan C. Frey, Pascal Friederich, Théophile Gaudin, Alberto Alexander Gayle, Kevin Maik Jablonka, Rafael F. Lameiro, Dominik Lemm, Alston Lo, Seyed Mohamad Moosavi, José Manuel Nápoles-Duarte, AkshatKumar Nigam, Robert Pollice, Kohulan Rajan, Ulrich Schatzschneider, Philippe Schwaller, Marta Skreta, Berend Smit, Felix Strieth-Kalthoff, Chong Sun, Gary Tom, et al. (6 additional authors not shown)

    Abstract: Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool…

    Submitted 31 March, 2022; originally announced April 2022.

    Comments: 34 pages, 15 figures, comments and suggestions for additional references are welcome!

    Journal ref: Cell Patterns 3(10), 100588 (2022)

  15. arXiv:2105.02637  [pdf, other]

    physics.chem-ph cs.LG

    Dataset Bias in the Natural Sciences: A Case Study in Chemical Reaction Prediction and Synthesis Design

    Authors: Ryan-Rhys Griffiths, Philippe Schwaller, Alpha A. Lee

    Abstract: Datasets in the Natural Sciences are often curated with the goal of aiding scientific understanding and hence may not always be in a form that facilitates the application of machine learning. In this paper, we identify three trends within the fields of chemical reaction prediction and synthesis design that require a change in direction. First, the manner in which reaction datasets are split into r…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: Presented at the 2018 NeurIPS Workshop on Machine Learning for Molecules and Materials

  16. arXiv:2102.01399  [pdf, other]

    cs.LG cs.AI

    Unassisted Noise Reduction of Chemical Reaction Data Sets

    Authors: Alessandra Toniato, Philippe Schwaller, Antonio Cardinale, Joppe Geluykens, Teodoro Laino

    Abstract: Existing deep learning models applied to reaction prediction in organic chemistry can reach high levels of accuracy (> 90% for Natural Language Processing-based ones). With no chemical knowledge embedded than the information learnt from reaction data, the quality of the data sets plays a crucial role in the performance of the prediction models. While human curation is prohibitively expensive, the…

    Submitted 2 February, 2021; originally announced February 2021.

  17. arXiv:2012.06051  [pdf, other]

    physics.chem-ph cs.CL cs.LG

    Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks

    Authors: Philippe Schwaller, Daniel Probst, Alain C. Vaucher, Vishnu H. Nair, David Kreutter, Teodoro Laino, Jean-Louis Reymond

    Abstract: Organic reactions are usually assigned to classes containing reactions with similar reagents and mechanisms. Reaction classes facilitate the communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task. It requires the identification of the corresponding reaction class template via annotation of the number of mole…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: https://rxn4chemistry.github.io/rxnfp/

  18. arXiv:2002.06053  [pdf, other]

    q-bio.BM cs.CL cs.LG stat.ML

    Exploring Chemical Space using Natural Language Processing Methodologies for Drug Discovery

    Authors: Hakime Öztürk, Arzucan Özgür, Philippe Schwaller, Teodoro Laino, Elif Ozkirimli

    Abstract: Text-based representations of chemicals and proteins can be thought of as unstructured languages codified by humans to describe domain-specific knowledge. Advances in natural language processing (NLP) methodologies in the processing of spoken languages accelerated the application of NLP to elucidate hidden knowledge in textual representations of these biochemical entities and then use it to constr…

    Submitted 10 February, 2020; originally announced February 2020.

  19. arXiv:1910.08036  [pdf, other]

    cs.LG stat.ML

    Predicting retrosynthetic pathways using a combined linguistic model and hyper-graph exploration strategy

    Authors: Philippe Schwaller, Riccardo Petraglia, Valerio Zullo, Vishnu H Nair, Rico Andreas Haeuselmann, Riccardo Pisoni, Costas Bekas, Anna Iuliano, Teodoro Laino

    Abstract: We present an extension of our Molecular Transformer architecture combined with a hyper-graph exploration strategy for automatic retrosynthesis route planning without human intervention. The single-step retrosynthetic model sets a new state of the art for predicting reactants as well as reagents, solvents and catalysts for each retrosynthetic step. We introduce new metrics (coverage, class diversi…

    Submitted 17 October, 2019; originally announced October 2019.

  20. arXiv:1811.02633  [pdf, other]

    physics.chem-ph cs.LG

    Molecular Transformer - A Model for Uncertainty-Calibrated Chemical Reaction Prediction

    Authors: Philippe Schwaller, Teodoro Laino, Théophile Gaudin, Peter Bolgar, Costas Bekas, Alpha A Lee

    Abstract: Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between SMILES strings of reactants-reagents and the products. We show that a multi-head attention Mol…

    Submitted 30 May, 2019; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: Machine Learning for Molecules and Materials workshop, NeurIPS 2018 / Platform: https://rxn.res.ibm.com

    Journal ref: ACS Central Science, 2019

  21. arXiv:1711.04810  [pdf, other]

    cs.LG stat.ML

    "Found in Translation": Predicting Outcomes of Complex Organic Chemistry Reactions using Neural Sequence-to-Sequence Models

    Authors: Philippe Schwaller, Theophile Gaudin, David Lanyi, Costas Bekas, Teodoro Laino

    Abstract: There is an intuitive analogy of an organic chemist's understanding of a compound and a language speaker's understanding of a word. Consequently, it is possible to introduce the basic concepts and analyze potential impacts of linguistic analysis to the world of organic chemistry. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-…

    Submitted 15 November, 2017; v1 submitted 13 November, 2017; originally announced November 2017.