Showing 1–10 of 10 results for author: White, A D

  1. arXiv:2407.01603  [pdf, other]

    cs.LG cs.AI cs.CL physics.chem-ph

    A Review of Large Language Models and Autonomous Agents in Chemistry

    Authors: Mayk Caldas Ramos, Christopher J. Collison, Andrew D. White

    Abstract: Large language models (LLMs) are emerging as a powerful tool in chemistry across multiple domains. In chemistry, LLMs are able to accurately predict properties, design new molecules, optimize synthesis pathways, and accelerate drug and material discovery. A core emerging idea is combining LLMs with chemistry-specific tools like synthesis planners and databases, leading to so-called "agents." This…

    Submitted 26 June, 2024; originally announced July 2024.

  2. arXiv:2312.07559  [pdf, other]

    cs.CL cs.AI cs.LG

    PaperQA: Retrieval-Augmented Generative Agent for Scientific Research

    Authors: Jakub Lála, Odhran O'Donoghue, Aleksandar Shtedritski, Sam Cox, Samuel G. Rodriques, Andrew D. White

    Abstract: Large Language Models (LLMs) generalize well across language tasks, but suffer from hallucinations and uninterpretability, making it difficult to assess their accuracy without ground-truth. Retrieval-Augmented Generation (RAG) models have been proposed to reduce hallucinations and provide provenance for how an answer was generated. Applying such models to the scientific literature may enable large…

    Submitted 14 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

  3. arXiv:2307.05318  [pdf, other]

    physics.chem-ph cs.LG

    Predicting small molecules solubilities on endpoint devices using deep ensemble neural networks

    Authors: Mayk Caldas Ramos, Andrew D. White

    Abstract: Aqueous solubility is a valuable yet challenging property to predict. Computing solubility using first-principles methods requires accounting for the competing effects of entropy and enthalpy, resulting in long computations for relatively poor accuracy. Data-driven approaches, such as deep learning, offer improved accuracy and computational efficiency but typically lack uncertainty quantification.…

    Submitted 7 March, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

  4. arXiv:2306.06283  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.chem-ph

    14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

    Authors: Kevin Maik Jablonka, Qianxiang Ai, Alexander Al-Feghali, Shruti Badhwar, Joshua D. Bocarsly, Andres M Bran, Stefan Bringuier, L. Catherine Brinson, Kamal Choudhary, Defne Circi, Sam Cox, Wibe A. de Jong, Matthew L. Evans, Nicolas Gastellu, Jerome Genzling, María Victoria Gil, Ankur K. Gupta, Zhi Hong, Alishba Imran, Sabine Kruschwitz, Anne Labarre, Jakub Lála, Tao Liu, Steven Ma, Sauradeep Majumdar, et al. (28 additional authors not shown)

    Abstract: Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of mole…

    Submitted 14 July, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  5. arXiv:2305.10379  [pdf, other]

    cs.LG cs.NE physics.chem-ph stat.ML

    Active Learning in Symbolic Regression with Physical Constraints

    Authors: Jorge Medina, Andrew D. White

    Abstract: Evolutionary symbolic regression (SR) fits a symbolic equation to data, which gives a concise interpretable model. We explore using SR as a method to propose which data to gather in an active learning setting with physical constraints. SR with active learning proposes which experiments to do next. Active learning is done with query by committee, where the Pareto frontier of equations is the commit…

    Submitted 18 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  6. arXiv:2304.10510  [pdf, other]

    cs.LG cs.CR cs.CY physics.chem-ph

    Censoring chemical data to mitigate dual use risk

    Authors: Quintina L. Campbell, Jonathan Herington, Andrew D. White

    Abstract: The dual use of machine learning applications, where models can be used for both beneficial and malicious purposes, presents a significant challenge. This has recently become a particular concern in chemistry, where chemical datasets containing sensitive labels (e.g. toxicological information) could be used to develop predictive models that identify novel toxins or chemical warfare agents. To miti…

    Submitted 20 April, 2023; originally announced April 2023.

  7. arXiv:2304.05341  [pdf, other]

    physics.chem-ph cs.LG

    Bayesian Optimization of Catalysts With In-context Learning

    Authors: Mayk Caldas Ramos, Shane S. Michtavy, Marc D. Porosoff, Andrew D. White

    Abstract: Large language models (LLMs) are able to do accurate classification with zero or only a few examples (in-context learning). We show a prompting system that enables regression with uncertainty for in-context learning with frozen LLM (GPT-3, GPT-3.5, and GPT-4) models, allowing predictions without features or architecture tuning. By incorporating uncertainty, our approach enables Bayesian optimizati…

    Submitted 11 April, 2023; originally announced April 2023.

  8. arXiv:2302.03620  [pdf, other]

    physics.chem-ph cs.LG

    Recent advances in the Self-Referencing Embedding Strings (SELFIES) library

    Authors: Alston Lo, Robert Pollice, AkshatKumar Nigam, Andrew D. White, Mario Krenn, Alán Aspuru-Guzik

    Abstract: String-based molecular representations play a crucial role in cheminformatics applications, and with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel repr…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: 11 pages, 2 figures

    Journal ref: Digital Discovery 2, 897 (2023)

  9. arXiv:2007.04921  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Graph Neural Network Based Coarse-Grained Mapping Prediction

    Authors: Zhiheng Li, Geemi P. Wellawatte, Maghesree Chakraborty, Heta A. Gandhi, Chenliang Xu, Andrew D. White

    Abstract: The selection of coarse-grained (CG) mapping operators is a critical step for CG molecular dynamics (MD) simulation. It is still an open question about what is optimal for this choice and there is a need for theory. The current state-of-the art method is mapping operators manually selected by experts. In this work, we demonstrate an automated approach by viewing this problem as supervised learning…

    Submitted 19 August, 2021; v1 submitted 24 June, 2020; originally announced July 2020.

  10. arXiv:1911.09103  [pdf, other]

    q-bio.BM cs.LG stat.ML

    Investigating Active Learning and Meta-Learning for Iterative Peptide Design

    Authors: Rainier Barrett, Andrew D. White

    Abstract: Often the development of novel functional peptides is not amenable to high throughput or purely computational screening methods. Peptides must be synthesized one at a time in a process that does not generate large amounts of data. One way this method can be improved is by ensuring that each experiment provides the best improvement in both peptide properties and predictive modeling accuracy. Here,…

    Submitted 10 December, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: 19 pages, 8 figures, 9 tables