-
CACTUS: Chemistry Agent Connecting Tool-Usage to Science
Authors:
Andrew D. McNaughton,
Gautham Ramalaxmi,
Agustin Kruel,
Carter R. Knutson,
Rohith A. Varikoti,
Neeraj Kumar
Abstract:
Large language models (LLMs) have shown remarkable potential in various domains, but they often lack the ability to access and reason over domain-specific knowledge and tools. In this paper, we introduce CACTUS (Chemistry Agent Connecting Tool-Usage to Science), an LLM-based agent that integrates cheminformatics tools to enable advanced reasoning and problem-solving in chemistry and molecular discovery. We evaluate the performance of CACTUS using a diverse set of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. Our results demonstrate that CACTUS significantly outperforms baseline LLMs, with the Gemma-7b and Mistral-7b models achieving the highest accuracy regardless of the prompting strategy used. Moreover, we explore the impact of domain-specific prompting and hardware configurations on model performance, highlighting the importance of prompt engineering and the potential for deploying smaller models on consumer-grade hardware without significant loss in accuracy. By combining the cognitive capabilities of open-source LLMs with domain-specific tools, CACTUS can assist researchers in tasks such as molecular property prediction, similarity searching, and drug-likeness assessment. Furthermore, CACTUS represents a significant milestone in the field of cheminformatics, offering an adaptable tool for researchers engaged in chemistry and molecular discovery. By integrating the strengths of open-source LLMs with domain-specific tools, CACTUS has the potential to accelerate scientific advancement and unlock new frontiers in the exploration of novel, effective, and safe therapeutic candidates, catalysts, and materials. Moreover, CACTUS's ability to integrate with automated experimentation platforms and make data-driven decisions in real time opens up new possibilities for autonomous discovery.
Submitted 1 May, 2024;
originally announced May 2024.
-
Scaffold-Based Multi-Objective Drug Candidate Optimization
Authors:
Agustin Kruel,
Andrew D. McNaughton,
Neeraj Kumar
Abstract:
In therapeutic design, balancing various physicochemical properties is crucial for molecule development, similar to how Multiparameter Optimization (MPO) evaluates multiple variables to meet a primary goal. While many molecular features can now be predicted using \textit{in silico} methods, aiding early drug development, the vast data generated from high-throughput virtual screening challenges the practicality of traditional MPO approaches. Addressing this, we introduce a scaffold-focused, graph-based Markov chain Monte Carlo framework (ScaMARS) built to generate molecules with optimal properties. This innovative framework is capable of self-training and handling a wider array of properties, sampling different chemical spaces according to the starting scaffold. The benchmark analysis on several properties shows that ScaMARS has a diversity score of 84.6\% and has a much higher success rate of 99.5\% compared to conditional models. The integration of new features into MPO significantly enhances its adaptability and effectiveness in therapeutic design, facilitating the discovery of candidates that efficiently optimize multiple properties.
Submitted 2 January, 2024; v1 submitted 15 December, 2022;
originally announced January 2023.
-
Causes and consequences of ordering and dynamic phases of confined vortex rows in superconducting nanostripes
Authors:
Benjamin A. McNaughton,
Nicola Pinto,
Andrea Perali,
Milorad V. Milosevic
Abstract:
Understanding the behaviour of vortices under nanoscale confinement in superconducting circuits is important for the development of superconducting electronics and quantum technologies. Using numerical simulations based on the Ginzburg-Landau theory for non-homogeneous superconductivity in the presence of magnetic fields, we detail how lateral confinement organises vortices in a long superconducting nanostripe, and present a phase diagram of vortex configurations as a function of the stripe width and magnetic field. We discuss why average vortex density is reduced and reveal that confinement also has a profound influence on vortex dynamics in the dissipative regime under sourced electrical current, mapping out transitions between asynchronous and synchronous vortex rows crossing the nanostripe as the current is varied. Synchronous crossings are of particular interest, since they cause single-mode modulations in the voltage drop along the stripe in a high (typically GHz-to-THz) frequency range.
Submitted 18 October, 2022;
originally announced October 2022.
-
De novo design of protein target specific scaffold-based Inhibitors via Reinforcement Learning
Authors:
Andrew D. McNaughton,
Mridula S. Bontha,
Carter R. Knutson,
Jenna A. Pope,
Neeraj Kumar
Abstract:
Efficient design and discovery of target-driven molecules is a critical step in facilitating lead optimization in drug discovery. Current approaches to developing molecules for a target protein are intuition-driven, hampered by slow iterative design-test cycles due to computational challenges in utilizing 3D structural data, and ultimately limited by the expertise of the chemist, leading to bottlenecks in molecular design. In this contribution, we propose a novel framework, called 3D-MolGNN$_{RL}$, coupling reinforcement learning (RL) to a deep generative model based on 3D-Scaffold to generate target candidates specific to a protein, building up atom by atom from the starting core scaffold. 3D-MolGNN$_{RL}$ provides an efficient way to optimize key features via a multi-objective reward function within a protein pocket using parallel graph neural network models. The agent learns to build molecules in 3D space while optimizing the activity, binding affinity, potency, and synthetic accessibility of the candidates generated for infectious disease protein targets. Our approach can serve as an interpretable artificial intelligence (AI) tool for lead optimization with optimized activity, potency, and biophysical properties.
Submitted 20 May, 2022;
originally announced May 2022.
-
The first radio spectrum of a rapidly rotating A-type star
Authors:
Jacob Aaron White,
F. Tapia-Vázquez,
A. G. Hughes,
A. Moór,
B. Matthews,
D. Wilner,
J. Aufdenberg,
O. Fehér,
A. M. Hughes,
V. De la Luz,
A. McNaughton,
L. A. Zapata
Abstract:
The radio spectra of main-sequence stars remain largely unconstrained due to the lack of observational data to inform stellar atmosphere models. As such, the dominant emission mechanisms at long wavelengths, how they vary with spectral type, and how much they contribute to the expected brightness at a given radio wavelength are still relatively unknown for most spectral types. We present radio continuum observations of Altair, a rapidly rotating A-type star. We observed Altair with NOEMA in 2018 and 2019 at 1.34 mm, 2.09 mm, and 3.22 mm and with the VLA in 2019 at 6.7 mm and 9.1 mm. In the radio spectra, we see a brightness temperature minimum at millimeter wavelengths followed by a steep rise to temperatures larger than that of the optical photosphere, behavior that is unexpected for A-type stars. We use these data to produce the first sub-millimeter to centimeter spectrum of a rapidly rotating A-type star informed by observations. We generated both PHOENIX and KINICH-PAKAL model atmospheres and determined that the KINICH-PAKAL model better reproduces Altair's radio spectrum. The synthetic spectrum shows a millimeter brightness temperature minimum followed by significant emission over that of the photosphere at centimeter wavelengths. Together, these data and models show how the radio spectrum of an A-type star can reveal the presence of a chromosphere, likely induced by rapid rotation, and that a Rayleigh-Jeans extrapolation of the stellar photosphere is not an adequate representation of a star's radio spectrum.
Submitted 19 April, 2021;
originally announced April 2021.