-
Emergent autonomous scientific research capabilities of large language models
Authors:
Daniil A. Boiko,
Robert MacKnight,
Gabe Gomes
Abstract:
Transformer-based large language models are rapidly advancing in the field of machine learning research, with applications spanning natural language, biology, chemistry, and computer programming. Extreme scaling and reinforcement learning from human feedback have significantly improved the quality of generated text, enabling these models to perform various tasks and reason about their choices. In this paper, we present an Intelligent Agent system that combines multiple large language models for autonomous design, planning, and execution of scientific experiments. We showcase the Agent's scientific research capabilities with three distinct examples, with the most complex being the successful performance of catalyzed cross-coupling reactions. Finally, we discuss the safety implications of such systems and propose measures to prevent their misuse.
Submitted 11 April, 2023;
originally announced April 2023.
-
On scientific understanding with artificial intelligence
Authors:
Mario Krenn,
Robert Pollice,
Si Yue Guo,
Matteo Aldeghi,
Alba Cervera-Lierta,
Pascal Friederich,
Gabriel dos Passos Gomes,
Florian Häse,
Adrian Jinich,
AkshatKumar Nigam,
Zhenpeng Yao,
Alán Aspuru-Guzik
Abstract:
Imagine an oracle that correctly predicts the outcome of every particle physics experiment, the products of every chemical reaction, or the function of every protein. Such an oracle would revolutionize science and technology as we know them. However, as scientists, we would not be satisfied with the oracle itself. We want more. We want to comprehend how the oracle conceived these predictions. This feat, denoted as scientific understanding, has frequently been recognized as the essential aim of science. Now, the ever-growing power of computers and artificial intelligence poses one ultimate question: How can advanced artificial systems contribute to scientific understanding or achieve it autonomously?
We are convinced that this is not a mere technical question but lies at the core of science. Therefore, here we set out to answer where we are and where we can go from here. We first seek advice from the philosophy of science to understand scientific understanding. Then we review the current state of the art, both from the literature and by collecting dozens of anecdotes from scientists about how they acquired new conceptual understanding with the help of computers. These combined insights help us define three dimensions of android-assisted scientific understanding: the android as (I) a computational microscope, (II) a resource of inspiration, and, ultimately, (III) an agent of understanding, which does not yet exist. For each dimension, we explain new avenues to push beyond the status quo and unleash the full power of artificial intelligence's contribution to the central aim of science. We hope our perspective inspires and focuses research towards androids that acquire new scientific understanding and ultimately bring us closer to true artificial scientists.
Submitted 4 April, 2022;
originally announced April 2022.
-
Herd immunity under individual variation and reinfection
Authors:
Antonio Montalbán,
Rodrigo M. Corder,
M. Gabriela M. Gomes
Abstract:
We study a SEIR model considered by Gomes et al. \cite{Gomes2020} and Aguas et al. \cite{Aguas2020} in which different individuals are assumed to have different levels of susceptibility or exposure to infection. Under this heterogeneity assumption, epidemic growth is effectively suppressed when the percentage of the population that has acquired immunity surpasses a critical level (the herd immunity threshold) that is lower than in homogeneous populations. We find explicit formulas to calculate herd immunity thresholds and stable configurations, and explore extensions of the model.
Submitted 9 May, 2022; v1 submitted 31 July, 2020;
originally announced August 2020.
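To make the claim that heterogeneity lowers the herd immunity threshold concrete, here is a minimal sketch. It assumes gamma-distributed individual susceptibility with coefficient of variation CV and uses the closed form 1 - (1/R0)^(1/(1 + CV^2)) associated with this line of work; treat that formula as an assumption of the sketch, not as a statement of the paper's exact results (the abstract also covers variation in exposure and other extensions, which this code does not model). The function name is hypothetical.

```python
import math


def herd_immunity_threshold(r0: float, cv: float) -> float:
    """Herd immunity threshold under heterogeneous susceptibility.

    ASSUMPTION: susceptibility is gamma-distributed with coefficient of
    variation `cv`, giving HIT = 1 - (1/R0)**(1/(1 + cv**2)).
    With cv = 0 this reduces to the homogeneous threshold 1 - 1/R0.
    """
    if r0 <= 0:
        raise ValueError("r0 must be positive")
    if cv < 0:
        raise ValueError("cv must be non-negative")
    if r0 <= 1:
        # Without epidemic growth there is nothing to suppress.
        return 0.0
    return 1.0 - (1.0 / r0) ** (1.0 / (1.0 + cv**2))


# Same R0, increasing heterogeneity in susceptibility.
for cv in (0.0, 1.0, 2.0):
    print(f"R0 = 3, CV = {cv}: HIT = {herd_immunity_threshold(3.0, cv):.3f}")
```

Running the loop shows the threshold falling as CV grows, which is the qualitative effect the abstract describes: the most susceptible individuals tend to be infected first, so population-level susceptibility drops faster than the fraction of people immunized.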
-
Epidemics, the Ising-model and percolation theory: a comprehensive review focussed on Covid-19
Authors:
Isys F. Mello,
Lucas Squillante,
Gabriel O. Gomes,
Antonio C. Seridonio,
M. de Souza
Abstract:
We revisit well-established concepts of epidemiology, the Ising model, and percolation theory. We also employ a spin $S$ = 1/2 Ising-like model and a (logistic) Fermi-Dirac-like function to describe the spread of Covid-19. Our analysis reinforces well-established results in the literature, namely: \emph{i}) that the epidemic curves can be described by a Gaussian-type function; \emph{ii}) that the temporal evolution of the cumulative number of infections and fatalities follows a logistic function, which resembles a distorted Fermi-Dirac-like function; \emph{iii}) the key role played by quarantine in blocking the spread of Covid-19, expressed through an \emph{interacting} parameter that emulates the contact between infected and non-infected people. Furthermore, within the framework of elementary percolation theory, we show that: \emph{i}) the percolation probability can be associated with the probability of a person being infected with Covid-19; \emph{ii}) blocked and non-blocked connections can be associated, respectively, with a person respecting or not respecting social distancing, which affects the probability that an infected person infects other people. An increase in the number of infected people increases the number of net connections and thus the probability of new infections (percolation). We demonstrate the importance of social distancing in preventing the spread of Covid-19 in a pedagogical way. Given the impossibility of making a precise forecast of the disease spread, we highlight the importance of taking into account additional factors, such as climatic changes and urbanization, in the mathematical description of epidemics. In addition, we connect the standard mathematical models employed in epidemics with well-established concepts in condensed matter physics, such as the Fermi gas and the Landau Fermi-liquid picture.
Submitted 16 June, 2020; v1 submitted 26 March, 2020;
originally announced March 2020.
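The link between the logistic cumulative curve and the bell-shaped epidemic curve in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' fitted model: the cumulative count is written in a Fermi-Dirac-like form, n_max / (exp((t0 - t)/tau) + 1), where t0 stands in for the chemical potential and tau for k_B T. Its time derivative is a sech-squared peak that resembles a Gaussian. All function and parameter names, and the numbers used, are hypothetical.

```python
import math


def cumulative_cases(t: float, n_max: float, t0: float, tau: float) -> float:
    """Fermi-Dirac-like (logistic) cumulative count (hypothetical parameters).

    Compared with the occupation 1 / (exp((E - mu) / kT) + 1), time enters
    with the opposite sign (a mirrored distribution): t0 plays the role of mu
    and tau the role of kT. n_max is the saturation value.
    """
    return n_max / (math.exp((t0 - t) / tau) + 1.0)


def new_cases_rate(t: float, n_max: float, t0: float, tau: float) -> float:
    """Analytic time derivative of cumulative_cases.

    dN/dt = n_max / (4 tau cosh^2((t - t0) / (2 tau))): a bell-shaped curve,
    symmetric about t0, that resembles a Gaussian near its peak.
    """
    return n_max / (4.0 * tau * math.cosh((t - t0) / (2.0 * tau)) ** 2)


# Illustrative parameters only; not fitted to any data.
params = dict(n_max=10_000.0, t0=60.0, tau=7.0)
for day in (30, 60, 90):
    print(day, round(cumulative_cases(day, **params)), round(new_cases_rate(day, **params), 1))
```

At t = t0 the cumulative count is exactly half of n_max and the daily rate peaks at n_max / (4 tau), so in this picture a smaller tau (a "colder" distribution) means a sharper, taller outbreak.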
-
Extended Experimental Inferential Structure Determination Method for Evaluating the Structural Ensembles of Disordered Protein States
Authors:
James Lincoff,
Mickael Krzeminski,
Mojtaba Haghighatlari,
João M. C. Teixeira,
Gregory-Neal W. Gomes,
Claudiu C. Gradinaru,
Julie D. Forman-Kay,
Teresa Head-Gordon
Abstract:
Characterization of proteins with intrinsic or unfolded-state disorder comprises a new frontier in structural biology, requiring the characterization of diverse and dynamic structural ensembles. We introduce a comprehensive Bayesian framework, the Extended Experimental Inferential Structure Determination (X-EISD) method, that calculates the maximum log-likelihood of a protein structural ensemble by accounting for the uncertainties of a wide range of experimental data and of the models that back-calculate those data from structures, including NMR chemical shifts, J-couplings, nuclear Overhauser effects, paramagnetic relaxation enhancements, residual dipolar couplings, hydrodynamic radii, single-molecule fluorescence Förster resonance energy transfer efficiencies, and small-angle X-ray scattering intensity curves. We apply X-EISD to the unfolded state of the drkN SH3 domain and show that certain experimental data types are more influential than others in eliminating structural ensemble models, while also finding equally probable disordered ensembles with alternative structural properties that will stimulate further experiments to discriminate between them.
Submitted 29 December, 2019;
originally announced December 2019.
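The core idea of the abstract above can be sketched as follows: score an ensemble by a log-likelihood that accounts for both experimental and back-calculation uncertainty, summed over data types. This is a deliberately simplified stand-in and not the X-EISD likelihood. It assumes independent Gaussian errors, combines the experimental error and the back-calculation model error in quadrature, and compares the experiment with the plain ensemble average of the per-structure back-calculated values. The `DataType` container and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class DataType:
    """One experimental data type (hypothetical container)."""
    name: str
    experimental: List[float]           # one value per observable
    back_calculated: List[List[float]]  # [structure][observable]
    sigma_exp: float                    # experimental uncertainty
    sigma_bc: float                     # back-calculation model uncertainty


def ensemble_average(back_calculated: List[List[float]]) -> List[float]:
    """Average each observable over all structures in the ensemble."""
    n = len(back_calculated)
    return [sum(column) / n for column in zip(*back_calculated)]


def data_type_log_likelihood(d: DataType) -> float:
    """Gaussian log-likelihood; ASSUMPTION: both error sources add in quadrature."""
    var = d.sigma_exp**2 + d.sigma_bc**2
    avg = ensemble_average(d.back_calculated)
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - (y - f) ** 2 / (2.0 * var)
        for y, f in zip(d.experimental, avg)
    )


def total_log_likelihood(data_types: List[DataType]) -> float:
    """Sum over data types, treating them as independent."""
    return sum(data_type_log_likelihood(d) for d in data_types)
```

Because each data type contributes its own term, the per-type values show which data most strongly penalize a candidate ensemble. This mirrors, in a simplified way, the abstract's point that some experimental data types are more influential than others.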