Dynamic Observation Policies in Observation Cost-Sensitive Reinforcement Learning

Authors: Colin Bellinger, Mark Crowley, Isaac Tamblyn

Abstract: Reinforcement learning (RL) has been shown to learn sophisticated control policies for complex tasks including games, robotics, heating and cooling systems and text generation. The action-perception cycle in RL, however, generally assumes that a measurement of the state of the environment is available at each time step without a cost. In applications such as materials design, deep-sea and planetary robot exploration and medicine, however, there can be a high cost associated with measuring, or even approximating, the state of the environment. In this paper, we survey the recently growing literature that adopts the perspective that an RL agent might not need, or even want, a costly measurement at each time step. Within this context, we propose the Deep Dynamic Multi-Step Observationless Agent (DMSOA), contrast it with the literature and empirically evaluate it on OpenAI Gym and Atari Pong environments. Our results show that DMSOA learns a better policy with fewer decision steps and measurements than the considered alternative from the literature.

Submitted 18 April, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: NeurIPS 2023 Workshop WANT

MSC Class: 68T01; ACM Class: I.2.0

arXiv:2305.14177 [pdf, other]

ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry

Authors: Chris Beeler, Sriram Ganapathi Subramanian, Kyle Sprague, Nouha Chatti, Colin Bellinger, Mitchell Shahen, Nicholas Paquin, Mark Baula, Amanuel Dawit, Zihan Yang, Xinkai Li, Mark Crowley, Isaac Tamblyn

Abstract: This paper provides a simulated laboratory for making use of Reinforcement Learning (RL) for chemical discovery. Since RL is fairly data intensive, training agents 'on-the-fly' by taking actions in the real world is infeasible and possibly dangerous. Moreover, chemical processing and discovery involve challenges which are not commonly found in RL benchmarks and therefore offer a rich space to work in. We introduce a set of highly customizable and open-source RL environments, ChemGymRL, based on the standard OpenAI Gym template. ChemGymRL supports a series of interconnected virtual chemical benches where RL agents can operate and train. The paper introduces and details each of these benches using well-known chemical reactions as illustrative examples, and trains a set of standard RL algorithms in each of these benches. Finally, discussion and comparison of the performances of several standard RL methods are provided in addition to a list of directions for future work as a vision for the further development and usage of ChemGymRL.

Submitted 23 May, 2023; originally announced May 2023.

Comments: 19 pages, 13 figures, 2 tables

arXiv:2301.01807 [pdf, other]

fintech-kMC: Agent based simulations of financial platforms for design and testing of machine learning systems

Authors: Isaac Tamblyn, Tengkai Yu, Ian Benlolo

Abstract: We discuss our simulation tool, fintech-kMC, which is designed to generate synthetic data for machine learning model development and testing. fintech-kMC is an agent-based model driven by a kinetic Monte Carlo (a.k.a. continuous time Monte Carlo) engine which simulates the behaviour of customers using an online digital financial platform. The tool provides an interpretable, reproducible, and realistic way of generating synthetic data which can be used to validate and test AI/ML models and pipelines to be used in real-world customer-facing financial applications.

Submitted 4 January, 2023; originally announced January 2023.

Comments: To appear at AAAI-23 Bridge Program: AI for Financial Services, Washington D.C., February 7 - 8, 2023

arXiv:2205.07408 [pdf, other]

Training neural networks using Metropolis Monte Carlo and an adaptive variant

Authors: Stephen Whitelam, Viktor Selin, Ian Benlolo, Corneel Casert, Isaac Tamblyn

Abstract: We examine the zero-temperature Metropolis Monte Carlo algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis Monte Carlo can train a neural net with an accuracy comparable to that of gradient descent, if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network's structure or neuron activations are strongly heterogeneous, and we introduce an adaptive Monte Carlo algorithm, aMC, to overcome these limitations. The intrinsic stochasticity and numerical stability of the Monte Carlo method allow aMC to train deep neural networks and recurrent neural networks in which the gradient is too small or too large to allow training by gradient descent. Monte Carlo methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.

Submitted 9 August, 2022; v1 submitted 15 May, 2022; originally announced May 2022.
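
A minimal sketch of the zero-temperature Metropolis update described in the abstract above: propose a random perturbation of the parameters and keep it only if the loss does not increase. The two-parameter linear model, the Gaussian proposal of width `sigma`, and every name below are illustrative assumptions rather than the authors' implementation, and the adaptive variant (aMC) is not shown.

```python
import random


def loss(params, data):
    """Mean squared error of the linear model y = w * x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def zero_temperature_metropolis(params, data, steps=2000, sigma=0.05, seed=0):
    """Accept a Gaussian perturbation only if it does not raise the loss."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    current = list(params)
    current_loss = loss(current, data)
    for _ in range(steps):
        # Assumed proposal: perturb every parameter with Gaussian noise.
        proposal = [p + rng.gauss(0.0, sigma) for p in current]
        proposal_loss = loss(proposal, data)
        # Zero temperature: moves that increase the loss are always rejected.
        if proposal_loss <= current_loss:
            current, current_loss = proposal, proposal_loss
    return current, current_loss


# Toy data drawn from y = 2x + 1 (hypothetical example problem).
data = [(x, 2.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
initial = [0.0, 0.0]
final, final_loss = zero_temperature_metropolis(initial, data)
```

Because uphill moves are never accepted, the returned loss can never exceed the starting loss; no gradient is computed at any point.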

arXiv:2205.04547 [pdf, other]

doi 10.1021/acs.jctc.2c00483

Machine Learning Diffusion Monte Carlo Energies

Authors: Kevin Ryczko, Jaron T. Krogel, Isaac Tamblyn

Abstract: We present two machine learning methodologies that are capable of predicting diffusion Monte Carlo (DMC) energies with small datasets (~60 DMC calculations in total). The first uses voxel deep neural networks (VDNNs) to predict DMC energy densities using Kohn-Sham density functional theory (DFT) electron densities as input. The second uses kernel ridge regression (KRR) to predict atomic contributions to the DMC total energy using atomic environment vectors as input (we used atom-centred symmetry functions, atomic environment vectors from the ANI models, and smooth overlap of atomic positions). We first compare the methodologies on pristine graphene lattices, where we find the KRR methodology performs best in comparison to gradient boosted decision trees, random forest, Gaussian process regression, and multilayer perceptrons. In addition, KRR outperforms VDNNs by an order of magnitude. Afterwards, we study the generalizability of KRR to predict the energy barrier associated with a Stone-Wales defect. Lastly, we move from 2D to 3D materials and use KRR to predict total energies of liquid water. In all cases, we find that the KRR models are more accurate than Kohn-Sham DFT and all mean absolute errors are less than chemical accuracy.

Submitted 5 October, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

arXiv:2204.02474 [pdf, other]

Generative Enriched Sequential Learning (ESL) Approach for Molecular Design via Augmented Domain Knowledge

Authors: Mohammad Sajjad Ghaemi, Karl Grantham, Isaac Tamblyn, Yifeng Li, Hsu Kiang Ooi

Abstract: Deploying generative machine learning techniques to generate novel chemical structures based on molecular fingerprint representation has been well established in molecular design. Typically, sequential learning (SL) schemes such as hidden Markov models (HMM) and, more recently, in the sequential deep learning context, recurrent neural networks (RNN) and long short-term memory (LSTM) have been used extensively as generative models to discover unprecedented molecules. To this end, emission probability between two states of atoms plays a central role without considering specific chemical or physical properties. Lack of supervised domain knowledge can mislead the learning procedure to be relatively biased to the prevalent molecules observed in the training data that are not necessarily of interest. We alleviated this drawback by augmenting the training data with domain knowledge, e.g. quantitative estimates of the drug-likeness score (QEDs). As such, our experiments demonstrated that with this subtle trick called enriched sequential learning (ESL), specific patterns of particular interest can be learnt better, which led to generating de novo molecules with improved QEDs.

Submitted 5 April, 2022; originally announced April 2022.

Comments: 6 pages

arXiv:2203.05551 [pdf, other]

doi 10.1103/PhysRevE.108.014126

Cellular automata can classify data by inducing trajectory phase coexistence

Authors: Stephen Whitelam, Isaac Tamblyn

Abstract: We show that cellular automata can classify data by inducing a form of dynamical phase coexistence. We use Monte Carlo methods to search for general two-dimensional deterministic automata that classify images on the basis of activity, the number of state changes that occur in a trajectory initiated from the image. When the number of timesteps of the automaton is a trainable parameter, the search scheme identifies automata that generate a population of dynamical trajectories displaying high or low activity, depending on initial conditions. Automata of this nature behave as nonlinear activation functions with an output that is effectively binary, resembling an emergent version of a spiking neuron.

Submitted 25 July, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

arXiv:2202.08708 [pdf, other]

Learning stochastic dynamics and predicting emergent behavior using transformers

Authors: Corneel Casert, Isaac Tamblyn, Stephen Whitelam

Abstract: We show that a neural network originally designed for language processing can learn the dynamical rules of a stochastic system by observation of a single dynamical trajectory of the system, and can accurately predict its emergent behavior under conditions not observed during training. We consider a lattice model of active matter undergoing continuous-time Monte Carlo dynamics, simulated at a density at which its steady state comprises small, dispersed clusters. We train a neural network called a transformer on a single trajectory of the model. The transformer, which we show has the capacity to represent dynamical rules that are numerous and nonlocal, learns that the dynamics of this model consists of a small number of processes. Forward-propagated trajectories of the trained transformer, at densities not encountered during training, exhibit motility-induced phase separation and so predict the existence of a nonequilibrium phase transition. Transformers have the flexibility to learn dynamical rules from observation without explicit enumeration of rates or coarse-graining of configuration space, and so the procedure used here can be applied to a wide range of physical systems, including those with large and complex dynamical generators.

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2112.14657 [pdf, other]

Dynamic programming with incomplete information to overcome navigational uncertainty in a nautical environment

Authors: Chris Beeler, Xinkai Li, Colin Bellinger, Mark Crowley, Maia Fraser, Isaac Tamblyn

Abstract: Using a novel toy nautical navigation environment, we show that dynamic programming can be used when only incomplete information about a partially observed Markov decision process (POMDP) is known. By incorporating uncertainty into our model, we show that navigation policies can be constructed that maintain safety, outperforming the baseline performance of traditional dynamic programming for Markov decision processes (MDPs). Adding in controlled sensing methods, we show that these policies can also lower measurement costs at the same time.

Submitted 19 July, 2022; v1 submitted 29 December, 2021; originally announced December 2021.

Comments: 11 pages, 5 figures

arXiv:2112.07535 [pdf, other]

Scientific Discovery and the Cost of Measurement -- Balancing Information and Cost in Reinforcement Learning

Authors: Colin Bellinger, Andriy Drozdyuk, Mark Crowley, Isaac Tamblyn

Abstract: The use of reinforcement learning (RL) in scientific applications, such as materials design and automated chemistry, is increasing. A major challenge, however, lies in the fact that measuring the state of the system is often costly and time-consuming in scientific applications, whereas policy learning with RL requires a measurement after each time step. In this work, we make the measurement costs explicit in the form of a costed reward and propose a framework that enables off-the-shelf deep RL algorithms to learn a policy for both selecting actions and determining whether or not to measure the current state of the system at each time step. In this way, the agents learn to balance the need for information with the cost of information. Our results show that when trained under this regime, the Dueling DQN and PPO agents can learn optimal action policies whilst making up to 50% fewer state measurements, and recurrent neural networks can produce a greater than 50% reduction in measurements. We postulate that these reductions can help to lower the barrier to applying RL to real-world scientific applications.

Submitted 6 April, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: To appear in: 1st Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE)
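
One way to picture the costed reward in the abstract above is an environment wrapper in which each action is a pair (control action, measure flag) and a fixed cost is subtracted from the reward whenever the agent chooses to measure. This is a sketch under stated assumptions, not the paper's framework: the class names, the three-value `step` return, the constant `measurement_cost`, and the choice to return the last measured observation when the agent does not measure are all hypothetical.

```python
class ToyCounterEnv:
    """Hypothetical toy environment: the state is a counter, reward is 1 per step."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        done = self.state >= 3
        return self.state, 1.0, done


class CostedMeasurementWrapper:
    """Charge a cost for each measurement and hide the state otherwise."""

    def __init__(self, env, measurement_cost):
        self.env = env
        self.measurement_cost = measurement_cost
        self.last_observation = None

    def reset(self):
        # Assumption: the initial state is always observed for free.
        self.last_observation = self.env.reset()
        return self.last_observation

    def step(self, action_pair):
        control_action, measure = action_pair
        observation, reward, done = self.env.step(control_action)
        if measure:
            # Measuring reveals the new state and reduces the reward.
            self.last_observation = observation
            reward -= self.measurement_cost
        # Without a measurement the agent sees only the stale observation.
        return self.last_observation, reward, done


env = CostedMeasurementWrapper(ToyCounterEnv(), measurement_cost=0.25)
first_obs = env.reset()
blind_step = env.step((1, False))     # act without measuring
measured_step = env.step((1, True))   # act and pay for a measurement
```

With this structure an off-the-shelf agent only needs an enlarged action space; the trade-off between information and cost is carried entirely by the reward.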

arXiv:2106.06124 [pdf, other]

doi 10.1088/2632-2153/ac9885

Twin Neural Network Regression is a Semi-Supervised Regression Algorithm

Authors: Sebastian J. Wetzel, Roger G. Melko, Isaac Tamblyn

Abstract: Twin neural network regression (TNNR) is a semi-supervised regression algorithm; it can be trained on unlabelled data points as long as other, labelled anchor data points are present. TNNR is trained to predict differences between the target values of two different data points rather than the targets themselves. By ensembling predicted differences between the targets of an unseen data point and all training data points, it is possible to obtain a very accurate prediction for the original regression problem. Since any loop of predicted differences should sum to zero, loops can be supplied to the training data, even if the data points themselves within loops are unlabelled. Semi-supervised training significantly improves TNNR performance, which is already state of the art.

Submitted 10 June, 2021; originally announced June 2021.

arXiv:2103.03716 [pdf, other]

doi 10.1039/D1SC01545A

Golem: An algorithm for robust experiment and process optimization

Authors: Matteo Aldeghi, Florian Häse, Riley J. Hickman, Isaac Tamblyn, Alán Aspuru-Guzik

Abstract: Numerous challenges in science and engineering can be framed as optimization tasks, including the maximization of reaction yields, the optimization of molecular and materials properties, and the fine-tuning of automated hardware protocols. Design of experiment and optimization algorithms are often adopted to solve these tasks efficiently. Increasingly, these experiment planning strategies are coupled with automated hardware to enable autonomous experimental platforms. The vast majority of the strategies used, however, do not consider robustness against the variability of experiment and process conditions. In fact, it is generally assumed that these parameters are exact and reproducible. Yet some experiments may have considerable noise associated with some of their conditions, and process parameters optimized under precise control may be applied in the future under variable operating conditions. In either scenario, the optimal solutions found might not be robust against input variability, affecting the reproducibility of results and returning suboptimal performance in practice. Here, we introduce Golem, an algorithm that is agnostic to the choice of experiment planning strategy and that enables robust experiment and process optimization. Golem identifies optimal solutions that are robust to input uncertainty, thus ensuring the reproducible performance of optimized experimental protocols and processes. It can be used to analyze the robustness of past experiments, or to guide experiment planning algorithms toward robust solutions on the fly. We assess the performance and domain of applicability of Golem through extensive benchmark studies and demonstrate its practical relevance by optimizing an analytical chemistry protocol under the presence of significant noise in its experimental conditions.

Submitted 12 October, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

Comments: 37 pages, 25 figures; additional experiments, expanded discussions and references

Journal ref: Chemical Science, 2021, 12, 14792 - 14807

arXiv:2102.11743 [pdf, other]

Weakly-supervised multi-class object localization using only object counts as labels

Authors: Kyle Mills, Isaac Tamblyn

Abstract: We demonstrate the use of an extensive deep neural network (EDNN) to localize instances of objects in images. The EDNN is naturally able to accurately perform multi-class counting using only ground truth count values as labels. Without providing any conceptual information, object annotations, or pixel segmentation information, the neural network is able to formulate its own conceptual representation of the items in the image. Using images labelled with only the counts of the objects present, the structure of the extensive deep neural network can be exploited to perform localization of the objects within the visual field. We demonstrate that a trained EDNN can be used to count objects in images much larger than those on which it was trained. In order to demonstrate our technique, we introduce seven new data sets: five progressively harder MNIST digit-counting data sets, and two data sets of 3D-rendered rubber ducks in various situations. On most of these data sets, the EDNN achieves greater than 99% test set accuracy in counting objects.

Submitted 23 February, 2021; originally announced February 2021.

arXiv:2101.04383 [pdf]

Interpretable discovery of new semiconductors with machine learning

Authors: Hitarth Choubisa, Petar Todorović, Joao M. Pina, Darshan H. Parmar, Ziliang Li, Oleksandr Voznyy, Isaac Tamblyn, Edward Sargent

Abstract: Machine learning models of materials$^{1-5}$ accelerate discovery compared to ab initio methods: deep learning models now reproduce density functional theory (DFT)-calculated results at one hundred-thousandth of the cost of DFT$^{6}$. To provide guidance in experimental materials synthesis, these need to be coupled with an accurate yet effective search algorithm and training data consistent with experimental observations. Here we report an evolutionary-algorithm-powered search which uses machine-learned surrogate models trained on high-throughput hybrid functional DFT data benchmarked against experimental bandgaps: Deep Adaptive Regressive Weighted Intelligent Network (DARWIN). The strategy enables efficient search over the materials space of ~10$^8$ ternaries and 10$^{11}$ quaternaries$^{7}$ for candidates with target properties. It provides interpretable design rules, such as our finding that the difference in the electronegativity between the halide and B-site cation is a strong predictor of ternary structural stability. As an example, when we seek UV emission, DARWIN predicts K$_2$CuX$_3$ (X = Cl, Br) as a promising materials family, based on its electronegativity difference. We synthesized and found these materials to be stable, direct bandgap UV emitters. The approach also allows knowledge distillation for use by humans.

Submitted 12 January, 2021; originally announced January 2021.

Comments: 25 pages, 4 figures, 1 table

arXiv:2012.14873 [pdf, other]

doi 10.1002/ail2.78

Twin Neural Network Regression

Authors: Sebastian J. Wetzel, Kevin Ryczko, Roger G. Melko, Isaac Tamblyn

Abstract: We introduce twin neural network (TNN) regression. This method predicts differences between the target values of two different data points rather than the targets themselves. The solution of a traditional regression problem is then obtained by averaging over an ensemble of all predicted differences between the targets of an unseen data point and all training data points. Whereas ensembles are normally costly to produce, TNN regression intrinsically creates an ensemble of predictions of twice the size of the training set while only training a single neural network. Since ensembles have been shown to be more accurate than single models, this property naturally transfers to TNN regression. We show that TNNs are able to compete with, or yield more accurate predictions than, other state-of-the-art methods on different data sets. Furthermore, TNN regression is constrained by self-consistency conditions. We find that the violation of these conditions provides an estimate for the prediction uncertainty.

Submitted 29 December, 2020; originally announced December 2020.
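
The ensembling step in the abstract above can be sketched as follows. Given a model F(a, b) that approximates y(a) - y(b), each training point (x_j, y_j) yields two estimates for an unseen point x: y_j + F(x, x_j) and y_j - F(x_j, x), giving an ensemble of twice the training-set size. This is one reading of the abstract, not the authors' code; `toy_diff_model` is a hypothetical stand-in for a trained twin network that returns exact differences of y = 3x + 1.

```python
from statistics import mean


def tnn_predict(x_new, train_x, train_y, diff_model):
    """Average the 2 * N anchor-based estimates of the target at x_new."""
    estimates = []
    for x_j, y_j in zip(train_x, train_y):
        # Estimate from the predicted difference y(x_new) - y(x_j).
        estimates.append(y_j + diff_model(x_new, x_j))
        # Estimate from the reversed pair, y(x_j) - y(x_new).
        estimates.append(y_j - diff_model(x_j, x_new))
    return mean(estimates)


def toy_diff_model(x_a, x_b):
    """Hypothetical stand-in for a trained twin network (exact for y = 3x + 1)."""
    return 3.0 * (x_a - x_b)


train_x = [0.0, 1.0, 2.0, 4.0]
train_y = [3.0 * x + 1.0 for x in train_x]
prediction = tnn_predict(2.5, train_x, train_y, toy_diff_model)
```

With a real, imperfect difference model the individual estimates would disagree, and their spread is the kind of self-consistency violation the abstract relates to prediction uncertainty.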

arXiv:2012.11832 [pdf, other]

doi 10.1103/PhysRevLett.127.018003

Neuroevolutionary learning of particles and protocols for self-assembly

Authors: Stephen Whitelam, Isaac Tamblyn

Abstract: Within simulations of molecules deposited on a surface we show that neuroevolutionary learning can design particles and time-dependent protocols to promote self-assembly, without input from physical concepts such as thermal equilibrium or mechanical stability and without prior knowledge of candidate or competing structures. The learning algorithm is capable of both directed and exploratory design: it can assemble a material with a user-defined property, or search for novelty in the space of specified order parameters. In the latter mode it explores the space of what can be made rather than the space of structures that are low in energy but not necessarily kinetically accessible.

Submitted 22 December, 2020; originally announced December 2020.

Journal ref: Phys. Rev. Lett. 127, 018003 (2021)

arXiv:2012.10328 [pdf, ps, other]

doi 10.1139/cjp-2022-0115

Deep learning and high harmonic generation

Authors: M. Lytova, M. Spanner, I. Tamblyn

Abstract: Using machine learning, we explore the utility of various deep neural networks (NN) when applied to high harmonic generation (HHG) scenarios. First, we train the NNs to predict the time-dependent dipole and spectra of HHG emission from reduced-dimensionality models of di- and triatomic systems based on sets of randomly generated parameters (laser pulse intensity, internuclear distance, and molecular orientation). These networks, once trained, are useful tools to rapidly generate the HHG spectra of our systems. Similarly, we have trained the NNs to solve the inverse problem - to determine the molecular parameters based on HHG spectra or dipole acceleration data. These types of networks could then be used as spectroscopic tools to invert HHG spectra in order to recover the underlying physical parameters of a system. Next, we demonstrate that transfer learning can be applied to our networks to expand the range of applicability of the networks with only a small number of new test cases added to our training sets. Finally, we demonstrate NNs that can be used to classify molecules by type: di- or triatomic, symmetric or asymmetric, wherein we can even rely on fairly simple fully connected neural networks. With outlooks toward training with experimental data, these NN topologies offer a novel set of spectroscopic tools that could be incorporated into HHG experiments.

Submitted 4 January, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

Journal ref: Can. J. Phys. 101, 132 (2023)

arXiv:2011.08657 [pdf, other]

doi 10.1103/PhysRevLett.127.120602

Dynamical large deviations of two-dimensional kinetically constrained models using a neural-network state ansatz

Authors: Corneel Casert, Tom Vieijra, Stephen Whitelam, Isaac Tamblyn

Abstract: We use a neural network ansatz originally designed for the variational optimization of quantum systems to study dynamical large deviations in classical ones. We obtain the scaled cumulant-generating function for the dynamical activity of the Fredrickson-Andersen model, a prototypical kinetically constrained model, in one and two dimensions, and present the first size-scaling analysis of the dynamical activity in two dimensions. These results provide a new route to the study of dynamical large-deviation functions, and highlight the broad applicability of the neural-network state ansatz across domains in physics.

Submitted 17 November, 2020; originally announced November 2020.

Journal ref: Phys. Rev. Lett. 127, 120602 (2021)

arXiv:2010.14236 [pdf, other]

doi 10.1088/2632-2153/abda08

Scientific intuition inspired by machine learning generated hypotheses

Authors: Pascal Friederich, Mario Krenn, Isaac Tamblyn, Alan Aspuru-Guzik

Abstract: Machine learning with application to questions in the physical sciences has become a widely used tool, successfully applied to classification, regression and optimization tasks in many areas. Research focus mostly lies in improving the accuracy of the machine learning models in numerical predictions, while scientific understanding is still almost exclusively generated by human researchers analysing numerical results and drawing conclusions. In this work, we shift the focus to the insights and the knowledge obtained by the machine learning models themselves. In particular, we study how it can be extracted and used to inspire human scientists to increase their intuitions and understanding of natural systems. We apply gradient boosting in decision trees to extract human interpretable insights from big data sets from chemistry and physics. In chemistry, we not only rediscover widely known rules of thumb but also find new interesting motifs that tell us how to control solubility and energy levels of organic molecules. At the same time, in quantum physics, we gain new understanding on experiments for quantum entanglement. The ability to go beyond numerics and to enter the realm of scientific insight and hypothesis generation opens the door to use machine learning to accelerate the discovery of conceptual understanding in some of the most challenging domains of science.

Submitted 14 December, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

Journal ref: Machine Learning: Science and Technology 2, 025027 (2021)

arXiv:2008.06643 [pdf, other]

doi 10.1038/s41467-021-26568-2

Correspondence between neuroevolution and gradient descent

Authors: Stephen Whitelam, Viktor Selin, Sang-Won Park, Isaac Tamblyn

Abstract: We show analytically that training a neural network by conditioned stochastic mutation or neuroevolution of its weights is equivalent, in the limit of small mutations, to gradient descent on the loss function in the presence of Gaussian white noise. Averaged over independent realizations of the learning process, neuroevolution is equivalent to gradient descent on the loss function. We use numerical simulation to show that this correspondence can be observed for finite mutations, for shallow and deep neural networks. Our results provide a connection between two families of neural-network training methods that are usually considered to be fundamentally different.

Submitted 10 September, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

arXiv:2005.12697 [pdf, other]

Active Measure Reinforcement Learning for Observation Cost Minimization

Authors: Colin Bellinger, Rory Coles, Mark Crowley, Isaac Tamblyn

Abstract: Standard reinforcement learning (RL) algorithms assume that the observation of the next state comes instantaneously and at no cost. In a wide variety of sequential decision making tasks ranging from medical treatment to scientific discovery, however, multiple classes of state observations are possible, each of which has an associated cost. We propose the active measure RL framework (Amrl) as an initial solution to this problem where the agent learns to maximize the costed return, which we define as the discounted sum of rewards minus the sum of observation costs. Our empirical evaluation demonstrates that Amrl-Q agents are able to learn a policy and state estimator in parallel during online training. During training the agent naturally shifts from its reliance on costly measurements of the environment to its state estimator in order to increase its reward. It does this without harm to the learned policy. Our results show that the Amrl-Q agent learns at a rate similar to standard Q-learning and Dyna-Q. Critically, by utilizing an active strategy, Amrl-Q achieves a higher costed return.
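
Illustration: the costed return defined in the abstract above (discounted sum of rewards minus the sum of observation costs) can be sketched as follows. This is a minimal reading of the abstract's wording only, not the authors' implementation; the function name `costed_return`, the argument names, and the default `gamma` are hypothetical, and the abstract does not say whether the paper also discounts the costs, so they are left undiscounted here.

```python
from typing import Sequence


def costed_return(
    rewards: Sequence[float],
    observation_costs: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor; not given in the abstract
) -> float:
    """Discounted sum of rewards minus the (undiscounted) sum of observation costs."""
    if len(rewards) != len(observation_costs):
        raise ValueError("need one observation cost per time step")
    # Discount each reward by gamma**t, where t is the time-step index.
    discounted_rewards = sum(gamma**t * r for t, r in enumerate(rewards))
    # Costs are summed as-is, following the abstract's literal wording.
    return discounted_rewards - sum(observation_costs)


if __name__ == "__main__":
    # Three steps; the agent pays to measure at steps 0 and 2 and skips step 1.
    print(costed_return([1.0, 1.0, 1.0], [0.5, 0.0, 0.5], gamma=0.5))
```

A step with cost 0.0 stands for relying on a state estimate instead of a measurement, which is how an agent can raise its costed return without changing the rewards it collects.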

Submitted 26 May, 2020; originally announced May 2020.

Comments: Under review at NeurIPS 2020

MSC Class: 68T01

arXiv:2004.07333 [pdf, other]

Reinforcement Learning in a Physics-Inspired Semi-Markov Environment

Authors: Colin Bellinger, Rory Coles, Mark Crowley, Isaac Tamblyn

Abstract: Reinforcement learning (RL) has been demonstrated to have great potential in many applications of scientific discovery and design. Recent work includes, for example, the design of new structures and compositions of molecules for therapeutic drugs. Much of the existing work related to the application of RL to scientific domains, however, assumes that the available state representation obeys the Markov property. For reasons associated with time, cost, sensor accuracy, and gaps in scientific knowledge, many scientific design and discovery problems do not satisfy the Markov property. Thus, something other than a Markov decision process (MDP) should be used to plan / find the optimal policy. In this paper, we present a physics-inspired semi-Markov RL environment, namely the phase change environment. In addition, we evaluate the performance of value-based RL algorithms for both MDPs and partially observable MDPs (POMDPs) on the proposed environment. Our results demonstrate deep recurrent Q-networks (DRQN) significantly outperform deep Q-networks (DQN), and that DRQNs benefit from training with hindsight experience replay. Implications for the use of semi-Markovian RL and POMDPs for scientific laboratories are also discussed.

Submitted 15 April, 2020; originally announced April 2020.

Comments: To appear in the Canadian Conference on Artificial Intelligence, 2020

ACM Class: I.2; J.2

arXiv:2003.02647 [pdf, other]

doi 10.1088/2632-2153/abc81b

Watch and learn -- a generalized approach for transferrable learning in deep neural networks via physical principles

Authors: Kyle Sprague, Juan Carrasquilla, Steve Whitelam, Isaac Tamblyn

Abstract: Transfer learning refers to the use of knowledge gained while solving a machine learning task and applying it to the solution of a closely related problem. Such an approach has enabled scientific breakthroughs in computer vision and natural language processing where the weights learned in state-of-the-art models can be used to initialize models for other tasks which dramatically improve their performance and save computational time. Here we demonstrate an unsupervised learning approach augmented with basic physical principles that achieves fully transferrable learning for problems in statistical physics across different physical regimes. By coupling a sequence model based on a recurrent neural network to an extensive deep neural network, we are able to learn the equilibrium probability distributions and inter-particle interaction models of classical statistical mechanical systems. Our approach, distribution-consistent learning, DCL, is a general strategy that works for a variety of canonical statistical mechanical models (Ising and Potts) as well as disordered (spin-glass) interaction potentials. Using data collected from a single set of observation conditions, DCL successfully extrapolates across all temperatures, thermodynamic phases, and can be applied to different length-scales. This constitutes a fully transferrable physics-based learning in a generalizable approach.

Submitted 3 March, 2020; originally announced March 2020.

arXiv:1912.08333 [pdf, other]

doi 10.1103/PhysRevE.101.052604

Learning to grow: control of material self-assembly using evolutionary reinforcement learning

Authors: Stephen Whitelam, Isaac Tamblyn

Abstract: We show that neural networks trained by evolutionary reinforcement learning can enact efficient molecular self-assembly protocols. Presented with molecular simulation trajectories, networks learn to change temperature and chemical potential in order to promote the assembly of desired structures or choose between competing polymorphs. In the first case, networks reproduce in a qualitative sense the results of previously-known protocols, but faster and with higher fidelity; in the second case they identify strategies previously unknown, from which we can extract physical insight. Networks that take as input the elapsed time of the simulation or microscopic information from the system are both effective, the latter more so. The evolutionary scheme we have used is simple to implement and can be applied to a broad range of examples of experimental self-assembly, whether or not one can monitor the experiment as it proceeds. Our results have been achieved with no human input beyond the specification of which order parameter to promote, pointing the way to the design of synthesis protocols by artificial intelligence.

Submitted 28 May, 2020; v1 submitted 17 December, 2019; originally announced December 2019.

Journal ref: Phys. Rev. E 101, 052604 (2020)

arXiv:1909.00835 [pdf, other]

doi 10.1063/5.0015301

Evolutionary reinforcement learning of dynamical large deviations

Authors: Stephen Whitelam, Daniel Jacobson, Isaac Tamblyn

Abstract: We show how to calculate the likelihood of dynamical large deviations using evolutionary reinforcement learning. An agent, a stochastic model, propagates a continuous-time Monte Carlo trajectory and receives a reward conditioned upon the values of certain path-extensive quantities. Evolution produces progressively fitter agents, eventually allowing the calculation of a piece of a large-deviation rate function for a particular model and path-extensive quantity. For models with small state spaces the evolutionary process acts directly on rates, and for models with large state spaces the process acts on the weights of a neural network that parameterizes the model's rates. This approach shows how path-extensive physics problems can be considered within a framework widely used in machine learning.

Submitted 21 February, 2020; v1 submitted 2 September, 2019; originally announced September 2019.

arXiv:1903.08543 [pdf, other]

doi 10.1103/PhysRevE.104.064128

Optimizing thermodynamic trajectories using evolutionary and gradient-based reinforcement learning

Authors: Chris Beeler, Uladzimir Yahorau, Rory Coles, Kyle Mills, Stephen Whitelam, Isaac Tamblyn

Abstract: Using a model heat engine, we show that neural network-based reinforcement learning can identify thermodynamic trajectories of maximal efficiency. We consider both gradient and gradient-free reinforcement learning. We use an evolutionary learning algorithm to evolve a population of neural networks, subject to a directive to maximize the efficiency of a trajectory composed of a set of elementary thermodynamic processes; the resulting networks learn to carry out the maximally-efficient Carnot, Stirling, or Otto cycles. When given an additional irreversible process, this evolutionary scheme learns a previously unknown thermodynamic cycle. Gradient-based reinforcement learning is able to learn the Stirling cycle, whereas an evolutionary approach achieves the optimal Carnot cycle. Our results show how the reinforcement learning strategies developed for game playing can be applied to solve physical problems conditioned upon path-extensive order parameters.

Submitted 22 November, 2021; v1 submitted 20 March, 2019; originally announced March 2019.

Comments: 11 pages, 5 figures

Journal ref: Phys. Rev. E 104, 064128 (2021)

arXiv:1702.01361 [pdf, other]

doi 10.1103/PhysRevA.96.042113

Deep learning and the Schrödinger equation

Authors: Kyle Mills, Michael Spanner, Isaac Tamblyn

Abstract: We have trained a deep (convolutional) neural network to predict the ground-state energy of an electron in four classes of confining two-dimensional electrostatic potentials. On randomly generated potentials, for which there is no analytic form for either the potential or the ground-state energy, the neural network model was able to predict the ground-state energy to within chemical accuracy, with a median absolute error of 1.49 mHa. We also investigate the performance of the model in predicting other quantities such as the kinetic energy and the first excited-state energy of random potentials.

Submitted 3 November, 2017; v1 submitted 4 February, 2017; originally announced February 2017.

Journal ref: Phys. Rev. A 96, 042113 (2017)

arXiv:1610.07458 [pdf, other]

doi 10.1007/s13278-017-0424-7

Hashkat: Large-scale simulations of online social networks

Authors: Kevin Ryczko, Adam Domurad, Nicholas Buhagiar, Isaac Tamblyn

Abstract: Hashkat (http://hashkat.org) is a free, open-source, agent-based simulation software package designed to simulate large-scale online social networks (e.g. Twitter, Facebook, LinkedIn, etc.). It allows for dynamic agent generation, edge creation, and information propagation. The purpose of hashkat is to study the growth of online social networks and how information flows within them. Like real-life online social networks, hashkat incorporates user relationships, information diffusion, and trending topics. Hashkat was implemented in C++, and was designed with extensibility in mind. The software includes Shell and Python scripts for easy installation and usability. In this report, we describe all of the algorithms and features integrated into hashkat before moving on to example use cases. In general, hashkat can be used to understand the underlying topology of social networks, validate sampling methods of such networks, develop business strategy for advertising on online social networks, and test new features of an online social network before going into production.

Submitted 24 October, 2016; originally announced October 2016.

Showing 1–28 of 28 results for author: Tamblyn, I