ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry

Authors: Chris Beeler, Sriram Ganapathi Subramanian, Kyle Sprague, Nouha Chatti, Colin Bellinger, Mitchell Shahen, Nicholas Paquin, Mark Baula, Amanuel Dawit, Zihan Yang, Xinkai Li, Mark Crowley, Isaac Tamblyn

Abstract: This paper provides a simulated laboratory for making use of Reinforcement Learning (RL) for chemical discovery. Since RL is fairly data intensive, training agents 'on-the-fly' by taking actions in the real world is infeasible and possibly dangerous. Moreover, chemical processing and discovery involve challenges which are not commonly found in RL benchmarks and therefore offer a rich space to work in. We introduce a set of highly customizable and open-source RL environments, ChemGymRL, based on the standard OpenAI Gym template. ChemGymRL supports a series of interconnected virtual chemical benches where RL agents can operate and train. The paper introduces and details each of these benches using well-known chemical reactions as illustrative examples, and trains a set of standard RL algorithms in each of these benches. Finally, discussion and comparison of the performances of several standard RL methods are provided in addition to a list of directions for future work as a vision for the further development and usage of ChemGymRL.

Submitted 23 May, 2023; originally announced May 2023.

Comments: 19 pages, 13 figures, 2 tables

arXiv:2112.14657 [pdf, other]

Dynamic programming with incomplete information to overcome navigational uncertainty in a nautical environment

Authors: Chris Beeler, Xinkai Li, Colin Bellinger, Mark Crowley, Maia Fraser, Isaac Tamblyn

Abstract: Using a novel toy nautical navigation environment, we show that dynamic programming can be used when only incomplete information about a partially observed Markov decision process (POMDP) is known. By incorporating uncertainty into our model, we show that navigation policies can be constructed that maintain safety, outperforming the baseline performance of traditional dynamic programming for Markov decision processes (MDPs). Adding in controlled sensing methods, we show that these policies can also lower measurement costs at the same time.

Submitted 19 July, 2022; v1 submitted 29 December, 2021; originally announced December 2021.

Comments: 11 pages, 5 figures

arXiv:1903.08543 [pdf, other]

doi 10.1103/PhysRevE.104.064128

Optimizing thermodynamic trajectories using evolutionary and gradient-based reinforcement learning

Authors: Chris Beeler, Uladzimir Yahorau, Rory Coles, Kyle Mills, Stephen Whitelam, Isaac Tamblyn

Abstract: Using a model heat engine, we show that neural network-based reinforcement learning can identify thermodynamic trajectories of maximal efficiency. We consider both gradient and gradient-free reinforcement learning. We use an evolutionary learning algorithm to evolve a population of neural networks, subject to a directive to maximize the efficiency of a trajectory composed of a set of elementary thermodynamic processes; the resulting networks learn to carry out the maximally-efficient Carnot, Stirling, or Otto cycles. When given an additional irreversible process, this evolutionary scheme learns a previously unknown thermodynamic cycle. Gradient-based reinforcement learning is able to learn the Stirling cycle, whereas an evolutionary approach achieves the optimal Carnot cycle. Our results show how the reinforcement learning strategies developed for game playing can be applied to solve physical problems conditioned upon path-extensive order parameters.

Submitted 22 November, 2021; v1 submitted 20 March, 2019; originally announced March 2019.

Comments: 11 pages, 5 figures

Journal ref: Phys. Rev. E 104, 064128 (2021)

Showing 1–3 of 3 results for author: Beeler, C