-
ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry
Authors:
Chris Beeler,
Sriram Ganapathi Subramanian,
Kyle Sprague,
Nouha Chatti,
Colin Bellinger,
Mitchell Shahen,
Nicholas Paquin,
Mark Baula,
Amanuel Dawit,
Zihan Yang,
Xinkai Li,
Mark Crowley,
Isaac Tamblyn
Abstract:
This paper provides a simulated laboratory for making use of Reinforcement Learning (RL) for chemical discovery. Since RL is fairly data intensive, training agents "on-the-fly" by taking actions in the real world is infeasible and possibly dangerous. Moreover, chemical processing and discovery involve challenges which are not commonly found in RL benchmarks and therefore offer a rich space to work in. We introduce a set of highly customizable and open-source RL environments, ChemGymRL, based on the standard OpenAI Gym template. ChemGymRL supports a series of interconnected virtual chemical benches where RL agents can operate and train. The paper introduces and details each of these benches using well-known chemical reactions as illustrative examples, and trains a set of standard RL algorithms in each of these benches. Finally, a discussion and comparison of the performance of several standard RL methods is provided, along with a list of directions for future work as a vision for the further development and usage of ChemGymRL.
Submitted 23 May, 2023;
originally announced May 2023.
-
Dynamic programming with incomplete information to overcome navigational uncertainty in a nautical environment
Authors:
Chris Beeler,
Xinkai Li,
Colin Bellinger,
Mark Crowley,
Maia Fraser,
Isaac Tamblyn
Abstract:
Using a novel toy nautical navigation environment, we show that dynamic programming can be used when only incomplete information about a partially observed Markov decision process (POMDP) is known. By incorporating uncertainty into our model, we show that navigation policies can be constructed that maintain safety, outperforming the baseline performance of traditional dynamic programming for Markov decision processes (MDPs). Adding in controlled sensing methods, we show that these policies can also lower measurement costs at the same time.
Submitted 19 July, 2022; v1 submitted 29 December, 2021;
originally announced December 2021.
-
Optimizing thermodynamic trajectories using evolutionary and gradient-based reinforcement learning
Authors:
Chris Beeler,
Uladzimir Yahorau,
Rory Coles,
Kyle Mills,
Stephen Whitelam,
Isaac Tamblyn
Abstract:
Using a model heat engine, we show that neural network-based reinforcement learning can identify thermodynamic trajectories of maximal efficiency. We consider both gradient and gradient-free reinforcement learning. We use an evolutionary learning algorithm to evolve a population of neural networks, subject to a directive to maximize the efficiency of a trajectory composed of a set of elementary thermodynamic processes; the resulting networks learn to carry out the maximally-efficient Carnot, Stirling, or Otto cycles. When given an additional irreversible process, this evolutionary scheme learns a previously unknown thermodynamic cycle. Gradient-based reinforcement learning is able to learn the Stirling cycle, whereas an evolutionary approach achieves the optimal Carnot cycle. Our results show how the reinforcement learning strategies developed for game playing can be applied to solve physical problems conditioned upon path-extensive order parameters.
Submitted 22 November, 2021; v1 submitted 20 March, 2019;
originally announced March 2019.