-
Reinforcement Learning via Auxiliary Task Distillation
Authors:
Abhinav Narayan Harish,
Larry Heck,
Josiah P. Hanna,
Zsolt Kira,
Andrew Szot
Abstract:
We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill), a new method that enables reinforcement learning (RL) to perform long-horizon robot control problems by distilling behaviors from auxiliary RL tasks. AuxDistill achieves this by concurrently carrying out multi-task RL with auxiliary tasks, which are easier to learn and relevant to the main task. A weighted distillation loss transfers behaviors from these auxiliary tasks to solve the main task. We demonstrate that AuxDistill can learn a pixels-to-actions policy for a challenging multi-stage embodied object rearrangement task from the environment reward without demonstrations, a learning curriculum, or pre-trained skills. AuxDistill achieves $2.3 \times$ higher success than the previous state-of-the-art baseline in the Habitat Object Rearrangement benchmark and outperforms methods that use pre-trained skills and expert demonstrations.
Submitted 24 June, 2024;
originally announced June 2024.
-
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
Authors:
Subhojyoti Mukherjee,
Josiah P. Hanna,
Qiaomin Xie,
Robert Nowak
Abstract:
In this paper, we study the multi-task structured bandit problem, where the goal is to learn a near-optimal algorithm that minimizes cumulative regret. The tasks share a common structure and the algorithm exploits the shared structure to minimize the cumulative regret for an unseen but related test task. We use a transformer as a decision-making algorithm to learn this shared structure so as to generalize to the test task. Prior work on pretrained decision transformers, such as DPT, requires access to the optimal action during training, which may be hard to obtain in several scenarios. Diverging from these works, our learning algorithm does not need knowledge of the optimal action for each task during training but predicts a reward vector for each of the actions using only the observed offline data from the diverse training tasks. Finally, during inference time, it selects actions using the reward predictions, employing various exploration strategies in-context for an unseen test task. Our model outperforms other SOTA methods like DPT, and Algorithmic Distillation over a series of experiments on several structured bandit problems (linear, bilinear, latent, non-linear). Interestingly, we show that our algorithm, without the knowledge of the underlying problem structure, can learn a near-optimal policy in-context by leveraging the shared structure across diverse tasks. We further extend the field of pre-trained decision transformers by showing that they can leverage unseen tasks with new actions and still learn the underlying latent structure to derive a near-optimal policy. We validate this over several experiments to show that our proposed solution is very general and has wide applications to potentially emergent online and offline strategies at test time. Finally, we theoretically analyze the performance of our algorithm and obtain generalization bounds in the in-context multi-task learning setting.
Submitted 7 June, 2024;
originally announced June 2024.
-
SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP
Authors:
Subhojyoti Mukherjee,
Josiah P. Hanna,
Robert Nowak
Abstract:
In this paper, we study safe data collection for the purpose of policy evaluation in tabular Markov decision processes (MDPs). In policy evaluation, we are given a \textit{target} policy and asked to estimate the expected cumulative reward it will obtain. Policy evaluation requires data and we are interested in the question of what \textit{behavior} policy should collect the data for the most accurate evaluation of the target policy. While prior work has considered behavior policy selection, in this paper, we additionally consider a safety constraint on the behavior policy. Namely, we assume there exists a known default policy that incurs a particular expected cost when run and we enforce that the cumulative cost of all behavior policies run is better than a constant factor of the cost that would be incurred had we always run the default policy. We first show that there exists a class of intractable MDPs where no safe oracle algorithm with knowledge about problem parameters can efficiently collect data and satisfy the safety constraints. We then define the tractability condition for an MDP such that a safe oracle algorithm can efficiently collect data and using that we prove the first lower bound for this setting. We then introduce an algorithm, SaVeR, for this problem that approximates the safe oracle algorithm, and we bound the finite-sample mean squared error of the algorithm while ensuring it satisfies the safety constraint. Finally, we show in simulations that SaVeR produces low MSE policy evaluation while satisfying the safety constraint.
Submitted 4 June, 2024;
originally announced June 2024.
-
Adaptive Exploration for Data-Efficient General Value Function Evaluations
Authors:
Arushi Jain,
Josiah P. Hanna,
Doina Precup
Abstract:
General Value Functions (GVFs) (Sutton et al, 2011) are an established way to represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique pseudo-reward. Multiple GVFs can be estimated in parallel using off-policy learning from a single stream of data, often sourced from a fixed behavior policy or pre-collected dataset. This leaves an open question: how can behavior policy be chosen for data-efficient GVF learning? To address this gap, we propose GVFExplorer, which aims at learning a behavior policy that efficiently gathers data for evaluating multiple GVFs in parallel. This behavior policy selects actions in proportion to the total variance in the return across all GVFs, reducing the number of environmental interactions. To enable accurate variance estimation, we use a recently proposed temporal-difference-style variance estimator. We prove that each behavior policy update reduces the mean squared error in the summed predictions over all GVFs. We empirically demonstrate our method's performance in both tabular representations and nonlinear function approximation.
Submitted 13 May, 2024;
originally announced May 2024.
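The abstract above states that the behavior policy selects actions in proportion to the total return variance across all GVFs. A minimal sketch of that one idea follows, assuming per-action variance estimates for a single state are already available. The function name `variance_proportional_policy`, the array layout, and the uniform fallback are illustrative assumptions, not the authors' GVFExplorer implementation.

```python
import numpy as np

def variance_proportional_policy(variances):
    """Turn per-GVF, per-action variance estimates into action probabilities.

    variances: array of shape (num_gvfs, num_actions) holding hypothetical
    return-variance estimates for one state (an assumption of this sketch).
    """
    # Sum the variance over GVFs to get one total per action.
    total = np.asarray(variances, dtype=float).sum(axis=0)
    # Fall back to a uniform policy if every estimate is zero (assumed choice).
    if total.sum() <= 0.0:
        return np.full(total.shape, 1.0 / total.size)
    # Normalize so the probabilities are proportional to total variance.
    return total / total.sum()

# Two GVFs, three actions; the numbers are made up for illustration.
variances = np.array([[1.0, 3.0, 0.0],
                      [1.0, 1.0, 2.0]])
probs = variance_proportional_policy(variances)
print(probs)
```

Actions whose returns are more uncertain across the GVFs receive more probability mass, which is the intuition the abstract describes; how the variances themselves are estimated is left out of this sketch.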
-
Assorted remarks on bending measures and energies for plates and shells, and their invariance properties
Authors:
E. Vitral,
J. A. Hanna
Abstract:
In this note, we address several issues, including some raised in recent works and commentary, related to bending measures and energies for plates and shells, and certain of their invariance properties. We discuss the distinction between definitions and results in our and others' approaches, correct an error and citation oversights in our work, and provide additional brief observations regarding the relative size of energetic terms and the symmetrization of bending measures. Particular points of emphasis are a reiteration of some of the early history of dilation-invariant measures, the similarities between all such measures, and the non-dilation-invariance of our recently introduced bending measure for shells and curved rods. In the course of this discussion, we provide a simpler presentation of the elementary, but much overlooked, fact that the additional tangential stretch of material near the mid-surface of a thin body is the product of the mid-surface stretch and the change in curvature.
Submitted 10 May, 2024;
originally announced May 2024.
-
Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation
Authors:
Zhitong Xiong,
Yi Wang,
Fahong Zhang,
Adam J. Stewart,
Joëlle Hanna,
Damian Borth,
Ioannis Papoutsis,
Bertrand Le Saux,
Gustau Camps-Valls,
Xiao Xiang Zhu
Abstract:
The development of foundation models has revolutionized our ability to interpret the Earth's surface using satellite observational data. Traditional models have been siloed, tailored to specific sensors or data types like optical, radar, and hyperspectral, each with its own unique characteristics. This specialization hinders the potential for a holistic analysis that could benefit from the combined strengths of these diverse data sources. Our novel approach introduces the Dynamic One-For-All (DOFA) model, leveraging the concept of neural plasticity in brain science to integrate various data modalities into a single framework adaptively. This dynamic hypernetwork, adjusting to different wavelengths, enables a single versatile Transformer jointly trained on data from five sensors to excel across 12 distinct Earth observation tasks, including sensors never seen during pretraining. DOFA's innovative design offers a promising leap towards more accurate, efficient, and unified Earth observation analysis, showcasing remarkable adaptability and performance in harnessing the potential of multimodal Earth observation data.
Submitted 7 June, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Future Prediction Can be a Strong Evidence of Good History Representation in Partially Observable Environments
Authors:
Jeongyeol Kwon,
Liu Yang,
Robert Nowak,
Josiah Hanna
Abstract:
Learning a good history representation is one of the core challenges of reinforcement learning (RL) in partially observable environments. Recent works have shown the advantages of various auxiliary tasks for facilitating representation learning. However, the effectiveness of such auxiliary tasks has not been fully convincing, especially in partially observable environments that require long-term memorization and inference. In this empirical study, we investigate the effectiveness of future prediction for learning the representations of histories, possibly of extensive length, in partially observable environments. We first introduce an approach that decouples the task of learning history representations from policy optimization via future prediction. Then, our main contributions are two-fold: (a) we demonstrate that the performance of reinforcement learning is strongly correlated with the prediction accuracy of future observations in partially observable environments, and (b) our approach can significantly improve the overall end-to-end approach by preventing high-variance, noisy signals from reinforcement learning objectives from influencing representation learning. We illustrate our claims on three types of benchmarks that necessitate the ability to process long histories for high returns.
Submitted 10 February, 2024;
originally announced February 2024.
-
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
Authors:
Nicholas E. Corrado,
Josiah P. Hanna
Abstract:
On-policy reinforcement learning (RL) algorithms perform policy updates using i.i.d. trajectories collected by the current policy. However, after observing only a finite number of trajectories, on-policy sampling may produce data that fails to match the expected on-policy data distribution. This sampling error leads to noisy updates and data inefficient on-policy learning. Recent work in the policy evaluation setting has shown that non-i.i.d., off-policy sampling can produce data with lower sampling error than on-policy sampling can produce. Motivated by this observation, we introduce an adaptive, off-policy sampling method to improve the data efficiency of on-policy policy gradient algorithms. Our method, Proximal Robust On-Policy Sampling (PROPS), reduces sampling error by collecting data with a behavior policy that increases the probability of sampling actions that are under-sampled with respect to the current policy. Rather than discarding data from old policies -- as is commonly done in on-policy algorithms -- PROPS uses data collection to adjust the distribution of previously collected data to be approximately on-policy. We empirically evaluate PROPS on both continuous-action MuJoCo benchmark tasks as well as discrete-action tasks and demonstrate that (1) PROPS decreases sampling error throughout training and (2) improves the data efficiency of on-policy policy gradient algorithms. Our work improves the RL community's understanding of a nuance in the on-policy vs off-policy dichotomy: on-policy learning requires on-policy data, not on-policy sampling.
Submitted 14 November, 2023;
originally announced November 2023.
-
Multi-task Representation Learning for Pure Exploration in Bilinear Bandits
Authors:
Subhojyoti Mukherjee,
Qiaomin Xie,
Josiah P. Hanna,
Robert Nowak
Abstract:
We study multi-task representation learning for the problem of pure exploration in bilinear bandits. In bilinear bandits, an action takes the form of a pair of arms from two different entity types and the reward is a bilinear function of the known feature vectors of the arms. In the \textit{multi-task bilinear bandit problem}, we aim to find optimal actions for multiple tasks that share a common low-dimensional linear representation. The objective is to leverage this characteristic to expedite the process of identifying the best pair of arms for all tasks. We propose the algorithm GOBLIN that uses an experimental design approach to optimize sample allocations for learning the global representation as well as minimize the number of samples needed to identify the optimal pair of arms in individual tasks. To the best of our knowledge, this is the first study to give sample complexity analysis for pure exploration in bilinear bandits with shared representation. Our results demonstrate that by learning the shared representation across tasks, we achieve significantly improved sample complexity compared to the traditional approach of solving tasks independently.
Submitted 1 November, 2023;
originally announced November 2023.
-
State-Action Similarity-Based Representations for Off-Policy Evaluation
Authors:
Brahma S. Pavse,
Josiah P. Hanna
Abstract:
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy given a fixed dataset that was collected by running one or more different policies. One of the more empirically successful algorithms for OPE has been the fitted q-evaluation (FQE) algorithm that uses temporal difference updates to learn an action-value function, which is then used to estimate the expected return of the evaluation policy. Typically, the original fixed dataset is fed directly into FQE to learn the action-value function of the evaluation policy. Instead, in this paper, we seek to enhance the data-efficiency of FQE by first transforming the fixed dataset using a learned encoder, and then feeding the transformed dataset into FQE. To learn such an encoder, we introduce an OPE-tailored state-action behavioral similarity metric, and use this metric and the fixed dataset to learn an encoder that models this metric. Theoretically, we show that this metric allows us to bound the error in the resulting OPE estimate. Empirically, we show that other state-action similarity metrics lead to representations that cannot represent the action-value function of the evaluation policy, and that our state-action representation method boosts the data-efficiency of FQE and lowers OPE error relative to other OPE-based representation learning methods on challenging OPE tasks. We also empirically show that the learned representations significantly mitigate divergence of FQE under varying distribution shifts. Our code is available here: https://github.com/Badger-RL/ROPE.
Submitted 27 October, 2023;
originally announced October 2023.
-
Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
Authors:
Nicholas E. Corrado,
Yuxiao Qu,
John U. Balis,
Adam Labiosa,
Josiah P. Hanna
Abstract:
In offline reinforcement learning (RL), an RL agent learns to solve a task using only a fixed dataset of previously collected data. While offline RL has been successful in learning real-world robot control policies, it typically requires large amounts of expert-quality data to learn effective policies that generalize to out-of-distribution states. Unfortunately, such data is often difficult and expensive to acquire in real-world tasks. Several recent works have leveraged data augmentation (DA) to inexpensively generate additional data, but most DA works apply augmentations in a random fashion and ultimately produce highly suboptimal augmented experience. In this work, we propose Guided Data Augmentation (GuDA), a human-guided DA framework that generates expert-quality augmented data. The key insight behind GuDA is that while it may be difficult to demonstrate the sequence of actions required to produce expert data, a user can often easily characterize when an augmented trajectory segment represents progress toward task completion. Thus, a user can restrict the space of possible augmentations to automatically reject suboptimal augmented data. To extract a policy from GuDA, we use off-the-shelf offline reinforcement learning and behavior cloning algorithms. We evaluate GuDA on a physical robot soccer task as well as simulated D4RL navigation tasks, a simulated autonomous driving task, and a simulated soccer task. Empirically, GuDA enables learning given a small initial dataset of potentially suboptimal experience and outperforms a random DA strategy as well as a model-based DA strategy.
Submitted 16 March, 2024; v1 submitted 27 October, 2023;
originally announced October 2023.
-
Understanding when Dynamics-Invariant Data Augmentations Benefit Model-Free Reinforcement Learning Updates
Authors:
Nicholas E. Corrado,
Josiah P. Hanna
Abstract:
Recently, data augmentation (DA) has emerged as a method for leveraging domain knowledge to inexpensively generate additional data in reinforcement learning (RL) tasks, often yielding substantial improvements in data efficiency. While prior work has demonstrated the utility of incorporating augmented data directly into model-free RL updates, it is not well-understood when a particular DA strategy will improve data efficiency. In this paper, we seek to identify general aspects of DA responsible for observed learning improvements. Our study focuses on sparse-reward tasks with dynamics-invariant data augmentation functions, serving as an initial step towards a more general understanding of DA and its integration into RL training. Experimentally, we isolate three relevant aspects of DA: state-action coverage, reward density, and the number of augmented transitions generated per update (the augmented replay ratio). From our experiments, we draw two conclusions: (1) increasing state-action coverage often has a much greater impact on data efficiency than increasing reward density, and (2) decreasing the augmented replay ratio substantially improves data efficiency. In fact, certain tasks in our empirical study are solvable only when the replay ratio is sufficiently low.
Submitted 16 March, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
-
Buckling mediated by mobile localized elastic excitations
Authors:
R. S. Hutton,
E. Vitral,
E. Hamm,
J. A. Hanna
Abstract:
Experiments reveal that structural transitions in thin sheets are mediated by the passage of transient and stable mobile localized elastic excitations. These ``crumples'' or ``d-cones'' nucleate, propagate, interact, annihilate, and escape. Much of the dynamics occurs on millisecond time scales. Nucleation sites correspond to regions where generators of the ideal unstretched surface converge. Additional stable intermediate states illustrate two forms of quasistatic inter-crumple interaction through ridges or valleys. These interactions create pairs from which extended patterns may be constructed in larger specimens. The onset of localized transient deformation with increasing sheet size is correlated with a characteristic stable crumple size, whose measured scaling with thickness is consistent with prior theory and experiment for localized elastic features in thin sheets. We offer a new theoretical justification of this scaling.
Submitted 26 January, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
-
Ben-ge: Extending BigEarthNet with Geographical and Environmental Data
Authors:
Michael Mommert,
Nicolas Kesseli,
Joëlle Hanna,
Linus Scheibenreif,
Damian Borth,
Begüm Demir
Abstract:
Deep learning methods have proven to be a powerful tool in the analysis of large amounts of complex Earth observation data. However, while Earth observation data are multi-modal in most cases, only single or few modalities are typically considered. In this work, we present the ben-ge dataset, which supplements the BigEarthNet-MM dataset by compiling freely and globally available geographical and environmental data. Based on this dataset, we showcase the value of combining different data modalities for the downstream tasks of patch-based land-use/land-cover classification and land-use/land-cover segmentation. ben-ge is freely available and expected to serve as a test bed for fully supervised and self-supervised Earth observation applications.
Submitted 4 July, 2023;
originally announced July 2023.
-
Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces
Authors:
Brahma S. Pavse,
Matthew Zurek,
Yudong Chen,
Qiaomin Xie,
Josiah P. Hanna
Abstract:
In many reinforcement learning (RL) applications, we want policies that reach desired states and then keep the controlled system within an acceptable region around the desired states over an indefinite period of time. This latter objective is called stability and is especially important when the state space is unbounded, such that the states can be arbitrarily far from each other and the agent can drift far away from the desired states. For example, in stochastic queueing networks, where queues of waiting jobs can grow without bound, the desired state is all-zero queue lengths. Here, a stable policy ensures queue lengths are finite while an optimal policy minimizes queue lengths. Since an optimal policy is also stable, one would expect that RL algorithms would implicitly give us stable policies. However, in this work, we find that deep RL algorithms that directly minimize the distance to the desired state during online training often result in unstable policies, i.e., policies that drift far away from the desired state. We attribute this instability to poor credit-assignment for destabilizing actions. We then introduce an approach based on two ideas: 1) a Lyapunov-based cost-shaping technique and 2) state transformations to the unbounded state space. We conduct an empirical study on various queueing networks and traffic signal control problems and find that our approach performs competitively against strong baselines with knowledge of the transition dynamics. Our code is available here: https://github.com/Badger-RL/STOP.
Submitted 26 May, 2024; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Conditional Mutual Information for Disentangled Representations in Reinforcement Learning
Authors:
Mhairi Dunion,
Trevor McInroe,
Kevin Sebastian Luck,
Josiah P. Hanna,
Stefano V. Albrecht
Abstract:
Reinforcement Learning (RL) environments can produce training data with spurious correlations between features due to the amount of training data or its limited feature coverage. This can lead to RL agents encoding these misleading correlations in their latent representation, preventing the agent from generalising if the correlation changes within the environment or when deployed in the real world. Disentangled representations can improve robustness, but existing disentanglement techniques that minimise mutual information between features require independent features, thus they cannot disentangle correlated features. We propose an auxiliary task for RL algorithms that learns a disentangled representation of high-dimensional observations with correlated features by minimising the conditional mutual information between features in the representation. We demonstrate experimentally, using continuous control tasks, that our approach improves generalisation under correlation shifts, as well as improving the training performance of RL algorithms in the presence of correlated features.
Submitted 12 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Comment on the elastica section in Thorne and Blandford "Modern Classical Physics", the shape of things, and the aspect ratio of reality
Authors:
J. A. Hanna
Abstract:
I point out and diagnose an error in a figure in a textbook on classical physics. The error helps to illustrate a pitfall encountered when dealing with the shapes of objects, and perhaps also reflects general cultural attitudes in physics. Another, less interesting, error is noted in passing.
Submitted 22 March, 2023;
originally announced March 2023.
-
SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits
Authors:
Subhojyoti Mukherjee,
Qiaomin Xie,
Josiah Hanna,
Robert Nowak
Abstract:
In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits. In policy evaluation, we are given a target policy and asked to estimate the expected reward it will obtain when executed in a multi-armed bandit environment. Our work is the first work that focuses on such optimal data collection strategy for policy evaluation involving heteroscedastic reward noise in the linear bandit setting. We first formulate an optimal design for weighted least squares estimates in the heteroscedastic linear bandit setting that reduces the MSE of the value of the target policy. We then use this formulation to derive the optimal allocation of samples per action during data collection. We then introduce a novel algorithm SPEED (Structured Policy Evaluation Experimental Design) that tracks the optimal design and derive its regret with respect to the optimal design. Finally, we empirically validate that SPEED leads to policy evaluation with mean squared error comparable to the oracle strategy and significantly lower than simply running the target policy.
Submitted 29 February, 2024; v1 submitted 28 January, 2023;
originally announced January 2023.
-
Sensitivity analysis using Physics-informed neural networks
Authors:
John M. Hanna,
José V. Aguado,
Sebastien Comas-Cardona,
Ramzi Askri,
Domenico Borzacchiello
Abstract:
The goal of this paper is to provide a simple approach to perform local sensitivity analysis using Physics-informed neural networks (PINN). The main idea lies in adding a new term in the loss function that regularizes the solution in a small neighborhood near the nominal value of the parameter of interest. The added term represents the derivative of the loss function with respect to the parameter of interest. The result of this modification is a solution to the problem along with the derivative of the solution with respect to the parameter of interest (the sensitivity). We call the new technique SA-PNN which stands for sensitivity analysis in PINN. The effectiveness of the technique is shown using four examples: the first one is a simple one-dimensional advection-diffusion problem to show the methodology, the second is a two-dimensional Poisson's problem with nine parameters of interest, and the third and fourth examples are one and two-dimensional transient two-phase flow in porous media problem.
Submitted 6 June, 2024; v1 submitted 6 January, 2023;
originally announced January 2023.
-
Safe Evaluation For Offline Learning: Are We Ready To Deploy?
Authors:
Hager Radi,
Josiah P. Hanna,
Peter Stone,
Matthew E. Taylor
Abstract:
The world currently offers an abundance of data in multiple domains, from which we can learn reinforcement learning (RL) policies without further interaction with the environment. RL agents learning offline from such data is possible but deploying them while learning might be dangerous in domains where safety is critical. Therefore, it is essential to find a way to estimate how a newly-learned agent will perform if deployed in the target environment before actually deploying it and without the risk of overestimating its true performance. To achieve this, we introduce a framework for safe evaluation of offline learning using approximate high-confidence off-policy evaluation (HCOPE) to estimate the performance of offline policies during learning. In our setting, we assume a source of data, which we split into a train-set, to learn an offline policy, and a test-set, to estimate a lower-bound on the offline policy using off-policy evaluation with bootstrapping. A lower-bound estimate tells us how well a newly-learned target policy would perform before it is deployed in the real environment, and therefore allows us to decide when to deploy our learned policy.
Submitted 16 December, 2022;
originally announced December 2022.
-
Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction
Authors:
Brahma S. Pavse,
Josiah P. Hanna
Abstract:
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, $π_e$, using a fixed dataset, $\mathcal{D}$, collected by one or more policies that may be different from $π_e$. Current OPE algorithms may produce poor OPE estimates under policy distribution shift i.e., when the probability of a particular state-action pair occurring under $π_e$ is very different from the probability of that same pair occurring in $\mathcal{D}$ (Voloshin et al. 2021, Fu et al. 2021). In this work, we propose to improve the accuracy of OPE estimators by projecting the high-dimensional state-space into a low-dimensional state-space using concepts from the state abstraction literature. Specifically, we consider marginalized importance sampling (MIS) OPE algorithms which compute state-action distribution correction ratios to produce their OPE estimate. In the original ground state-space, these ratios may have high variance which may lead to high variance OPE. However, we prove that in the lower-dimensional abstract state-space the ratios can have lower variance resulting in lower variance OPE. We then highlight the challenges that arise when estimating the abstract ratios from data, identify sufficient conditions to overcome these issues, and present a minimax optimization problem whose solution yields these abstract ratios. Finally, our empirical evaluation on difficult, high-dimensional state-space OPE tasks shows that the abstract ratios can make MIS OPE estimators achieve lower mean-squared error and be more robust to hyperparameter tuning than the ground ratios.
Submitted 14 December, 2022;
originally announced December 2022.
-
The Impact of Inter-grain Phases on the Ionic Conductivity of LAGP Solid Electrolyte Prepared by Spark Plasma Sintering
Authors:
Sorina Cretu,
David G. Bradley,
Omer Ulas Kudu,
Li Patrick Wen Feng,
Linh Lan Nguyen,
Tuan Tu Nguyen,
Arash Jamali,
Jean-Noel Chotard,
Vincent Seznec,
John V. Hanna,
Arnaud Demortière,
Martial Duchamp
Abstract:
Li1.5Al0.5Ge1.5(PO4)3 (LAGP) is a promising oxide solid electrolyte for all-solid-state batteries due to its excellent air stability, wide electrochemical stability window and cost-effective precursor materials. However, further improvement in their ionic conductivity performance is hindered by the presence of inter-grain phases leading to a major obstacle to the advanced design of oxide based solid-state electrolytes. This study establishes and quantifies the influence of inter-grain phases, their 3D morphology, and formed compositions on the overall ion conductivity properties of LAGP pellets fabricated under different Spark plasma sintering conditions. Based on complementary techniques, such as PEIS, XRD, 3D FIB-SEM tomography and solid-state MAS NMR coupled with DFT modelling, a deep insight into the inter-grain phase microstructures is obtained revealing that the inter-grain region is comprised of Li4P2O7 and a disordered Li9Al3(P2O7)3(PO4)2 phase. We demonstrate that optimal ionic conductivity for the LAGP system is achieved for the 680 °C SPS preparation when the disordered Li9Al3(P2O7)3(PO4)2 phase dominates the inter-grain region composition with reduced contributions from the highly ordered Li4P2O7 phases.
Submitted 11 November, 2022;
originally announced November 2022.
-
A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret
Authors:
Sheelabhadra Dey,
Sumedh Pendurkar,
Guni Sharon,
Josiah P. Hanna
Abstract:
In various control task domains, existing controllers provide a baseline level of performance that -- though possibly suboptimal -- should be maintained. Reinforcement learning (RL) algorithms that rely on extensive exploration of the state and action space can be used to optimize a control policy. However, fully exploratory RL algorithms may decrease performance below a baseline level during training. In this paper, we address the issue of online optimization of a control policy while minimizing regret w.r.t. a baseline policy performance. We present a joint imitation-reinforcement learning framework, denoted JIRL. The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) leveraging the baseline's online demonstrations to minimize the regret w.r.t. the baseline policy during training, and (b) eventually surpassing the baseline performance. JIRL addresses these objectives by initially learning to imitate the baseline policy and gradually shifting control from the baseline to an RL agent. Experimental results show that JIRL effectively accomplishes the aforementioned objectives in several continuous action-space domains. The results demonstrate that JIRL is comparable to a state-of-the-art algorithm in its final performance while incurring significantly lower baseline regret during training in all of the presented domains. Moreover, the results show a reduction factor of up to $21$ in baseline regret over a state-of-the-art baseline regret minimization approach.
Submitted 19 September, 2022;
originally announced September 2022.
-
Momentum and pseudomomentum in a shallow water equation
Authors:
J. A. Hanna
Abstract:
A basic shallow water system with variable topography is analyzed from the point of view of a Lagrangian derivation of momentum, energy, and pseudomomentum balances. A two-dimensional action and associated momentum equation are derived. The latter is further manipulated to derive additional equations for energy and pseudomomentum. This revealed structure emphasizes broken symmetries in space and a reference configuration, and preserved symmetry in time.
Submitted 11 August, 2022;
originally announced August 2022.
-
Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning
Authors:
Mhairi Dunion,
Trevor McInroe,
Kevin Sebastian Luck,
Josiah P. Hanna,
Stefano V. Albrecht
Abstract:
Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image. The changed pixels can lead to drastic changes in the agent's latent representation of the image, causing the learned policy to fail. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. We find empirically that RL algorithms utilising TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Since TED enforces a disentangled structure of the representation, our experiments also show that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).
Submitted 27 February, 2023; v1 submitted 12 July, 2022;
originally announced July 2022.
-
Anomalous curvature evolution and geometric regularization of energy focusing in the snapping dynamics of a flexible body
Authors:
A. R. Dehadrai,
J. A. Hanna
Abstract:
We examine the focusing of kinetic energy and the amplification of various quantities during the snapping motion of the free end of a flexible structure. This brief but violent event appears to be a regularized finite-time singularity, with remarkably large spikes in velocity, acceleration, and tension easily induced by generic initial and boundary conditions. A numerical scheme for the inextensible string equations is validated against available experimental data for a falling chain and further employed to explore the phenomenon. We determine that the discretization of the equations, equivalent to the physically discrete problem of a chain, does not provide the regularizing length scale, which in the absence of other physical effects must then arise from the geometry of the problem. An analytical solution for a geometrically singular limit, a falling perfectly-folded string, accounts surprisingly well for the scalings of several quantities in the numerics, but can only indirectly suggest a behavior for the curvature, one which seems to explain prior experimental data but does not correspond to the evolution of the curvature peak in our system, which instead displays a newly observed anomalously slow scaling. A simple model, incorporating only knowledge of the initial conditions along with the anomalous and singular-limit scalings, provides reasonable estimates for the amplifications of relevant quantities. This is a first step to predict and harness arbitrarily large energy focusing in structures, with a practical limit set only by length scales present in the discrete mechanical system or the initial conditions.
Submitted 7 April, 2023; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Avoiding localization instabilities in rotary pleating
Authors:
Tian Yu,
J. A. Hanna
Abstract:
Rotary pleating is a widely used process for making filters out of nonwoven fabric sheets. This involves indirect elastic-plastic bending of pre-weakened creases by continuously injecting material into an accordion-shaped pack. This step can fail through a localization instability that creates a kink in a pleat facet instead of in the desired crease location. In the present work, we consider the effects of geometric and material parameters on the rotary pleating process. We formulate the process as a multi-point variable-arc-length boundary value problem for planar inextensible rods, with hinge connections. Both the facets (rods) and creases (hinges) obey nonlinear moment-curvature or moment-angle constitutive laws. Some unexpected aspects of the sleeve boundary condition at the point of material injection, common to many continuous sheet processes, are noted. The process, modeled as quasistatic, features multiple equilibria which we explore by numerical continuation. The presence of, presumably stable, kinked equilibria is taken as a conservative sign of potential pleating failure. Failure may also occur due to localization at the injection point. We may thus obtain "pleatability surfaces" that separate the parameter space into regions where mechanical pleating will succeed or fail. Successful pleating depends primarily on the distance between the injection point and the pleated pack. Other factors, such as the crease stiffness and strength relative to that of the facets, also have an influence. Our approach can be adapted to study other pleating and forming processes, the deployment and collapse of folded structures, or multi-stability in compliant structures.
Submitted 20 March, 2023; v1 submitted 4 June, 2022;
originally announced June 2022.
-
Multi-agent Databases via Independent Learning
Authors:
Chi Zhang,
Olga Papaemmanouil,
Josiah P. Hanna,
Aditya Akella
Abstract:
Machine learning is rapidly being used in database research to improve the effectiveness of numerous tasks including but not limited to query optimization, workload scheduling, physical design, etc. Currently, the research focus has been on replacing a single database component responsible for one task by its learning-based counterpart. However, query performance is not simply determined by the performance of a single component, but by the cooperation of multiple ones. As such, learning-based database components need to collaborate during both training and execution in order to develop policies that meet end performance goals. Thus, the paper attempts to address the question "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?".
To answer this question, we introduce MADB (Multi-Agent DB), a proof-of-concept system that incorporates a learned query scheduler and a learned query optimizer. MADB leverages a cooperative multi-agent reinforcement learning approach that allows the two components to exchange the context of their decisions with each other and collaboratively work towards reducing the query latency. Preliminary results demonstrate that MADB can outperform the non-cooperative integration of learned components.
Submitted 5 August, 2022; v1 submitted 27 May, 2022;
originally announced May 2022.
-
Token sliding on graphs of girth five
Authors:
Valentin Bartier,
Nicolas Bousquet,
Jihad Hanna,
Amer E. Mouawad,
Sebastian Siebertz
Abstract:
In the Token Sliding problem we are given a graph $G$ and two independent sets $I_s$ and $I_t$ in $G$ of size $k \geq 1$. The goal is to decide whether there exists a sequence $\langle I_1, I_2, \ldots, I_\ell \rangle$ of independent sets such that for all $i \in \{1,\ldots, \ell\}$ the set $I_i$ is an independent set of size $k$, $I_1 = I_s$, $I_\ell = I_t$ and $I_i \triangle I_{i + 1} = \{u, v\} \in E(G)$. Intuitively, we view each independent set as a collection of tokens placed on the vertices of the graph. Then, the problem asks whether there exists a sequence of independent sets that transforms $I_s$ into $I_t$ where at each step we are allowed to slide one token from a vertex to a neighboring vertex. In this paper, we focus on the parameterized complexity of Token Sliding parameterized by $k$. As shown by Bartier et al., the problem is W[1]-hard on graphs of girth four or less, and the authors posed the question of whether there exists a constant $p \geq 5$ such that the problem becomes fixed-parameter tractable on graphs of girth at least $p$. We answer their question positively and prove that the problem is indeed fixed-parameter tractable on graphs of girth five or more, which establishes a full classification of the tractability of Token Sliding parameterized by the number of tokens based on the girth of the input graph.
Submitted 2 May, 2022;
originally announced May 2022.
-
ReVar: Strengthening Policy Evaluation via Reduced Variance Sampling
Authors:
Subhojyoti Mukherjee,
Josiah P. Hanna,
Robert Nowak
Abstract:
This paper studies the problem of data collection for policy evaluation in Markov decision processes (MDPs). In policy evaluation, we are given a target policy and asked to estimate the expected cumulative reward it will obtain in an environment formalized as an MDP. We develop theory for optimal data collection within the class of tree-structured MDPs by first deriving an oracle data collection strategy that uses knowledge of the variance of the reward distributions. We then introduce the Reduced Variance Sampling (ReVar) algorithm that approximates the oracle strategy when the reward variances are unknown a priori and bound its sub-optimality compared to the oracle strategy. Finally, we empirically validate that ReVar leads to policy evaluation with mean squared error comparable to the oracle strategy and significantly lower than simply running the target policy.
Submitted 17 June, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.
-
Energies for elastic plates and shells from quadratic-stretch elasticity
Authors:
E. Vitral,
J. A. Hanna
Abstract:
We derive stretching and bending energies for isotropic elastic plates and shells. Through the dimensional reduction of a bulk elastic energy quadratic in Biot strains, we obtain two-dimensional bending energies quadratic in bending measures featuring a bilinear coupling of stretches and geometric curvatures. For plates, the bending measure is invariant under spatial dilations and naturally extends primitive bending strains for straight rods. For shells or naturally-curved rods, the measure is not dilation invariant, and contrasts with previous \emph{ad hoc} postulated forms. The corresponding field equations and boundary conditions feature moments linear in the bending measures, and a decoupling of stretching and bending such that application of a pure moment results in isometric deformation of a unique neutral surface, primitive behaviors in agreement with classical linear response but not displayed by commonly used analytical models. We briefly comment on relations between our energies, those derived from a neo-Hookean bulk energy, and a commonly used discrete model for flat membranes. Although the derivation requires consideration of stretch and rotation fields, the resulting energy and field equations can be expressed entirely in terms of metric and curvature components of deformed and reference surfaces.
Submitted 18 June, 2022; v1 submitted 14 January, 2022;
originally announced January 2022.
-
Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
Authors:
Rujie Zhong,
Duohan Zhang,
Lukas Schäfer,
Stefano V. Albrecht,
Josiah P. Hanna
Abstract:
Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle distinction between on-policy data and on-policy sampling in the context of the RL sub-problem of policy evaluation. We observe that on-policy sampling may fail to match the expected distribution of on-policy data after observing only a finite number of trajectories and this failure hinders data-efficient policy evaluation. Towards improved data-efficiency, we show how non-i.i.d., off-policy sampling can produce data that more closely matches the expected on-policy data distribution and consequently increases the accuracy of the Monte Carlo estimator for policy evaluation. We introduce a method called Robust On-Policy Sampling and demonstrate theoretically and empirically that it produces data that converges faster to the expected on-policy distribution compared to on-policy sampling. Empirically, we show that this faster convergence leads to lower mean squared error policy value estimates.
Submitted 10 October, 2022; v1 submitted 29 November, 2021;
originally announced November 2021.
-
Dilation-invariant bending of elastic plates, and broken symmetry in shells
Authors:
E. Vitral,
J. A. Hanna
Abstract:
We propose bending energies for isotropic elastic plates and shells. For a plate, we define and employ a surface tensor that symmetrically couples stretch and curvature such that any elastic energy density constructed from its invariants is invariant under spatial dilations. This kinematic measure and its corresponding isotropic quadratic energy resolve outstanding issues in thin structure elasticity, including the natural extension of primitive bending strains for straight rods to plates, the assurance of a moment linear in the bending measure, and the avoidance of induced mid-plane strains in response to pure moments as found in some commonly used analytical plate models. Our analysis also reveals that some other commonly used numerical models have the right invariance properties, although they lack full generality at quadratic order in stretch. We further extend our result to naturally-curved rods and shells, for which the pure stretching of a curved rest configuration breaks dilation invariance; the new shell bending measure we provide contrasts with previous \emph{ad hoc} postulated forms. The concept that unifies these theories is not dilation invariance, but rather through-thickness uniformity of strain as a definition of pure stretching deformations. Our results provide a clean basis for simple models of low-dimensional elastic systems, and should enable more accurate analytical probing of the structure of singularities in sheets and membranes.
Submitted 18 June, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Residual-based adaptivity for two-phase flow simulation in porous media using Physics-informed Neural Networks
Authors:
John M. Hanna,
Jose V. Aguado,
Sebastien Comas-Cardona,
Ramzi Askri,
Domenico Borzacchiello
Abstract:
This paper aims to provide a machine learning framework to simulate two-phase flow in porous media. The proposed algorithm is based on Physics-informed neural networks (PINN). A novel residual-based adaptive PINN is developed and compared with the residual-based adaptive refinement (RAR) method and with PINN with fixed collocation points. The proposed algorithm is expected to have great potential to be applied to different fields where adaptivity is needed. In this paper, we focus on the two-phase flow in porous media problem. We provide two numerical examples to show the effectiveness of the new algorithm. It is found that adaptivity is essential to capture moving flow fronts. We show how the results obtained through this approach are more accurate than using RAR method or PINN with fixed collocation points, while having a comparable computational cost.
Submitted 10 February, 2022; v1 submitted 29 September, 2021;
originally announced September 2021.
-
Interpretable Goal Recognition in the Presence of Occluded Factors for Autonomous Vehicles
Authors:
Josiah P. Hanna,
Arrasy Rahman,
Elliot Fosong,
Francisco Eiras,
Mihai Dobre,
John Redford,
Subramanian Ramamoorthy,
Stefano V. Albrecht
Abstract:
Recognising the goals or intentions of observed vehicles is a key step towards predicting the long-term future behaviour of other agents in an autonomous driving scenario. When there are unseen obstacles or occluded vehicles in a scenario, goal recognition may be confounded by the effects of these unseen entities on the behaviour of observed vehicles. Existing prediction algorithms that assume rational behaviour with respect to inferred goals may fail to make accurate long-horizon predictions because they ignore the possibility that the behaviour is influenced by such unseen entities. We introduce the Goal and Occluded Factor Inference (GOFI) algorithm which bases inference on inverse-planning to jointly infer a probabilistic belief over goals and potential occluded factors. We then show how these beliefs can be integrated into Monte Carlo Tree Search (MCTS). We demonstrate that jointly inferring goals and occluded factors leads to more accurate beliefs with respect to the true world state and allows an agent to safely navigate several scenarios where other baselines take unsafe actions leading to collisions.
Submitted 5 August, 2021;
originally announced August 2021.
-
Selective energy and enstrophy modification of two-dimensional decaying turbulence
Authors:
Aditya G. Nair,
James Hanna,
Matteo Aureli
Abstract:
In two-dimensional decaying homogeneous isotropic turbulence, kinetic energy and enstrophy are respectively transferred to larger and smaller scales. In such spatiotemporally complex dynamics, it is challenging to identify the important flow structures that govern this behavior. We propose and numerically employ two flow modification strategies that leverage the inviscid global conservation of energy and enstrophy to design external forcing inputs which change these quantities selectively and simultaneously, and drive the system towards steady-state or other late-stage behavior. One strategy employs only local flow-field information, while the other is global. We observe various flow structures excited by these inputs and compare with recent literature. Energy modification is characterized by excitation of smaller wavenumber structures in the flow than enstrophy modification.
Submitted 6 December, 2023; v1 submitted 2 August, 2021;
originally announced August 2021.
-
Power Plant Classification from Remote Imaging with Deep Learning
Authors:
Michael Mommert,
Linus Scheibenreif,
Joëlle Hanna,
Damian Borth
Abstract:
Satellite remote imaging enables the detailed study of land use patterns on a global scale. We investigate the possibility to improve the information content of traditional land use classification by identifying the nature of industrial sites from medium-resolution remote sensing images. In this work, we focus on classifying different types of power plants from Sentinel-2 imaging data. Using a ResNet-50 deep learning model, we are able to achieve a mean accuracy of 90.0% in distinguishing 10 different power plant types and a background class. Furthermore, we are able to identify the cooling mechanisms utilized in thermal power plants with a mean accuracy of 87.5%. Our results enable us to qualitatively investigate the energy mix from Sentinel-2 imaging data, and prove the feasibility to classify industrial sites on a global scale from freely available satellite imagery.
Submitted 22 July, 2021;
originally announced July 2021.
-
Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration
Authors:
Lukas Schäfer,
Filippos Christianos,
Josiah P. Hanna,
Stefano V. Albrecht
Abstract:
Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation. Such decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. Our results show that DeRL is more robust to varying scale and rate of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically-motivated baselines in fewer interactions. Lastly, we discuss the challenge of distribution shift and show that divergence constraint regularisers can successfully minimise instability caused by divergence of exploration and exploitation policies.
Submitted 9 February, 2022; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Quadratic-stretch elasticity
Authors:
E. Vitral,
J. A. Hanna
Abstract:
A nonlinear small-strain elastic theory is constructed from a systematic expansion in Biot strains, truncated at quadratic order. The primary motivation is the desire for a clean separation between stretching and bending energies for shells, which appears to arise only from reduction of a bulk energy of this type. An approximation of isotropic invariants, bypassing the solution of a quartic equation or computation of tensor square roots, allows stretches, rotations, stresses, and balance laws to be written in terms of derivatives of position. Two-field formulations are also presented. Extensions to anisotropic theories are briefly discussed.
Submitted 8 July, 2021; v1 submitted 23 April, 2021;
originally announced April 2021.
-
Insight into the partitioning and clustering mechanism of rare-earth cations in alkali aluminoborosilicate glasses
Authors:
Hrishikesh Kamat,
Fu Wang,
Kristian Barnsley,
John V. Hanna,
Alexei M. Tyryshkin,
Ashutosh Goel
Abstract:
Rare-earth (RE) containing alkali aluminoborosilicate glasses find increasingly broad technological applications, with their further development only impeded by yet-poor understanding of coordination environment and structural role of RE ions in glasses. In this work we combine free induction decay (FID)-detected electron paramagnetic resonance (EPR), electron spin echo envelope modulation (ESEEM), and MAS NMR spectroscopies, to examine the coordination environment and the clustering tendencies of RE3+ in a series of peralkaline aluminoborosilicate glasses co-doped with Nd2O3 (0.001-0.1 mol%) and 5 mol% La2O3. Quantitative EPR spectral analysis reveals three different Nd3+ forms coexisting in the glasses: isolated Nd3+ centers, dipole-coupled Nd clusters (Nd-O-X-O-Nd, where X = Si/B/Al), and spin-exchange-coupled Nd clusters, (Nd-O-Nd) and (Nd-O-La-O-Nd). Extensive RE clustering is observed at high RE2O3 concentrations, with more than 90% REs converting to dipole- and exchange-coupled Nd clusters already at [RE2O3] = 0.01 mol%. ESEEM analysis of the EPR-detectable Nd centers indicates a Na/Si-rich environment (four Na+ per Nd3+) for the isolated Nd3+ centers and the Na/Si/B-rich environment (2-3 Na+ and 1-2 boron per each Nd3+) for the dipole-coupled Nd clusters, while the EPR-undetectable exchanged-coupled RE clusters are predicted to exist in a Na/B-rich environment. The RE clustering induces nano-scale glass phase separation, while the Na/B-rich environment of the RE clusters implies a depletion of the same elements from the remaining host glass. Based on our results, we develop a mechanistic model that explains the high tendency of RE3+ to form clusters in alkali aluminoborosilicate glasses.
Submitted 19 April, 2021;
originally announced April 2021.
-
Exterior dissipation, proportional decay, and integrals of motion
Authors:
M. Aureli,
J. A. Hanna
Abstract:
Given a dynamical system with $m$ independent conserved quantities, we construct a multi-parameter family of new systems in which these quantities evolve monotonically and proportionally, and are replaced by $m-1$ conserved linear combinations of themselves, with any of the original quantities as limiting cases. The modification of the dynamics employs an exterior product of gradients of the original quantities, and often evolves the system towards asymptotic linear dependence of these gradients in a nontrivial state. The process both generalizes and provides additional structure to existing techniques for selective dissipation in the literature on fluids and plasmas, nonequilibrium thermodynamics, and nonlinear controls. It may be iterated or adapted to obtain any reduction in the degree of integrability. It may enable discovery of extremal states, limit cycles, or solitons, and the construction of new integrable systems from superintegrable systems. We briefly illustrate the approach by its application to the cyclic three-body Toda lattice, driven from an aperiodic orbit towards a limit cycle.
Submitted 3 August, 2021; v1 submitted 28 March, 2021;
originally announced March 2021.
-
Cutting holes in bistable folds
Authors:
T. Yu,
I. Andrade-Silva,
M. A. Dias,
J. A. Hanna
Abstract:
A folded disk is bistable, as it can be popped through to an inverted state with elastic energy localized in a small, highly-deformed region on the fold. Cutting out this singularity relaxes the surrounding material and leads to a loss of bistability when the hole dimensions reach a critical size. These dimensions are strongly anisotropic and feature a surprising re-entrant behavior, such that removal of additional material can re-stabilize the inverted state. A model of the surface as a wide annular developable strip is found to capture the qualitative observations in experiments and simulations. These phenomena are consequential to the mechanics and design of crumpled elastic sheets, developable surfaces, origami and kirigami, and other deployable and compliant structures.
Submitted 24 May, 2021; v1 submitted 27 August, 2020;
originally announced August 2020.
-
Reducing Sampling Error in Batch Temporal Difference Learning
Authors:
Brahma Pavse,
Ishan Durugkar,
Josiah Hanna,
Peter Stone
Abstract:
Temporal difference (TD) learning is one of the main foundations of modern reinforcement learning. This paper studies the use of TD(0), a canonical TD algorithm, to estimate the value function of a given policy from a batch of data. In this batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted according to the number of times that action occurred in the batch -- not the true probability of the action under the given policy. To address this limitation, we introduce \textit{policy sampling error corrected}-TD(0) (PSEC-TD(0)). PSEC-TD(0) first estimates the empirical distribution of actions in each state in the batch and then uses importance sampling to correct for the mismatch between the empirical weighting and the correct weighting for updates following each action. We refine the concept of a certainty-equivalence estimate and argue that PSEC-TD(0) is a more data efficient estimator than TD(0) for a fixed batch of data. Finally, we conduct an empirical evaluation of PSEC-TD(0) on three batch value function learning tasks, with a hyperparameter sensitivity analysis, and show that PSEC-TD(0) produces value function estimates with lower mean squared error than TD(0).
Submitted 15 August, 2020;
originally announced August 2020.
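The correction described in the PSEC-TD(0) abstract above can be illustrated with a minimal tabular sketch. It is not the authors' implementation. It assumes discrete states and actions, a count-based estimate of the action distribution in the batch, and a constant step size. The names `batch_td0`, `pi`, and `psec` are hypothetical.

```python
from collections import Counter


def batch_td0(batch, pi, gamma=1.0, alpha=0.01, sweeps=500, psec=True):
    """Tabular batch TD(0) over transitions (s, a, r, s_next, done).

    With psec=True, each update is reweighted by pi(s, a) / pi_hat(s, a),
    where pi_hat is the empirical action frequency in the batch
    (an assumed, count-based estimator).
    """
    sa_counts = Counter((s, a) for s, a, _, _, _ in batch)
    s_counts = Counter(s for s, _, _, _, _ in batch)
    values = {}
    for _ in range(sweeps):
        for s, a, r, s_next, done in batch:
            rho = 1.0
            if psec:
                pi_hat = sa_counts[(s, a)] / s_counts[s]
                rho = pi(s, a) / pi_hat  # importance weight for this update
            target = r if done else r + gamma * values.get(s_next, 0.0)
            v = values.get(s, 0.0)
            values[s] = v + alpha * rho * (target - v)
    return values


# One state, two actions; the policy picks each action with probability 0.5,
# but action 0 (reward 1) happens to appear three times in the batch and
# action 1 (reward 0) only once. The true value of the state is 0.5.
batch = [(0, 0, 1.0, None, True)] * 3 + [(0, 1, 0.0, None, True)]


def uniform(s, a):
    return 0.5


v_psec = batch_td0(batch, uniform)[0]
v_td = batch_td0(batch, uniform, psec=False)[0]
```

In this toy batch, unweighted TD(0) is pulled towards the empirical action frequencies, so `v_td` settles near 0.75, while the reweighted update settles near the true value of 0.5.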
-
Anisotropic swelling of anisotropic elastic panels
Authors:
H. G. Wood,
J. A. Hanna
Abstract:
While isotropic in-plane swelling problems for thin elastic sheets have been studied extensively in recent years, many shape-programmable materials, including nematic solids and 3D-printed structures, are anisotropic, as are most industrial sheet materials. In this theoretical work, we consider central swelling and shrinkage of plates of aspect ratio and material properties relevant to the manufacture of engineered wood composite panels in which both in-plane swelling and material stiffness are highly orthotropic, leading to multiple separations in energy scales. With transverse swelling in the soft direction, and gradients in the stiff direction, the warped plates adopt two distinct types of configurations, axisymmetric and twisted, which we illustrate with toy models. We employ a two-parameter family of isometries to embed the metric programmed by the swelling, thus reducing the problem to one of minimizing bending energy alone. A simple argument is seen to closely predict averaged axisymmetric curvatures. While purely cylindrical shapes are unobtainable by pure in-plane swelling, they can be closely approximated in a highly anisotropic system. However, anisotropy can favor twisting, and breaks a degenerate soft deformation mode associated with minimal surfaces in isotropic materials. Bifurcations from axisymmetric to twisted shapes can be induced by anisotropy or by certain attributes of a central shrinkage profile. Finally, we note how our findings indicate practical limitations on the diagnosis of moisture inhomogeneities in manufactured panels by observation of warped conformations, due to the sensitivity of the qualitative response to specifics of the profile.
Submitted 13 August, 2020;
originally announced August 2020.
-
An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch
Authors:
Siddharth Desai,
Ishan Durugkar,
Haresh Karnan,
Garrett Warnell,
Josiah Hanna,
Peter Stone
Abstract:
We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning. This problem is particularly important in sim-to-real transfer because simulators inevitably model real-world dynamics imperfectly. In this paper, we show that one existing solution to this transfer problem - grounded action transformation - is closely related to the problem of imitation from observation (IfO): learning behaviors that mimic the observations of behavior demonstrations. After establishing this relationship, we hypothesize that recent state-of-the-art approaches from the IfO literature can be effectively repurposed for grounded transfer learning. To validate our hypothesis we derive a new algorithm - generative adversarial reinforced action transformation (GARAT) - based on adversarial imitation from observation techniques. We run experiments in several domains with mismatched dynamics, and find that agents trained with GARAT achieve higher returns in the target environment compared to existing black-box transfer methods.
Submitted 16 November, 2020; v1 submitted 4 August, 2020;
originally announced August 2020.
-
Stochastic Grounded Action Transformation for Robot Learning in Simulation
Authors:
Siddharth Desai,
Haresh Karnan,
Josiah P. Hanna,
Garrett Warnell,
Peter Stone
Abstract:
Robot control policies learned in simulation do not often transfer well to the real world. Many existing solutions to this sim-to-real problem, such as the Grounded Action Transformation (GAT) algorithm, seek to correct for or ground these differences by matching the simulator to the real world. However, the efficacy of these approaches is limited if they do not explicitly account for stochasticity in the target environment. In this work, we analyze the problems associated with grounding a deterministic simulator in a stochastic real world environment, and we present examples where GAT fails to transfer a good policy due to stochastic transitions in the target domain. In response, we introduce the Stochastic Grounded Action Transformation (SGAT) algorithm, which models this stochasticity when grounding the simulator. We find experimentally for both simulated and physical target domains that SGAT can find policies that are robust to stochasticity in the target domain.
Submitted 3 August, 2020;
originally announced August 2020.
-
Reinforced Grounded Action Transformation for Sim-to-Real Transfer
Authors:
Haresh Karnan,
Siddharth Desai,
Josiah P. Hanna,
Garrett Warnell,
Peter Stone
Abstract:
Robots can learn to do complex tasks in simulation, but often, learned behaviors fail to transfer well to the real world due to simulator imperfections (the reality gap). Some existing solutions to this sim-to-real problem, such as Grounded Action Transformation (GAT), use a small amount of real-world experience to minimize the reality gap by grounding the simulator. While very effective in certain scenarios, GAT is not robust on problems that use complex function approximation techniques to model a policy. In this paper, we introduce Reinforced Grounded Action Transformation (RGAT), a new sim-to-real technique that uses Reinforcement Learning (RL) not only to update the target policy in simulation, but also to perform the grounding step itself. This novel formulation allows for end-to-end training during the grounding step, which, compared to GAT, produces a better grounded simulator. Moreover, we show experimentally in several MuJoCo domains that our approach leads to successful transfer for policies modeled using neural networks.
Submitted 3 August, 2020;
originally announced August 2020.
-
Towards Quantum-Secure Authentication and Key Agreement via Abstract Multi-Agent Interaction
Authors:
Ibrahim H. Ahmed,
Josiah P. Hanna,
Elliot Fosong,
Stefano V. Albrecht
Abstract:
Current methods for authentication and key agreement based on public-key cryptography are vulnerable to quantum computing. We propose a novel approach based on artificial intelligence research in which communicating parties are viewed as autonomous agents which interact repeatedly using their private decision models. Authentication and key agreement are decided based on the agents' observed behaviors during the interaction. The security of this approach rests upon the difficulty of modeling the decisions of interacting agents from limited observations, a problem which we conjecture is also hard for quantum computing. We release PyAMI, a prototype authentication and key agreement system based on the proposed method. We empirically validate our method for authenticating legitimate users while detecting different types of adversarial attacks. Finally, we show how reinforcement learning techniques can be used to train server models which effectively probe a client's decisions to achieve more sample-efficient authentication.
Submitted 9 July, 2021; v1 submitted 18 July, 2020;
originally announced July 2020.
-
Pseudomomentum: origins and consequences
Authors:
H. Singh,
J. A. Hanna
Abstract:
The balance of pseudomomentum is discussed and applied to simple elasticity, ideal fluids, and the mechanics of inextensible rods and sheets. A general framework is presented in which the simultaneous variation of an action with respect to position, time, and material labels yields bulk balance laws and jump conditions for momentum, energy, and pseudomomentum. The example of simple elasticity of space-filling solids is treated at length. The pseudomomentum balance in ideal fluids is shown to imply conservation of vorticity, circulation, and helicity, and a mathematical similarity is noted between the evaluation of circulation along a material loop and the J-integral of fracture mechanics. Integration of the pseudomomentum balance, making use of a prescription for singular sources derived by analogy with the continuous form of the balance, directly provides the propulsive force driving passive reconfiguration or locomotion of confined, inhomogeneous elastic rods. The conserved angular momentum and pseudomomentum are identified in the classification of conical sheets with rotational inertia or bending energy.
Submitted 11 August, 2022; v1 submitted 12 July, 2020;
originally announced July 2020.
-
An integrable family of torqued, damped, rigid rotors
Authors:
J. A. Hanna
Abstract:
Expositions of the Euler equations for the rotation of a rigid body often invoke the idea of a specially damped system whose energy dissipates while its angular momentum magnitude is conserved in the body frame. An attempt to explicitly construct such a damping function leads to a more general, but still integrable, system of cubic equations whose trajectories are confined to nested sets of quadric surfaces in angular momentum space. For some choices of parameters, the lines of fixed points along both the largest and smallest moment of inertia axes can be simultaneously attracting. Limiting cases are those that conserve either the energy or the magnitude of the angular momentum. Parallels with rod mechanics, micromagnetics, and particles with effective mass are briefly discussed.
Submitted 1 July, 2020;
originally announced July 2020.