Search | arXiv e-print repository

EMOTE: An Explainable architecture for Modelling the Other Through Empathy

Authors: Manisha Senadeera, Thommen Karimpanal George, Sunil Gupta, Stephan Jacobs, Santu Rana

Abstract: We can usually assume others have goals analogous to our own. This assumption can also, at times, be applied to multi-agent games - e.g. Agent 1's attraction to green pellets is analogous to Agent 2's attraction to red pellets. This "analogy" assumption is tied closely to the cognitive process known as empathy. Inspired by empathy, we design a simple and explainable architecture to model another agent's action-value function. This involves learning an "Imagination Network" to transform the other agent's observed state in order to produce a human-interpretable "empathetic state" which, when presented to the learning agent, produces behaviours that mimic the other agent. Our approach is applicable to multi-agent scenarios consisting of a single learning agent and other (independent) agents acting according to fixed policies. This architecture is particularly beneficial for (but not limited to) algorithms using a composite value or reward function. We show our method produces better performance in multi-agent games, where it robustly estimates the other's model in different environment configurations. Additionally, we show that the empathetic states are human interpretable, and thus verifiable.

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2204.09315 [pdf, ps, other]

Learning to Constrain Policy Optimization with Virtual Trust Region

Authors: Hung Le, Thommen Karimpanal George, Majid Abdolshah, Dung Nguyen, Kien Do, Sunil Gupta, Svetha Venkatesh

Abstract: We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update. In addition to using the proximity of one single old policy as the normal trust region, we propose forming a second trust region through another virtual policy representing a wide range of past policies. We then enforce the new policy to stay closer to the virtual policy, which is beneficial if the old policy performs poorly. More importantly, we propose a mechanism to automatically build the virtual policy from a memory of past policies, providing a new capability for dynamically learning appropriate virtual trust regions during the optimization process. Our proposed method, dubbed Memory-Constrained Policy Optimization (MCPO), is examined in diverse environments, including robotic locomotion control, navigation with sparse rewards and Atari games, consistently demonstrating competitive performance against recent on-policy constrained policy gradient methods.

Submitted 15 September, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

Comments: Preprint, 22 pages

arXiv:2112.01853 [pdf, other]

Episodic Policy Gradient Training

Authors: Hung Le, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen, Svetha Venkatesh

Abstract: We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on-the-fly. Unlike other hyperparameter searches, we formulate hyperparameter scheduling as a standard Markov Decision Process and use episodic memory to store the outcome of used hyperparameters and their training contexts. At any policy update step, the policy learner refers to the stored experiences, and adaptively reconfigures its learning algorithm with the new hyperparameters determined by the memory. This mechanism, dubbed Episodic Policy Gradient Training (EPGT), enables an episodic learning process, and jointly learns the policy and the learning algorithm's hyperparameters within a single run. Experimental results on both continuous and discrete environments demonstrate the advantage of using the proposed method in boosting the performance of various policy gradient algorithms.

Submitted 3 December, 2021; originally announced December 2021.

Comments: 19 pages

arXiv:2111.02104 [pdf, ps, other]

Model-Based Episodic Memory Induces Dynamic Hybrid Controls

Authors: Hung Le, Thommen Karimpanal George, Majid Abdolshah, Truyen Tran, Svetha Venkatesh

Abstract: Episodic control enables sample efficiency in reinforcement learning by recalling past experiences from an episodic memory. We propose a new model-based episodic memory of trajectories addressing current limitations of episodic control. Our memory estimates trajectory values, guiding the agent towards good policies. Built upon the memory, we construct a complementary learning model via a dynamic hybrid control unifying model-based, episodic and habitual learning into a single architecture. Experiments demonstrate that our model allows significantly faster and better learning than other strong reinforcement learning agents across a variety of environments including stochastic and non-Markovian settings.

Submitted 6 November, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

Comments: 26 pages

arXiv:2108.08960 [pdf, other]

Plug and Play, Model-Based Reinforcement Learning

Authors: Majid Abdolshah, Hung Le, Thommen Karimpanal George, Sunil Gupta, Santu Rana, Svetha Venkatesh

Abstract: Sample-efficient generalisation of reinforcement learning approaches has always been a challenge, especially for complex scenes with many components. In this work, we introduce Plug and Play Markov Decision Processes, an object-based representation that allows zero-shot integration of new objects from known object classes. This is achieved by representing the global transition dynamics as a union of local transition functions, each with respect to one active object in the scene. Transition dynamics from an object class can be pre-learnt and thus would be ready to use in a new environment. Each active object is also endowed with its own reward function. Since there is no central reward function, addition or removal of objects can be handled efficiently by only updating the reward functions of the objects involved. A new transfer learning mechanism is also proposed to adapt the reward function in such cases. Experiments show that our representation can achieve sample-efficiency in a variety of set-ups.

Submitted 19 August, 2021; originally announced August 2021.

arXiv:2107.08426 [pdf, other]

A New Representation of Successor Features for Transfer across Dissimilar Environments

Authors: Majid Abdolshah, Hung Le, Thommen Karimpanal George, Sunil Gupta, Santu Rana, Svetha Venkatesh

Abstract: Transfer in reinforcement learning is usually achieved through generalisation across tasks. Whilst many studies have investigated transferring knowledge when the reward function changes, they have assumed that the dynamics of the environments remain consistent. Many real-world RL problems require transfer among environments with different dynamics. To address this problem, we propose an approach based on successor features in which we model successor feature functions with Gaussian Processes permitting the source successor features to be treated as noisy measurements of the target successor feature function. Our theoretical analysis proves the convergence of this approach as well as the bounded error on modelling successor feature functions with Gaussian Processes in environments with both different dynamics and rewards. We demonstrate our method on benchmark datasets and show that it outperforms current baselines.

Submitted 18 July, 2021; originally announced July 2021.

arXiv:1705.08012 [pdf]

Sensing discomfort of standing passengers in public rail transportation systems using a smart phone

Authors: Thommen Karimpanal George, Harit Maganlal Gadhia, Ruben S/O Sukumar, John-John Cabibihan

Abstract: This paper aims to investigate the effect of acceleration on the discomfort of standing passengers. The acceleration levels from different public rail transport lines such as the mass rapid transits (MRTs) and light rail transits (LRTs) of Singapore, as well as the associated qualitative data indicating the discomfort of standing passengers, were collected and analyzed. Using a logistic regression model to analyze the data, a discomfort index was introduced, which can be used to compare various rail lines based on ride comfort. A method for predicting the discomfort of passengers based on the acceleration values was proposed for any given train line.

Submitted 22 May, 2017; originally announced May 2017.

Comments: Document prepared for IEEE International Conference on Control and Automation (ICCA), 2013, 5 pages, 8 figures

Journal ref: 10th IEEE International Conference on Control & Automation (IEEE ICCA 2013), HangZhou China, June 12-14, 2013, pp. 1509-1513

Showing 1–7 of 7 results for author: George, T K