Showing 1–10 of 10 results for author: Abdolshah, M

  1. arXiv:2303.01684 [pdf, other]

    cs.LG cs.AI

    BO-Muse: A human expert and AI teaming framework for accelerated experimental design

    Authors: Sunil Gupta, Alistair Shilton, Arun Kumar A V, Shannon Ryan, Majid Abdolshah, Hung Le, Santu Rana, Julian Berk, Mahad Rashid, Svetha Venkatesh

    Abstract: In this paper we introduce BO-Muse, a new approach to human-AI teaming for the optimization of expensive black-box functions. Inspired by the intrinsic difficulty of extracting expert knowledge and distilling it back into AI models and by observations of human behavior in real-world experimental design, our algorithm lets the human expert take the lead in the experimental process. The human expert…

    Submitted 30 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 34 Pages, 7 Figures and 5 Tables

  2. arXiv:2204.09315 [pdf, ps, other]

    cs.LG

    Learning to Constrain Policy Optimization with Virtual Trust Region

    Authors: Hung Le, Thommen Karimpanal George, Majid Abdolshah, Dung Nguyen, Kien Do, Sunil Gupta, Svetha Venkatesh

    Abstract: We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update. In addition to using the proximity of one single old policy as the normal trust region, we propose forming a second trust region through another virtual policy representing a wide range of past policies. We then enforce the new policy to stay…

    Submitted 15 September, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

    Comments: Preprint, 22 pages

  3. arXiv:2112.01853 [pdf, other]

    cs.LG cs.MA

    Episodic Policy Gradient Training

    Authors: Hung Le, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen, Svetha Venkatesh

    Abstract: We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on-the-fly. Unlike other hyperparameter searches, we formulate hyperparameter scheduling as a standard Markov Decision Process and use episodic memory to store the outcome of used hyperparameters and their training contexts. At any…

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: 19 pages

  4. arXiv:2111.02787 [pdf, ps, other]

    cs.LG cs.AI

    Balanced Q-learning: Combining the Influence of Optimistic and Pessimistic Targets

    Authors: Thommen George Karimpanal, Hung Le, Majid Abdolshah, Santu Rana, Sunil Gupta, Truyen Tran, Svetha Venkatesh

    Abstract: The optimistic nature of the Q-learning target leads to an overestimation bias, which is an inherent problem associated with standard $Q$-learning. Such a bias fails to account for the possibility of low returns, particularly in risky scenarios. However, the existence of biases, whether overestimation or underestimation, need not necessarily be undesirable. In this paper, we analytically examine t…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: 26 pages, 11 figures

  5. arXiv:2111.02104 [pdf, ps, other]

    cs.LG cs.AI

    Model-Based Episodic Memory Induces Dynamic Hybrid Controls

    Authors: Hung Le, Thommen Karimpanal George, Majid Abdolshah, Truyen Tran, Svetha Venkatesh

    Abstract: Episodic control enables sample efficiency in reinforcement learning by recalling past experiences from an episodic memory. We propose a new model-based episodic memory of trajectories addressing current limitations of episodic control. Our memory estimates trajectory values, guiding the agent towards good policies. Built upon the memory, we construct a complementary learning model via a dynamic h…

    Submitted 6 November, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: 26 pages

  6. arXiv:2108.08960 [pdf, other]

    cs.LG

    Plug and Play, Model-Based Reinforcement Learning

    Authors: Majid Abdolshah, Hung Le, Thommen Karimpanal George, Sunil Gupta, Santu Rana, Svetha Venkatesh

    Abstract: Sample-efficient generalisation of reinforcement learning approaches has always been a challenge, especially for complex scenes with many components. In this work, we introduce Plug and Play Markov Decision Processes, an object-based representation that allows zero-shot integration of new objects from known object classes. This is achieved by representing the global transition dynamics as a unio…

    Submitted 19 August, 2021; originally announced August 2021.

  7. arXiv:2107.08426 [pdf, other]

    cs.LG

    A New Representation of Successor Features for Transfer across Dissimilar Environments

    Authors: Majid Abdolshah, Hung Le, Thommen Karimpanal George, Sunil Gupta, Santu Rana, Svetha Venkatesh

    Abstract: Transfer in reinforcement learning is usually achieved through generalisation across tasks. Whilst many studies have investigated transferring knowledge when the reward function changes, they have assumed that the dynamics of the environments remain consistent. Many real-world RL problems require transfer among environments with different dynamics. To address this problem, we propose an approach b…

    Submitted 18 July, 2021; originally announced July 2021.

  8. arXiv:1909.03600 [pdf, other]

    cs.LG math.OC stat.ML

    Cost-aware Multi-objective Bayesian optimisation

    Authors: Majid Abdolshah, Alistair Shilton, Santu Rana, Sunil Gupta, Svetha Venkatesh

    Abstract: The notion of expense in Bayesian optimisation generally refers to the uniformly expensive cost of function evaluations over the whole search space. However, in some scenarios, the cost of evaluation for black-box objective functions is non-uniform, since different inputs from the search space may incur different costs for function evaluations. We introduce a cost-aware multi-objective Bayesian optimis…

    Submitted 8 September, 2019; originally announced September 2019.

  9. arXiv:1902.07846 [pdf, other]

    stat.ML cs.LG

    Stable Bayesian Optimisation via Direct Stability Quantification

    Authors: Alistair Shilton, Sunil Gupta, Santu Rana, Svetha Venkatesh, Majid Abdolshah, Dang Nguyen

    Abstract: In this paper we consider the problem of finding stable maxima of expensive (to evaluate) functions. We are motivated by the optimisation of physical and industrial processes where, for some input ranges, small and unavoidable variations in inputs lead to unacceptably large variation in outputs. Our approach uses multiple gradient Gaussian Process models to estimate the probability that worst-case…

    Submitted 20 February, 2019; originally announced February 2019.

  10. arXiv:1902.04228 [pdf, other]

    cs.LG cs.AI stat.ML

    Multi-objective Bayesian optimisation with preferences over objectives

    Authors: Majid Abdolshah, Alistair Shilton, Santu Rana, Sunil Gupta, Svetha Venkatesh

    Abstract: We present a multi-objective Bayesian optimisation algorithm that allows the user to express preference-order constraints on the objectives of the type "objective A is more important than objective B". These preferences are defined based on the stability of the obtained solutions with respect to preferred objective functions. Rather than attempting to find a representative subset of the complete P…

    Submitted 12 November, 2019; v1 submitted 11 February, 2019; originally announced February 2019.