Showing 1–3 of 3 results for author: Low, S M

Searching in archive cs.
  1. arXiv:2405.03005  [pdf, other]

    cs.LG cs.AI

    Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints

    Authors: Siow Meng Low, Akshat Kumar

    Abstract: In safe Reinforcement Learning (RL), safety cost is typically defined as a function dependent on the immediate state and actions. In practice, safety constraints can often be non-Markovian due to the insufficient fidelity of state representation, and safety cost may not be known. We therefore address a general setting where safety labels (e.g., safe or unsafe) are associated with state-action traj…

    Submitted 5 May, 2024; originally announced May 2024.

  2. arXiv:2304.03081  [pdf, other]

    cs.LG cs.AI

    Safe MDP Planning by Learning Temporal Patterns of Undesirable Trajectories and Averting Negative Side Effects

    Authors: Siow Meng Low, Akshat Kumar, Scott Sanner

    Abstract: In safe MDP planning, a cost function based on the current state and action is often used to specify safety aspects. In the real world, often the state representation used may lack sufficient fidelity to specify such safety constraints. Operating based on an incomplete model can often produce unintended negative side effects (NSEs). To address these challenges, first, we associate safety signals w…

    Submitted 6 April, 2023; originally announced April 2023.

  3. arXiv:2203.12679  [pdf, other]

    cs.AI cs.LG

    Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs

    Authors: Siow Meng Low, Akshat Kumar, Scott Sanner

    Abstract: Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous MDP planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven effective for optimizing DRPs in nonlinear continuous MDPs, but it requires a large number of sample…

    Submitted 23 March, 2022; originally announced March 2022.