Showing 1–3 of 3 results for author: Al-Hafez, F

  1. arXiv:2311.04082  [pdf, other]

    cs.LG

    Time-Efficient Reinforcement Learning with Stochastic Stateful Policies

    Authors: Firas Al-Hafez, Guoping Zhao, Jan Peters, Davide Tateo

    Abstract: Stateful policies play an important role in reinforcement learning, such as handling partially observable environments, enhancing robustness, or imposing an inductive bias directly into the policy structure. The conventional method for training stateful policies is Backpropagation Through Time (BPTT), which comes with significant drawbacks, such as slow training due to sequential gradient propagat…

    Submitted 7 November, 2023; originally announced November 2023.

  2. arXiv:2311.02496  [pdf, other]

    cs.LG cs.RO

    LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion

    Authors: Firas Al-Hafez, Guoping Zhao, Jan Peters, Davide Tateo

    Abstract: Imitation Learning (IL) holds great promise for enabling agile locomotion in embodied agents. However, many existing locomotion benchmarks primarily focus on simplified toy tasks, often failing to capture the complexity of real-world scenarios and steering research toward unrealistic domains. To advance research in IL for locomotion, we present a novel benchmark designed to facilitate rigorous eva…

    Submitted 30 November, 2023; v1 submitted 4 November, 2023; originally announced November 2023.

    Comments: https://github.com/robfiras/loco-mujoco

  3. arXiv:2303.00599  [pdf, other]

    cs.LG cs.AI cs.RO

    LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

    Authors: Firas Al-Hafez, Davide Tateo, Oleg Arenz, Guoping Zhao, Jan Peters

    Abstract: Recent methods for imitation learning directly learn a $Q$-function using an implicit reward formulation rather than an explicit reward function. However, these methods generally require implicit reward regularization to improve stability and often mistreat absorbing states. Previous works show that a squared norm regularization on the implicit reward function is effective, but do not provide a th…

    Submitted 1 March, 2023; originally announced March 2023.