
Showing 1–1 of 1 results for author: Guntara, T W

Searching in archive cs.
  1. arXiv:2405.18792 [pdf, other]

    cs.LG cs.AI

    Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies

    Authors: Haanvid Lee, Tri Wahyu Guntara, Jongmin Lee, Yung-Kyun Noh, Kee-Eung Kim

    Abstract: We consider off-policy evaluation (OPE) of deterministic target policies for reinforcement learning (RL) in environments with continuous action spaces. While it is common to use importance sampling for OPE, it suffers from high variance when the behavior policy deviates significantly from the target policy. In order to address this issue, some recent works on OPE proposed in-sample learning with i…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 23 pages, 2 figures, Accepted at ICLR 2024 (spotlight)