Showing 1–5 of 5 results for author: Kiyohara, H

Searching in archive stat.
  1. Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction

    Authors: Haruka Kiyohara, Masahiro Nomura, Yuta Saito

    Abstract: We study off-policy evaluation (OPE) in the problem of slate contextual bandits, where a policy selects multi-dimensional actions known as slates. This problem is widespread, from recommender systems, search engines, and marketing to medical applications. However, the typical Inverse Propensity Scoring (IPS) estimator suffers from substantial variance due to large action spaces, making effective OPE a si…

    Submitted 17 February, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: WWW2024

  2. arXiv:2306.15098  [pdf, other]

    stat.ML cs.IR cs.LG

    Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

    Authors: Haruka Kiyohara, Masatoshi Uehara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito

    Abstract: Ranking interfaces are everywhere in online platforms. There is thus an ever-growing interest in their Off-Policy Evaluation (OPE), which aims at an accurate performance evaluation of ranking policies using logged data. A de facto approach for OPE is Inverse Propensity Scoring (IPS), which provides an unbiased and consistent value estimate. However, it becomes extremely inaccurate in the ranking…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: KDD2023 Research track

  3. arXiv:2207.13081  [pdf, other]

    cs.LG stat.ML

    Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

    Authors: Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

    Abstract: We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs.…

    Submitted 14 November, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

    Comments: This paper was accepted in NeurIPS 2023

  4. Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

    Authors: Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto

    Abstract: In real-world recommender systems and search engines, optimizing ranking decisions to present a ranked list of relevant items is critical. Off-policy evaluation (OPE) for ranking policies is thus gaining growing interest because it enables performance estimation of new ranking policies using only logged data. Although OPE in contextual bandits has been studied extensively, its naive application…

    Submitted 3 February, 2022; originally announced February 2022.

    Comments: WSDM2022

  5. arXiv:2108.13703  [pdf, other]

    stat.ML cs.AI cs.LG

    Evaluating the Robustness of Off-Policy Evaluation

    Authors: Yuta Saito, Takuma Udagawa, Haruka Kiyohara, Kazuki Mogi, Yusuke Narita, Kei Tateno

    Abstract: Off-policy Evaluation (OPE), or offline evaluation in general, evaluates the performance of hypothetical policies leveraging only offline log data. It is particularly useful in applications where online interaction involves high-stakes and expensive settings, such as precision medicine and recommender systems. Since many OPE estimators have been proposed and some of them have hyperparameters to…

    Submitted 31 August, 2021; originally announced August 2021.

    Comments: Accepted at RecSys2021