
Showing 1–4 of 4 results for author: Nikulkov, A

  1. arXiv:2312.03814  [pdf, other]

    cs.LG cs.AI

    Pearl: A Production-ready Reinforcement Learning Agent

    Authors: Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu

    Abstract: Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generality allows us to formalize a wide range of problems that real-world intelligent systems encounter, such as dealing with delayed rewards, handling partial observability, addressing the exploration and exploitation dilemma, utilizing offline data to improve online performance, and ensuring safety const…

    Submitted 6 December, 2023; originally announced December 2023.

  2. arXiv:2310.09426  [pdf, other]

    cs.LG stat.ML

    Offline Reinforcement Learning for Optimizing Production Bidding Policies

    Authors: Dmytro Korenkevych, Frank Cheng, Artsiom Balakir, Alex Nikulkov, Lingnan Gao, Zhihao Cen, Zuobing Xu, Zheqing Zhu

    Abstract: The online advertising market, with its thousands of auctions run per second, presents a daunting challenge for advertisers who wish to optimize their spend under a budget constraint. Thus, advertising platforms typically provide automated agents to their customers, which act on their behalf to bid for impression opportunities in real time at scale. Because these proxy agents are owned by the plat…

    Submitted 13 October, 2023; originally announced October 2023.

  3. arXiv:2305.13747  [pdf, other]

    cs.IR cs.AI

    Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

    Authors: Ruiyang Xu, Jalaj Bhandari, Dmytro Korenkevych, Fan Liu, Yuchen He, Alex Nikulkov, Zheqing Zhu

    Abstract: Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior. In this study, we employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system. Utilizing…

    Submitted 30 July, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  4. arXiv:2304.02572  [pdf, other]

    cs.IR cs.AI cs.LG cs.SI

    Evaluating Online Bandit Exploration In Large-Scale Recommender System

    Authors: Hongbo Guo, Ruben Naeff, Alex Nikulkov, Zheqing Zhu

    Abstract: Bandit learning has been an increasingly popular design choice for recommender systems. Despite the strong interest in bandit learning from the community, there remain multiple bottlenecks that prevent many bandit learning approaches from productionalization. One major bottleneck is how to test the effectiveness of a bandit algorithm with fairness and without data leakage. Different from supervised…

    Submitted 30 July, 2023; v1 submitted 5 April, 2023; originally announced April 2023.