Showing 1–1 of 1 results for author: Pi, C

Searching in archive stat.
  1. arXiv:1904.10642

    cs.LG stat.ML

    Towards Combining On-Off-Policy Methods for Real-World Applications

    Authors: Kai-Chun Hu, Chen-Huan Pi, Ting Han Wei, I-Chen Wu, Stone Cheng, Yi-Wei Dai, Wei-Yuan Ye

    Abstract: In this paper, we point out a fundamental property of the objective in reinforcement learning, with which we can reformulate the policy gradient objective into a perceptron-like loss function, removing the need to distinguish between on and off policy training. Namely, we posit that it is sufficient to only update a policy $\pi$ for cases that satisfy the condition $A\left(\frac{\pi}{\mu}-1\right)\leq 0$, where $A$ is…

    Submitted 24 April, 2019; originally announced April 2019.
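
    The condition quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `needs_update` is hypothetical, $\pi$ and $\mu$ are assumed here to be positive probabilities that two policies assign to the same action, and $A$ is treated as an arbitrary scalar because the abstract is cut off before defining it.

    ```python
    def needs_update(a: float, pi_prob: float, mu_prob: float) -> bool:
        """Check the condition A * (pi / mu - 1) <= 0 for one sample.

        Assumptions (not from the listing): `a` is a scalar signal A,
        and `pi_prob`, `mu_prob` are probabilities of the same action
        under two policies, with `mu_prob` strictly positive.
        """
        if mu_prob <= 0.0:
            raise ValueError("mu_prob must be strictly positive")
        ratio = pi_prob / mu_prob
        return a * (ratio - 1.0) <= 0.0


    # Hypothetical samples as (A, pi, mu) triples.
    samples = [(1.0, 0.2, 0.4), (1.0, 0.6, 0.4), (-1.0, 0.6, 0.4), (-1.0, 0.2, 0.4)]
    selected = [s for s in samples if needs_update(*s)]
    ```

    Under these assumptions, a sample is kept when $A$ is positive and the ratio $\pi/\mu$ is at most 1, or when $A$ is negative and the ratio is at least 1; the remaining samples are skipped.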