Showing 1–3 of 3 results for author: Bhupatiraju, S

  1. arXiv:1808.04888  [pdf, other]

    stat.ML cs.LG

    Skill Rating for Generative Models

    Authors: Catherine Olsson, Surya Bhupatiraju, Tom Brown, Augustus Odena, Ian Goodfellow

    Abstract: We explore a new way to evaluate generative models using insights from evaluation of competitive games between human players. We show experimentally that tournaments between generators and discriminators provide an effective way to evaluate generative models. We introduce two methods for summarizing tournament outcomes: tournament win rate and skill rating. Evaluations are useful in different cont…

    Submitted 14 August, 2018; originally announced August 2018.

  2. arXiv:1807.00403  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards Mixed Optimization for Reinforcement Learning with Program Synthesis

    Authors: Surya Bhupatiraju, Kumar Krishna Agrawal, Rishabh Singh

    Abstract: Deep reinforcement learning has led to several recent breakthroughs, though the learned policies are often based on black-box neural networks. This makes them difficult to interpret and to impose desired specification constraints during learning. We present an iterative framework, MORL, for improving the learned policies using program synthesis. Concretely, we propose to use synthesis techniques t…

    Submitted 3 July, 2018; v1 submitted 1 July, 2018; originally announced July 2018.

    Comments: Updated publication details, format. Accepted at NAMPI workshop, ICML '18

  3. arXiv:1802.10031  [pdf, other]

    cs.LG stat.ML

    The Mirage of Action-Dependent Baselines in Reinforcement Learning

    Authors: George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

    Abstract: Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. To be…

    Submitted 19 November, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

    Comments: Updated to ICML final submission