Showing 1–5 of 5 results for author: Parmas, P

  1. arXiv:2105.14900  [pdf, other]

    cs.LG cs.AI stat.ML

    A unified view of likelihood ratio and reparameterization gradients

    Authors: Paavo Parmas, Masashi Sugiyama

    Abstract: Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used to estimate gradients of expectations throughout machine learning and reinforcement learning; however, they are usually explained as simple mathematical tricks, with no insight into their nature. We use a first principles approach to explain that LR and RP are alternative methods of keeping track of the movement of prob…

    Submitted 31 May, 2021; originally announced May 2021.

    Comments: AISTATS2021; Earlier paper was split in two (arXiv:1910.06419). Refer to the current paper for the unified view, but see the earlier paper for discussion on an importance sampling technique

    Journal ref: In International Conference on Artificial Intelligence and Statistics (pp. 4078-4086). PMLR (2021, March)

  2. arXiv:1910.06419  [pdf, other]

    cs.LG stat.ML

    A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme

    Authors: Paavo Parmas, Masashi Sugiyama

    Abstract: Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used throughout machine and reinforcement learning; however, they are usually explained as simple mathematical tricks without providing any insight into their nature. We use a first principles approach to explain LR and RP, and show a connection between the two via the divergence theorem. The theory motivated us to derive op…

    Submitted 14 October, 2019; originally announced October 2019.

    Comments: 8 pages + 19 pages appendix. Preliminary work

  3. arXiv:1906.00190  [pdf, other]

    cs.LG cs.AI stat.ML

    Neural Replicator Dynamics

    Authors: Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls

    Abstract: Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. In this paper, we first demonstrate that standard softmax-based policy gradient can be prone to poor performance in the presence of even the most benign nonstati…

    Submitted 26 February, 2020; v1 submitted 1 June, 2019; originally announced June 2019.

  4. arXiv:1902.01722  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Total stochastic gradient algorithms and applications in reinforcement learning

    Authors: Paavo Parmas

    Abstract: Backpropagation and the chain rule of derivatives have been prominent; however, the total derivative rule has not enjoyed the same amount of attention. In this work we show how the total derivative rule leads to an intuitive visual framework for creating gradient estimators on graphical models. In particular, previous "policy gradient theorems" are easily derived. We derive new gradient estimators…

    Submitted 5 February, 2019; originally announced February 2019.

    Comments: NeurIPS 2018

  5. arXiv:1902.01240  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos

    Authors: Paavo Parmas, Carl Edward Rasmussen, Jan Peters, Kenji Doya

    Abstract: Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning imply that the problem is not just a numerical issue, but it may be caused by a fundamental chaos-like nature of long chains of nonlinear computa…

    Submitted 4 February, 2019; originally announced February 2019.

    Comments: ICML 2018