Showing 1–1 of 1 results for author: Ward, P N

Search v0.5.6 released 2020-02-24

arXiv:1906.02771

cs.LG cs.AI stat.ML

Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

Authors: Patrick Nadeem Ward, Ariella Smofsky, Avishek Joey Bose

Abstract: Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle toward hyperparameters as well as sample inefficient. Soft Actor-Critic (SAC) proposes an off-policy deep actor-critic algorithm within the maximum entropy RL framework that offers greater stability and empirical gains. The choice of policy distribution, a factored Gaussian, is motivated by its easy re-parametrization rather than its modeling power. We introduce Normalizing Flow policies within the SAC framework that learn more expressive classes of policies than simple factored Gaussians. We show empirically on continuous grid world tasks that our approach increases stability and is better suited to difficult exploration in sparse reward settings.
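To make the abstract's idea concrete, here is a minimal NumPy sketch of a normalizing-flow policy of the kind SAC can use. It is not the authors' implementation. It assumes planar flows (Rezende and Mohamed, 2015) stacked on a state-conditioned Gaussian, followed by SAC's usual tanh squashing. The flow type, the linear mean/log-std heads, and the names `PlanarFlow`, `FlowPolicy`, and `transform` are all hypothetical choices for illustration. The point it shows is that the sample stays reparameterized (action = deterministic function of state and noise) while the log-probability is tracked through the change-of-variables formula.

```python
import numpy as np


def softplus(x):
    return np.logaddexp(0.0, x)


class PlanarFlow:
    """Hypothetical planar flow f(z) = z + u_hat * tanh(w.z + b)."""

    def __init__(self, dim, rng):
        self.w = rng.normal(scale=0.5, size=dim)
        self.u = rng.normal(scale=0.5, size=dim)
        self.b = rng.normal(scale=0.5)

    def _u_hat(self):
        # Adjust u so that w.u_hat > -1, which keeps f invertible.
        wu = self.w @ self.u
        m = -1.0 + softplus(wu)
        return self.u + (m - wu) * self.w / (self.w @ self.w)

    def forward(self, z):
        u_hat = self._u_hat()
        h = np.tanh(z @ self.w + self.b)            # shape (batch,)
        z_new = z + h[:, None] * u_hat              # shape (batch, dim)
        psi = (1.0 - h ** 2)[:, None] * self.w      # gradient of tanh(w.z + b)
        # Matrix determinant lemma: det(I + u_hat psi^T) = 1 + psi . u_hat
        log_det = np.log(np.abs(1.0 + psi @ u_hat))
        return z_new, log_det


class FlowPolicy:
    """Assumed architecture: Gaussian base -> K planar flows -> tanh."""

    def __init__(self, state_dim, action_dim, n_flows=4, seed=0):
        rng = np.random.default_rng(seed)
        # Hypothetical linear heads for the base Gaussian's mean and log-std.
        self.W_mu = rng.normal(scale=0.1, size=(state_dim, action_dim))
        self.W_ls = rng.normal(scale=0.1, size=(state_dim, action_dim))
        self.flows = [PlanarFlow(action_dim, rng) for _ in range(n_flows)]

    def transform(self, state, eps):
        """Map noise eps to an action and return (action, log pi(a|s))."""
        mu = state @ self.W_mu
        log_std = np.clip(state @ self.W_ls, -5.0, 2.0)
        z = mu + np.exp(log_std) * eps              # reparameterized base sample
        # log N(z; mu, std^2) = log N(eps; 0, I) - sum(log_std)
        log_prob = np.sum(-0.5 * eps ** 2 - 0.5 * np.log(2 * np.pi) - log_std,
                          axis=-1)
        for flow in self.flows:
            z, log_det = flow.forward(z)
            log_prob -= log_det                     # change of variables
        a = np.tanh(z)                              # squash into (-1, 1)
        log_prob -= np.sum(np.log(1.0 - a ** 2 + 1e-6), axis=-1)
        return a, log_prob

    def sample(self, state, rng):
        eps = rng.standard_normal((state.shape[0], self.W_mu.shape[1]))
        return self.transform(state, eps)
```

In a full SAC agent the same forward pass would be written in an autodiff framework, so the actor loss `alpha * log_pi - Q(s, a)` can be differentiated through `transform`. The factored Gaussian is the special case with zero flows; each added flow makes the action distribution more expressive (for example, multimodal) while keeping sampling and density evaluation in a single pass.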

Submitted 6 June, 2019; originally announced June 2019.

Comments: INNF workshop, International Conference on Machine Learning 2019, Long Beach CA, USA
