-
Flip** Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning
Authors:
Sam Lobel,
Akhil Bagaria,
George Konidaris
Abstract:
We propose a new method for count-based exploration in high-dimensional state spaces. Unlike previous work which relies on density models, we show that counts can be derived by averaging samples from the Rademacher distribution (or coin flips). This insight is used to set up a simple supervised learning objective which, when optimized, yields a state's visitation count. We show that our method is…
▽ More
We propose a new method for count-based exploration in high-dimensional state spaces. Unlike previous work which relies on density models, we show that counts can be derived by averaging samples from the Rademacher distribution (or coin flips). This insight is used to set up a simple supervised learning objective which, when optimized, yields a state's visitation count. We show that our method is significantly more effective at deducing ground-truth visitation counts than previous work; when used as an exploration bonus for a model-free reinforcement learning algorithm, it outperforms existing approaches on most of 9 challenging exploration tasks, including the Atari game Montezuma's Revenge.
△ Less
Submitted 5 June, 2023;
originally announced June 2023.
-
Scaling Goal-based Exploration via Pruning Proto-goals
Authors:
Akhil Bagaria,
Ray Jiang,
Ramana Kumar,
Tom Schaul
Abstract:
One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty-, or coverage-seeking behaviour falls short. Goal-directed, purposeful behaviours are able to overcome this, but rely on a good goal space. The core challenge in goal discovery is finding the right balance between generality (not hand-crafted) and tractability (useful, not too m…
▽ More
One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty-, or coverage-seeking behaviour falls short. Goal-directed, purposeful behaviours are able to overcome this, but rely on a good goal space. The core challenge in goal discovery is finding the right balance between generality (not hand-crafted) and tractability (useful, not too many). Our approach explicitly seeks the middle ground, enabling the human designer to specify a vast but meaningful proto-goal space, and an autonomous discovery process to refine this to a narrower space of controllable, reachable, novel, and relevant goals. The effectiveness of goal-conditioned exploration with the latter is then demonstrated in three challenging environments.
△ Less
Submitted 9 February, 2023;
originally announced February 2023.
-
Sphynx: A Shared Instruction Cache Exporatory Study
Authors:
Dong-hyeon Park,
Akhil Bagaria,
Fabiha Hannan,
Eric Storm,
Josef Spjut
Abstract:
The Sphynx project was an exploratory study to discover what might be done to improve the heavy replication of in- structions in independent instruction caches for a massively parallel machine where a single program is executing across all of the cores. While a machine with only many cores (fewer than 50) might not have any issues replicating the instructions for each core, as we approach the era…
▽ More
The Sphynx project was an exploratory study to discover what might be done to improve the heavy replication of in- structions in independent instruction caches for a massively parallel machine where a single program is executing across all of the cores. While a machine with only many cores (fewer than 50) might not have any issues replicating the instructions for each core, as we approach the era where thousands of cores can be placed on one chip, the overhead of instruction replication may become unacceptably large. We believe that a large amount of sharing should be possible when the ma- chine is configured for all of the threads to issue from the same set of instructions. We propose a technique that allows sharing an instruction cache among a number of independent processor cores to allow for inter-thread sharing and reuse of instruction memory. While we do not have test cases to demonstrate the potential magnitude of performance gains that could be achieved, the potential for sharing reduces the die area required for instruction storage on chip.
△ Less
Submitted 2 December, 2014;
originally announced December 2014.