Showing 1–16 of 16 results for author: Doya, K

Searching in archive cs.
  1. arXiv:2406.15016  [pdf, other]

    cs.NE cs.AI

    Evolution of Rewards for Food and Motor Action by Simulating Birth and Death

    Authors: Yuji Kanagawa, Kenji Doya

    Abstract: The reward system is one of the fundamental drivers of animal behaviors and is critical for survival and reproduction. Despite its importance, the problem of how the reward system has evolved is underexplored. In this paper, we try to replicate the evolution of biologically plausible reward functions and investigate how environmental conditions affect evolved rewards' shape. For this purpose, we d…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2405.07473  [pdf, other]

    cs.LG stat.ML

    Intrinsic Rewards for Exploration without Harm from Observational Noise: A Simulation Study Based on the Free Energy Principle

    Authors: Theodore Jerome Tinker, Kenji Doya, Jun Tani

    Abstract: In Reinforcement Learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks. Exploration is essential in RL because agents must discover information before exploiting it. Two rewards encouraging efficient exploration are the entropy of action policy and curiosity for information gain. Entropy is well-established in literature, promoting randomized action selecti…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 54 pages, 11 figures, to be published in Neural Computation

    MSC Class: 68T01

  3. arXiv:2306.16906  [pdf, other]

    stat.ML cs.LG

    Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach

    Authors: Florian Lalande, Kenji Doya

    Abstract: Numerical data imputation algorithms replace missing values by estimates to leverage incomplete data sets. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. But this strategy can create artifacts leading to poor imputation in the presence of multimodal or complex distributions. To tackle this problem, we introduce the $k$NN$\times$KDE…

    Submitted 10 July, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 30 pages, 8 figures, accepted in TMLR (Reproducibility certification)

    Journal ref: Transactions on Machine Learning Research, June 2023

  4. arXiv:2304.05008  [pdf, other]

    cs.LG cs.AI

    Habits and goals in synergy: a variational Bayesian framework for behavior

    Authors: Dongqi Han, Kenji Doya, Dongsheng Li, Jun Tani

    Abstract: How to behave efficiently and flexibly is a central problem for understanding biological agents and creating intelligent embodied AI. It has been well known that behavior can be classified as two types: reward-maximizing habitual behavior, which is fast while inflexible; and goal-directed behavior, which is flexible while slow. Conventionally, habitual and goal-directed behaviors are considered ha…

    Submitted 11 April, 2023; originally announced April 2023.

    Journal ref: Nat Commun 15, 4461 (2024)

  5. arXiv:2106.09938  [pdf, other]

    cs.LG cs.AI cs.RO

    Goal-Directed Planning by Reinforcement Learning and Active Inference

    Authors: Dongqi Han, Kenji Doya, Jun Tani

    Abstract: What is the difference between goal-directed and habitual behavior? We propose a novel computational framework of decision making with Bayesian inference, in which everything is integrated as an entire neural network model. The model learns to predict environmental state transitions by self-exploration and generating motor actions by sampling stochastic internal states ${z}$. Habitual behavior, wh…

    Submitted 22 June, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: Work in progress

  6. A Whole Brain Probabilistic Generative Model: Toward Realizing Cognitive Architectures for Developmental Robots

    Authors: Tadahiro Taniguchi, Hiroshi Yamakawa, Takayuki Nagai, Kenji Doya, Masamichi Sakagami, Masahiro Suzuki, Tomoaki Nakamura, Akira Taniguchi

    Abstract: Building a humanlike integrative artificial cognitive system, that is, an artificial general intelligence (AGI), is the holy grail of the artificial intelligence (AI) field. Furthermore, a computational model that enables an artificial system to achieve cognitive development will be an excellent reference for brain and cognitive science. This paper describes an approach to develop a cognitive arch…

    Submitted 9 January, 2022; v1 submitted 15 March, 2021; originally announced March 2021.

    Comments: 62 pages, 9 figures, submitted to Neural Networks

    Journal ref: Neural Networks, 2022, Volume 150, 293-312

  7. Forward and inverse reinforcement learning sharing network weights and hyperparameters

    Authors: Eiji Uchibe, Kenji Doya

    Abstract: This paper proposes model-free imitation learning named Entropy-Regularized Imitation Learning (ERIL) that minimizes the reverse Kullback-Leibler (KL) divergence. ERIL combines forward and inverse reinforcement learning (RL) under the framework of an entropy-regularized Markov decision process. An inverse RL step computes the log-ratio between two distributions by evaluating two binary discriminat…

    Submitted 31 May, 2022; v1 submitted 17 August, 2020; originally announced August 2020.

    Comments: Accepted for publication in Neural Networks

    Journal ref: Neural Networks, December 2021, Pages 138-153

  8. arXiv:1912.10703  [pdf, other]

    cs.LG cs.NE eess.SY stat.ML

    Variational Recurrent Models for Solving Partially Observable Control Tasks

    Authors: Dongqi Han, Kenji Doya, Jun Tani

    Abstract: In partially observable (PO) environments, deep reinforcement learning (RL) agents often suffer from unsatisfactory performance, since two problems need to be tackled together: how to extract information from the raw observations to solve the task, and how to improve the policy. In this study, we propose an RL algorithm for solving PO tasks. Our method comprises two parts: a variational recurrent…

    Submitted 24 December, 2019; v1 submitted 23 December, 2019; originally announced December 2019.

    Comments: Published as a conference paper at the Eighth International Conference on Learning Representations (ICLR 2020)

  9. arXiv:1908.00876  [pdf, other]

    eess.IV cs.LG q-bio.NC stat.ML

    MarmoNet: a pipeline for automated projection mapping of the common marmoset brain from whole-brain serial two-photon tomography

    Authors: Henrik Skibbe, Akiya Watakabe, Ken Nakae, Carlos Enrique Gutierrez, Hiromichi Tsukada, Junichi Hata, Takashi Kawase, Rui Gong, Alexander Woodward, Kenji Doya, Hideyuki Okano, Tetsuo Yamamori, Shin Ishii

    Abstract: Understanding the connectivity in the brain is an important prerequisite for understanding how the brain processes information. In the Brain/MINDS project, a connectivity study on marmoset brains uses two-photon microscopy fluorescence images of axonal projections to collect the neuron connectivity from defined brain regions at the mesoscopic scale. The processing of the images requires the detect…

    Submitted 2 August, 2019; originally announced August 2019.

  10. arXiv:1906.07586  [pdf, other]

    cs.LG stat.ML

    Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning

    Authors: Tadashi Kozuno, Dongqi Han, Kenji Doya

    Abstract: In real-world applications of reinforcement learning (RL), noise from inherent stochasticity of environments is inevitable. However, current policy evaluation algorithms, which play a key role in many RL algorithms, are either prone to noise or inefficient. To solve this issue, we introduce a novel policy evaluation algorithm, which we call Gap-increasing RetrAce Policy Evaluation (GRAPE). It lev…

    Submitted 18 June, 2019; originally announced June 2019.

  11. arXiv:1902.01240  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos

    Authors: Paavo Parmas, Carl Edward Rasmussen, Jan Peters, Kenji Doya

    Abstract: Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning imply that the problem is not just a numerical issue, but it may be caused by a fundamental chaos-like nature of long chains of nonlinear computa…

    Submitted 4 February, 2019; originally announced February 2019.

    Comments: ICML 2018

  12. arXiv:1901.10113  [pdf, other]

    cs.LG stat.ML

    Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent neural networks

    Authors: Dongqi Han, Kenji Doya, Jun Tani

    Abstract: Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown distinct advantages, e.g., solving memory-dependent tasks and meta-learning. However, little effort has been spent on improving RNN architectures and on understanding the underlying neural mechanisms for performance gain. In this paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical results show t…

    Submitted 26 November, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

  13. arXiv:1807.09443  [pdf, ps, other]

    cs.LG stat.ML

    Unbounded Output Networks for Classification

    Authors: Stefan Elfwing, Eiji Uchibe, Kenji Doya

    Abstract: We proposed the expected energy-based restricted Boltzmann machine (EE-RBM) as a discriminative RBM method for classification. Two characteristics of the EE-RBM are that the output is unbounded and that the target value of correct classification is set to a value much greater than one. In this study, by adopting features of the EE-RBM approach to feed-forward neural networks, we propose the UnBoun…

    Submitted 25 July, 2018; originally announced July 2018.

    Comments: 8 pages, 7 figures

  14. arXiv:1710.10866  [pdf, other]

    stat.ML cs.LG

    Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

    Authors: Tadashi Kozuno, Eiji Uchibe, Kenji Doya

    Abstract: Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks. In this paper we propose a new, robust dynamic programming algorithm that unifies value iterati…

    Submitted 30 October, 2017; originally announced October 2017.

  15. arXiv:1702.07490  [pdf, ps, other]

    cs.LG

    Online Meta-learning by Parallel Algorithm Competition

    Authors: Stefan Elfwing, Eiji Uchibe, Kenji Doya

    Abstract: The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulate the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question in reinforcement learning, which arguably has become more of an issue recently with the success of deep reinforcement learning in high-dimensional state s…

    Submitted 24 February, 2017; originally announced February 2017.

    Comments: 15 pages, 10 figures. arXiv admin note: text overlap with arXiv:1702.03118

  16. arXiv:1702.03118  [pdf, ps, other]

    cs.LG

    Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning

    Authors: Stefan Elfwing, Eiji Uchibe, Kenji Doya

    Abstract: In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, we propose two activation functions for ne…

    Submitted 1 November, 2017; v1 submitted 10 February, 2017; originally announced February 2017.

    Comments: 18 pages, 22 figures; added deep RL results for SZ-Tetris