Showing 1–23 of 23 results for author: Kozuno, T

  1. arXiv:2402.18002  [pdf, other]

    cs.RO cs.AI cs.LG

    Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist

    Authors: Hai Nguyen, Tadashi Kozuno, Cristian C. Beltran-Hernandez, Masashi Hamaya

    Abstract: This study tackles the representative yet challenging contact-rich peg-in-hole task of robotic assembly, using a soft wrist that can operate more safely and tolerate lower-frequency control signals than a rigid one. Previous studies often use a fully observable formulation, requiring external setups or estimators for the peg-to-hole pose. In contrast, we use a partially observable formulation and…

    Submitted 29 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted at ICRA-2024

  2. arXiv:2401.17780  [pdf, other]

    cs.LG

    A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees

    Authors: Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo

    Abstract: We study a primal-dual (PD) reinforcement learning (RL) algorithm for online constrained Markov decision processes (CMDPs). Despite its widespread practical use, the existing theoretical literature on PD-RL algorithms for this problem only provides sublinear regret guarantees and fails to ensure convergence to optimal policies. In this paper, we introduce a novel policy gradient PD algorithm with…

    Submitted 1 July, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

  3. arXiv:2312.02008  [pdf, other]

    cs.RO

    Multi-Agent Behavior Retrieval: Retrieval-Augmented Policy Training for Cooperative Push Manipulation by Mobile Robots

    Authors: So Kuroki, Mai Nishimura, Tadashi Kozuno

    Abstract: Due to the complex interactions between agents, learning a multi-agent control policy often requires a prohibitive amount of data. This paper aims to enable multi-agent systems to effectively utilize past memories to adapt to novel collaborative tasks in a data-efficient fashion. We propose the Multi-Agent Coordination Skill Database, a repository for storing a collection of coordinated behaviors ass…

    Submitted 29 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  4. arXiv:2309.00656  [pdf, other]

    cs.GT cs.LG stat.ML

    Local and adaptive mirror descents in extensive-form games

    Authors: Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

    Abstract: We study how to learn $ε$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback. In this setting, players update their policies sequentially based on their observations over a fixed number of episodes, denoted by $T$. Existing procedures suffer from high variance due to the use of importance sampling over sequences of actions (Steinberger et al., 2020; McAleer e…

    Submitted 1 September, 2023; originally announced September 2023.

  5. arXiv:2305.18501  [pdf, other]

    cs.LG

    DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

    Authors: Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

    Abstract: Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings. However, in the optimal control case, the impact of multi-step learning has been relatively limited despite a number of prior efforts. Fundamentally, this might be because multi-step policy improvements require operations that cannot be approximated by stochastic samples, hence hin…

    Submitted 29 May, 2023; originally announced May 2023.

  6. arXiv:2305.13185  [pdf, other]

    cs.LG

    Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

    Authors: Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo

    Abstract: Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation in practice, the theoretical understanding of MDVI has been limited to tabular Markov decision processes (MDPs). We study MDVI with linear fu…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: ICML 2023 accepted

  7. arXiv:2305.11465  [pdf, other]

    cs.MA cs.RO

    Counterfactual Fairness Filter for Fair-Delay Multi-Robot Navigation

    Authors: Hikaru Asano, Ryo Yonetani, Mai Nishimura, Tadashi Kozuno

    Abstract: Multi-robot navigation is the task of finding trajectories for a team of robotic agents to reach their destinations as quickly as possible without collisions. In this work, we introduce a new problem: fair-delay multi-robot navigation, which aims not only to enable such efficient, safe travels but also to equalize the travel delays among agents in terms of actual trajectories as compared to the be…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: To appear in the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2023)

  8. arXiv:2304.12046  [pdf, other]

    cs.RO

    When to Replan? An Adaptive Replanning Strategy for Autonomous Navigation using Deep Reinforcement Learning

    Authors: Kohei Honda, Ryo Yonetani, Mai Nishimura, Tadashi Kozuno

    Abstract: The hierarchy of global and local planners is one of the most commonly utilized system designs in autonomous robot navigation. While the global planner generates a reference path from the current to goal locations based on the pre-built map, the local planner produces a kinodynamic trajectory to follow the reference path while avoiding perceived obstacles. To account for unforeseen or dynamic obst…

    Submitted 26 February, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: 7 pages, 3 figures

  9. Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints

    Authors: Kazumi Kasaura, Shuwa Miura, Tadashi Kozuno, Ryo Yonetani, Kenta Hoshino, Yohei Hosoe

    Abstract: This study presents a benchmark for evaluating action-constrained reinforcement learning (RL) algorithms. In action-constrained RL, each action taken by the learning system must comply with certain constraints. These constraints are crucial for ensuring the feasibility and safety of actions in real-world systems. We evaluate existing algorithms and their novel variants across multiple robotics con…

    Submitted 29 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: 8 pages, 7 figures, accepted to Robotics and Automation Letters

    Journal ref: IEEE Robotics and Automation Letters 8(8) (2023) 4449-4456

  10. arXiv:2302.01248  [pdf, other]

    stat.ML cs.LG

    Robust Markov Decision Processes without Model Estimation

    Authors: Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang

    Abstract: Robust Markov Decision Processes (MDPs) are receiving much attention in learning a robust policy which is less sensitive to environment changes. There are an increasing number of works analyzing sample-efficiency of robust MDPs. However, there are two major barriers to applying robust MDPs in practice. First, most works study robust MDPs in a model-based regime, where the transition probability ne…

    Submitted 12 September, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  11. arXiv:2212.12567  [pdf, other]

    stat.ML cs.LG

    Adapting to game trees in zero-sum imperfect information games

    Authors: Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

    Abstract: Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $ε$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\widetilde{\mathcal{O}}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/ε^2)$ on the required number of realizations to learn these strategies with hi…

    Submitted 15 February, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

  12. arXiv:2210.15755  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Confident Approximate Policy Iteration for Efficient Local Planning in $q^π$-realizable MDPs

    Authors: Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

    Abstract: We consider approximate dynamic programming in $γ$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with t…

    Submitted 27 October, 2022; originally announced October 2022.

  13. arXiv:2205.14211  [pdf, other]

    cs.LG cs.AI stat.ML

    KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

    Authors: Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári

    Abstract: In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model. Particularly, we analyze mirror descent value iteration (MDVI) by Geist et al. (2019) and Vieillard et al. (2020a), which uses the Kullback-Leibler divergence and entropy regularization in its value and policy updates. Our analysis shows that it is nearly minimax-optimal for fi…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 29 pages, 6 figures

  14. arXiv:2205.08716  [pdf, other]

    cs.LG

    No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL

    Authors: Han Wang, Archit Sakhadeo, Adam White, James Bell, Vincent Liu, Xutong Zhao, Puer Liu, Tadashi Kozuno, Alona Fyshe, Martha White

    Abstract: The performance of reinforcement learning (RL) agents is sensitive to the choice of hyperparameters. In real-world settings like robotics or industrial control systems, however, testing different hyperparameter configurations directly on the environment can be financially prohibitive, dangerous, or time consuming. We propose a new approach to tune hyperparameters from offline logs of data, to full…

    Submitted 18 May, 2022; originally announced May 2022.

  15. arXiv:2107.08285  [pdf, other]

    cs.LG

    Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

    Authors: Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White

    Abstract: Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification. Many different approaches have been explored for approximate policy evaluation, but less is understood about approximate greedification and what choices guarantee policy improvement. In this work, we investigate approximate greedification when reducing the KL divergence…

    Submitted 18 April, 2022; v1 submitted 17 July, 2021; originally announced July 2021.

    Comments: Updated the paper with more theory in Section 5 and moved some experiments to the Appendix

  16. arXiv:2106.13125  [pdf, other]

    cs.LG

    Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

    Authors: Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko

    Abstract: Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions. This is challenging from an implementation perspective, as repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates. In this work, we provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation. Our framew…

    Submitted 3 November, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: Accepted at Neural Information Processing Systems (NeurIPS), 2021. Code is available at https://github.com/robintyh1/neurips2021-meta-gradient-offpolicy-evaluation

  17. arXiv:2106.06279  [pdf, ps, other]

    stat.ML cs.LG

    Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

    Authors: Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

    Abstract: We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play. Precisely, we focus on two-player, zero-sum, episodic, tabular IIG under the perfect-recall assumption where the only feedback is realizations of the game (bandit feedback). In particular, the dynamic of the IIG is not known -- we can only access it by sampling or interacting with a g…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: 20 pages

  18. arXiv:2103.17258  [pdf, other]

    cs.LG cs.AI stat.ML

    Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning

    Authors: Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Shane Gu

    Abstract: Recently many algorithms were devised for reinforcement learning (RL) with function approximation. While they have clear algorithmic distinctions, they also have many implementation differences that are algorithm-independent and sometimes under-emphasized. Such mixing of algorithmic novelty and implementation craftsmanship makes rigorous analyses of the sources of performance improvements across a…

    Submitted 25 October, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

    Comments: Accepted at NeurIPS 2021. The implementation is available at: https://github.com/frt03/inference-based-rl

  19. arXiv:2103.12726  [pdf, other]

    cs.LG cs.AI stat.ML

    Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning

    Authors: Hiroki Furuta, Tatsuya Matsushima, Tadashi Kozuno, Yutaka Matsuo, Sergey Levine, Ofir Nachum, Shixiang Shane Gu

    Abstract: Progress in deep reinforcement learning (RL) research is largely enabled by benchmark task environments. However, analyzing the nature of those environments is often overlooked. In particular, we still do not have agreeable ways to measure the difficulty or solvability of a task, given that each has fundamentally different actions, observations, dynamics, rewards, and can be tackled with diverse R…

    Submitted 31 May, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: Accepted to ICML2021. The code is available at: https://github.com/frt03/pic

  20. arXiv:2103.00107  [pdf, other]

    cs.LG cs.AI stat.ML

    Revisiting Peng's Q($λ$) for Modern Reinforcement Learning

    Authors: Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

    Abstract: Off-policy multi-step reinforcement learning algorithms consist of conservative and non-conservative algorithms: the former actively cut traces, whereas the latter do not. Recently, Munos et al. (2016) proved the convergence of conservative algorithms to an optimal Q-function. In contrast, non-conservative algorithms are thought to be unsafe and have a limited or no theoretical guarantee. Nonethel…

    Submitted 26 February, 2021; originally announced March 2021.

    Comments: 26 pages, 7 figures, 2 tables

  21. arXiv:2003.14089  [pdf, other]

    cs.LG stat.ML

    Leverage the Average: an Analysis of KL Regularization in RL

    Authors: Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

    Abstract: Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance. Yet, so far, little is understood theoretically about why KL regularization helps. We study KL regularization within an approximate value iteration scheme and show that it implicitly averages q-values. Leveraging this insight, we provide a ve…

    Submitted 6 January, 2021; v1 submitted 31 March, 2020; originally announced March 2020.

    Comments: NeurIPS 2020

  22. arXiv:1906.07586  [pdf, other]

    cs.LG stat.ML

    Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning

    Authors: Tadashi Kozuno, Dongqi Han, Kenji Doya

    Abstract: In real-world applications of reinforcement learning (RL), noise from inherent stochasticity of environments is inevitable. However, current policy evaluation algorithms, which play a key role in many RL algorithms, are either prone to noise or inefficient. To solve this issue, we introduce a novel policy evaluation algorithm, which we call Gap-increasing RetrAce Policy Evaluation (GRAPE). It lev…

    Submitted 18 June, 2019; originally announced June 2019.

  23. arXiv:1710.10866  [pdf, other]

    stat.ML cs.LG

    Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

    Authors: Tadashi Kozuno, Eiji Uchibe, Kenji Doya

    Abstract: Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks. In this paper we propose a new, robust dynamic programming algorithm that unifies value iterati…

    Submitted 30 October, 2017; originally announced October 2017.