-
Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding
Authors:
Alexey Skrynnik,
Anton Andreychuk,
Konstantin Yakovlev,
Aleksandr Panov
Abstract:
The Multi-Agent Pathfinding (MAPF) problem involves finding a set of conflict-free paths for a group of agents confined to a graph. In typical MAPF scenarios, the graph and the agents' starting and ending vertices are known beforehand, allowing the use of centralized planning algorithms. However, in this study, we focus on the decentralized MAPF setting, where the agents may observe the other agen…
▽ More
The Multi-Agent Pathfinding (MAPF) problem involves finding a set of conflict-free paths for a group of agents confined to a graph. In typical MAPF scenarios, the graph and the agents' starting and ending vertices are known beforehand, allowing the use of centralized planning algorithms. However, in this study, we focus on the decentralized MAPF setting, where the agents may observe the other agents only locally and are restricted in communications with each other. Specifically, we investigate the lifelong variant of MAPF, where new goals are continually assigned to the agents upon completion of previous ones. Drawing inspiration from the successful AlphaZero approach, we propose a decentralized multi-agent Monte Carlo Tree Search (MCTS) method for MAPF tasks. Our approach utilizes the agent's observations to recreate the intrinsic Markov decision process, which is then used for planning with a tailored for multi-agent tasks version of neural MCTS. The experimental results show that our approach outperforms state-of-the-art learnable MAPF solvers. The source code is available at https://github.com/AIRI-Institute/mats-lp.
△ Less
Submitted 26 December, 2023;
originally announced December 2023.
-
Gradual Optimization Learning for Conformational Energy Minimization
Authors:
Artem Tsypin,
Leonid Ugadiarov,
Kuzma Khrabrov,
Alexander Telepov,
Egor Rumiantsev,
Alexey Skrynnik,
Aleksandr I. Panov,
Dmitry Vetrov,
Elena Tutubalina,
Artur Kadurin
Abstract:
Molecular conformation optimization is crucial to computer-aided drug discovery and materials design. Traditional energy minimization techniques rely on iterative optimization methods that use molecular forces calculated by a physical simulator (oracle) as anti-gradients. However, this is a computationally expensive approach that requires many interactions with a physical simulator. One way to acc…
▽ More
Molecular conformation optimization is crucial to computer-aided drug discovery and materials design. Traditional energy minimization techniques rely on iterative optimization methods that use molecular forces calculated by a physical simulator (oracle) as anti-gradients. However, this is a computationally expensive approach that requires many interactions with a physical simulator. One way to accelerate this procedure is to replace the physical simulator with a neural network. Despite recent progress in neural networks for molecular conformation energy prediction, such models are prone to distribution shift, leading to inaccurate energy minimization. We find that the quality of energy minimization with neural networks can be improved by providing optimization trajectories as additional training data. Still, it takes around $5 \times 10^5$ additional conformations to match the physical simulator's optimization quality. In this work, we present the Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks that significantly reduces the required additional data. The framework consists of an efficient data-collecting scheme and an external optimizer. The external optimizer utilizes gradients from the energy prediction model to generate optimization trajectories, and the data-collecting scheme selects additional training data to be processed by the physical simulator. Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules using $50$x less additional data.
△ Less
Submitted 12 March, 2024; v1 submitted 5 November, 2023;
originally announced November 2023.
-
Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning
Authors:
Alexey Skrynnik,
Anton Andreychuk,
Maria Nesterova,
Konstantin Yakovlev,
Aleksandr Panov
Abstract:
Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph and is typically solved in a centralized fashion.
Conversely, in this work, we investigate the decentralized MAPF setting, when the central controller that posses all the information on the agents' locations and goals is absent and the agents have to sequientially de…
▽ More
Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph and is typically solved in a centralized fashion.
Conversely, in this work, we investigate the decentralized MAPF setting, when the central controller that posses all the information on the agents' locations and goals is absent and the agents have to sequientially decide the actions on their own without having access to a full state of the environment. We focus on the practically important lifelong variant of MAPF, which involves continuously assigning new goals to the agents upon arrival to the previous ones. To address this complex problem, we propose a method that integrates two complementary approaches: planning with heuristic search and reinforcement learning through policy optimization. Planning is utilized to construct and re-plan individual paths. We enhance our planning algorithm with a dedicated technique tailored to avoid congestion and increase the throughput of the system. We employ reinforcement learning to discover the collision avoidance policies that effectively guide the agents along the paths. The policy is implemented as a neural network and is effectively trained without any reward-sha** or external guidance.
We evaluate our method on a wide range of setups comparing it to the state-of-the-art solvers. The results show that our method consistently outperforms the learnable competitors, showing higher throughput and better ability to generalize to the maps that were unseen at the training stage. Moreover our solver outperforms a rule-based one in terms of throughput and is an order of magnitude faster than a state-of-the-art search-based solver.
△ Less
Submitted 2 October, 2023;
originally announced October 2023.
-
Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results
Authors:
Yelisey Pitanov,
Alexey Skrynnik,
Anton Andreychuk,
Konstantin Yakovlev,
Aleksandr Panov
Abstract:
In this work we study a well-known and challenging problem of Multi-agent Pathfinding, when a set of agents is confined to a graph, each agent is assigned a unique start and goal vertices and the task is to find a set of collision-free paths (one for each agent) such that each agent reaches its respective goal. We investigate how to utilize Monte-Carlo Tree Search (MCTS) to solve the problem. Alth…
▽ More
In this work we study a well-known and challenging problem of Multi-agent Pathfinding, when a set of agents is confined to a graph, each agent is assigned a unique start and goal vertices and the task is to find a set of collision-free paths (one for each agent) such that each agent reaches its respective goal. We investigate how to utilize Monte-Carlo Tree Search (MCTS) to solve the problem. Although MCTS was shown to demonstrate superior performance in a wide range of problems like playing antagonistic games (e.g. Go, Chess etc.), discovering faster matrix multiplication algorithms etc., its application to the problem at hand was not well studied before. To this end we introduce an original variant of MCTS, tailored to multi-agent pathfinding. The crux of our approach is how the reward, that guides MCTS, is computed. Specifically, we use individual paths to assist the agents with the the goal-reaching behavior, while leaving them freedom to get off the track if it is needed to avoid collisions. We also use a dedicated decomposition technique to reduce the branching factor of the tree search procedure. Empirically we show that the suggested method outperforms the baseline planning algorithm that invokes heuristic search, e.g. A*, at each re-planning step.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Reinforcement Learning with Success Induced Task Prioritization
Authors:
Maria Nesterova,
Alexey Skrynnik,
Aleksandr Panov
Abstract:
Many challenging reinforcement learning (RL) problems require designing a distribution of tasks that can be applied to train effective policies. This distribution of tasks can be specified by the curriculum. A curriculum is meant to improve the results of learning and accelerate it. We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning, where a task…
▽ More
Many challenging reinforcement learning (RL) problems require designing a distribution of tasks that can be applied to train effective policies. This distribution of tasks can be specified by the curriculum. A curriculum is meant to improve the results of learning and accelerate it. We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning, where a task sequence is created based on the success rate of each task. In this setting, each task is an algorithmically created environment instance with a unique configuration. The algorithm selects the order of tasks that provide the fastest learning for agents. The probability of selecting any of the tasks for the next stage of learning is determined by evaluating its performance score in previous stages. Experiments were carried out in the Partially Observable Grid Environment for Multiple Agents (POGEMA) and Procgen benchmark. We demonstrate that SITP matches or surpasses the results of other curriculum design methods. Our method can be implemented with handful of minor modifications to any standard RL framework and provides useful prioritization with minimal computational overhead.
△ Less
Submitted 30 December, 2022;
originally announced January 2023.
-
Collecting Interactive Multi-modal Datasets for Grounded Language Understanding
Authors:
Shrestha Mohanty,
Negar Arabzadeh,
Milagro Teruel,
Yuxuan Sun,
Artem Zholus,
Alexey Skrynnik,
Mikhail Burtsev,
Kavya Srinet,
Aleksandr Panov,
Arthur Szlam,
Marc-Alexandre Côté,
Julia Kiseleva
Abstract:
Human intelligence can remarkably adapt quickly to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research which can enable similar capabilities in machines, we made the following contributions (1) formalized the co…
▽ More
Human intelligence can remarkably adapt quickly to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research which can enable similar capabilities in machines, we made the following contributions (1) formalized the collaborative embodied agent using natural language task; (2) developed a tool for extensive and scalable data collection; and (3) collected the first dataset for interactive grounded language understanding.
△ Less
Submitted 21 March, 2023; v1 submitted 11 November, 2022;
originally announced November 2022.
-
Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions
Authors:
Alexey Skrynnik,
Zoya Volovikova,
Marc-Alexandre Côté,
Anton Voronov,
Artem Zholus,
Negar Arabzadeh,
Shrestha Mohanty,
Milagro Teruel,
Ahmed Awadallah,
Aleksandr Panov,
Mikhail Burtsev,
Julia Kiseleva
Abstract:
The adoption of pre-trained language models to generate action plans for embodied agents is a promising research strategy. However, execution of instructions in real or simulated environments requires verification of the feasibility of actions as well as their relevance to the completion of a goal. We propose a new method that combines a language model and reinforcement learning for the task of bu…
▽ More
The adoption of pre-trained language models to generate action plans for embodied agents is a promising research strategy. However, execution of instructions in real or simulated environments requires verification of the feasibility of actions as well as their relevance to the completion of a goal. We propose a new method that combines a language model and reinforcement learning for the task of building objects in a Minecraft-like environment according to the natural language instructions. Our method first generates a set of consistently achievable sub-goals from the instructions and then completes associated sub-tasks with a pre-trained RL policy. The proposed method formed the RL baseline at the IGLU 2022 competition.
△ Less
Submitted 1 November, 2022;
originally announced November 2022.
-
POGEMA: Partially Observable Grid Environment for Multiple Agents
Authors:
Alexey Skrynnik,
Anton Andreychuk,
Konstantin Yakovlev,
Aleksandr I. Panov
Abstract:
We introduce POGEMA (https://github.com/AIRI-Institute/pogema) a sandbox for challenging partially observable multi-agent pathfinding (PO-MAPF) problems . This is a grid-based environment that was specifically designed to be a flexible, tunable and scalable benchmark. It can be tailored to a variety of PO-MAPF, which can serve as an excellent testing ground for planning and learning methods, and t…
▽ More
We introduce POGEMA (https://github.com/AIRI-Institute/pogema) a sandbox for challenging partially observable multi-agent pathfinding (PO-MAPF) problems . This is a grid-based environment that was specifically designed to be a flexible, tunable and scalable benchmark. It can be tailored to a variety of PO-MAPF, which can serve as an excellent testing ground for planning and learning methods, and their combination, which will allow us to move towards filling the gap between AI planning and learning.
△ Less
Submitted 22 June, 2022;
originally announced June 2022.
-
IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents
Authors:
Artem Zholus,
Alexey Skrynnik,
Shrestha Mohanty,
Zoya Volovikova,
Julia Kiseleva,
Artur Szlam,
Marc-Alexandre Coté,
Aleksandr I. Panov
Abstract:
We present the IGLU Gridworld: a reinforcement learning environment for building and evaluating language conditioned embodied agents in a scalable way. The environment features visual agent embodiment, interactive learning through collaboration, language conditioned RL, and combinatorically hard task (3d blocks building) space.
We present the IGLU Gridworld: a reinforcement learning environment for building and evaluating language conditioned embodied agents in a scalable way. The environment features visual agent embodiment, interactive learning through collaboration, language conditioned RL, and combinatorically hard task (3d blocks building) space.
△ Less
Submitted 31 May, 2022;
originally announced June 2022.
-
IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022
Authors:
Julia Kiseleva,
Alexey Skrynnik,
Artem Zholus,
Shrestha Mohanty,
Negar Arabzadeh,
Marc-Alexandre Côté,
Mohammad Aliannejadi,
Milagro Teruel,
Ziming Li,
Mikhail Burtsev,
Maartje ter Hoeve,
Zoya Volovikova,
Aleksandr Panov,
Yuxuan Sun,
Kavya Srinet,
Arthur Szlam,
Ahmed Awadallah
Abstract:
Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collabor…
▽ More
Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collaborative Environment. The primary goal of the competition is to approach the problem of how to develop interactive embodied agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment. Understanding the complexity of the challenge, we split it into sub-tasks to make it feasible for participants.
This research challenge is naturally related, but not limited, to two fields of study that are highly relevant to the NeurIPS community: Natural Language Understanding and Generation (NLU/G) and Reinforcement Learning (RL). Therefore, the suggested challenge can bring two communities together to approach one of the crucial challenges in AI. Another critical aspect of the challenge is the dedication to perform a human-in-the-loop evaluation as a final evaluation for the agents developed by contestants.
△ Less
Submitted 27 May, 2022;
originally announced May 2022.
-
Interactive Grounded Language Understanding in a Collaborative Environment: IGLU 2021
Authors:
Julia Kiseleva,
Ziming Li,
Mohammad Aliannejadi,
Shrestha Mohanty,
Maartje ter Hoeve,
Mikhail Burtsev,
Alexey Skrynnik,
Artem Zholus,
Aleksandr Panov,
Kavya Srinet,
Arthur Szlam,
Yuxuan Sun,
Marc-Alexandre Côté,
Katja Hofmann,
Ahmed Awadallah,
Linar Abdrazakov,
Igor Churin,
Putra Manggala,
Kata Naszadi,
Michiel van der Meer,
Taewoon Kim
Abstract:
Human intelligence has the remarkable ability to quickly adapt to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose \emph{IGLU: Interactive Grounded Language Understanding in a Co…
▽ More
Human intelligence has the remarkable ability to quickly adapt to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose \emph{IGLU: Interactive Grounded Language Understanding in a Collaborative Environment}.
The primary goal of the competition is to approach the problem of how to build interactive agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment. Understanding the complexity of the challenge, we split it into sub-tasks to make it feasible for participants.
△ Less
Submitted 27 May, 2022; v1 submitted 4 May, 2022;
originally announced May 2022.
-
NeurIPS 2021 Competition IGLU: Interactive Grounded Language Understanding in a Collaborative Environment
Authors:
Julia Kiseleva,
Ziming Li,
Mohammad Aliannejadi,
Shrestha Mohanty,
Maartje ter Hoeve,
Mikhail Burtsev,
Alexey Skrynnik,
Artem Zholus,
Aleksandr Panov,
Kavya Srinet,
Arthur Szlam,
Yuxuan Sun,
Katja Hofmann,
Michel Galley,
Ahmed Awadallah
Abstract:
Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collabor…
▽ More
Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collaborative Environment. The primary goal of the competition is to approach the problem of how to build interactive agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment. Understanding the complexity of the challenge, we split it into sub-tasks to make it feasible for participants.
This research challenge is naturally related, but not limited, to two fields of study that are highly relevant to the NeurIPS community: Natural Language Understanding and Generation (NLU/G) and Reinforcement Learning (RL). Therefore, the suggested challenge can bring two communities together to approach one of the important challenges in AI. Another important aspect of the challenge is the dedication to perform a human-in-the-loop evaluation as a final evaluation for the agents developed by contestants.
△ Less
Submitted 14 October, 2021; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Long-Term Exploration in Persistent MDPs
Authors:
Leonid Ugadiarov,
Alexey Skrynnik,
Aleksandr I. Panov
Abstract:
Exploration is an essential part of reinforcement learning, which restricts the quality of learned policy. Hard-exploration environments are defined by huge state space and sparse rewards. In such conditions, an exhaustive exploration of the environment is often impossible, and the successful training of an agent requires a lot of interaction steps. In this paper, we propose an exploration method…
▽ More
Exploration is an essential part of reinforcement learning, which restricts the quality of learned policy. Hard-exploration environments are defined by huge state space and sparse rewards. In such conditions, an exhaustive exploration of the environment is often impossible, and the successful training of an agent requires a lot of interaction steps. In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process, in which agents during training can roll back to visited states. We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge. At all used levels of the game, our agent outperforms or shows comparable results with state-of-the-art curiosity methods with knowledge-based intrinsic motivation: ICM and RND. An implementation of RbExplore can be found at https://github.com/cds-mipt/RbExplore.
△ Less
Submitted 21 September, 2021;
originally announced September 2021.
-
Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments
Authors:
Vasilii Davydov,
Alexey Skrynnik,
Konstantin Yakovlev,
Aleksandr I. Panov
Abstract:
In this paper, we consider the problem of multi-agent navigation in partially observable grid environments. This problem is challenging for centralized planning approaches as they, typically, rely on the full knowledge of the environment. We suggest utilizing the reinforcement learning approach when the agents, first, learn the policies that map observations to actions and then follow these polici…
▽ More
In this paper, we consider the problem of multi-agent navigation in partially observable grid environments. This problem is challenging for centralized planning approaches as they, typically, rely on the full knowledge of the environment. We suggest utilizing the reinforcement learning approach when the agents, first, learn the policies that map observations to actions and then follow these policies to reach their goals. To tackle the challenge associated with learning cooperative behavior, i.e. in many cases agents need to yield to each other to accomplish a mission, we use a mixing Q-network that complements learning individual policies. In the experimental evaluation, we show that such approach leads to plausible results and scales well to large number of agents.
△ Less
Submitted 13 August, 2021;
originally announced August 2021.
-
Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations
Authors:
Alexey Skrynnik,
Aleksey Staroverov,
Ermek Aitygulov,
Kirill Aksenov,
Vasilii Davydov,
Aleksandr I. Panov
Abstract:
Currently, deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. Often these results are achieved at the expense of huge computational costs and require an incredible number of episodes of interaction between the agent and the environment. There are two main approaches to improving the sample efficiency of reinforcement learning methods - using hiera…
▽ More
Currently, deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. Often these results are achieved at the expense of huge computational costs and require an incredible number of episodes of interaction between the agent and the environment. There are two main approaches to improving the sample efficiency of reinforcement learning methods - using hierarchical methods and expert demonstrations. In this paper, we propose a combination of these approaches that allow the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our forgetful experience replay (ForgER) algorithm effectively handles errors in expert data and reduces quality losses when adapting the action space and states representation to the agent's capabilities. Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations. Our method is universal and can be integrated into various off-policy methods. It surpasses all known existing state-of-the-art RL methods using expert demonstrations on various model environments. The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
△ Less
Submitted 17 June, 2020;
originally announced June 2020.
-
Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft
Authors:
Alexey Skrynnik,
Aleksey Staroverov,
Ermek Aitygulov,
Kirill Aksenov,
Vasilii Davydov,
Aleksandr I. Panov
Abstract:
We present Hierarchical Deep Q-Network (HDQfD) that took first place in the MineRL competition. HDQfD works on imperfect demonstrations and utilizes the hierarchical structure of expert trajectories. We introduce the procedure of extracting an effective sequence of meta-actions and subgoals from demonstration data. We present a structured task-dependent replay buffer and adaptive prioritizing tech…
▽ More
We present Hierarchical Deep Q-Network (HDQfD) that took first place in the MineRL competition. HDQfD works on imperfect demonstrations and utilizes the hierarchical structure of expert trajectories. We introduce the procedure of extracting an effective sequence of meta-actions and subgoals from demonstration data. We present a structured task-dependent replay buffer and adaptive prioritizing technique that allow the HDQfD agent to gradually erase poor-quality expert data from the buffer. In this paper, we present the details of the HDQfD algorithm and give the experimental results in the Minecraft domain.
△ Less
Submitted 13 July, 2020; v1 submitted 18 December, 2019;
originally announced December 2019.
-
Automatic formation of the structure of abstract machines in hierarchical reinforcement learning with state clustering
Authors:
Aleksandr I. Panov,
Aleksey Skrynnik
Abstract:
We introduce a new approach to hierarchy formation and task decomposition in hierarchical reinforcement learning. Our method is based on the Hierarchy Of Abstract Machines (HAM) framework because HAM approach is able to design efficient controllers that will realize specific behaviors in real robots. The key to our algorithm is the introduction of the internal or "mental" environment in which the…
▽ More
We introduce a new approach to hierarchy formation and task decomposition in hierarchical reinforcement learning. Our method is based on the Hierarchy Of Abstract Machines (HAM) framework because HAM approach is able to design efficient controllers that will realize specific behaviors in real robots. The key to our algorithm is the introduction of the internal or "mental" environment in which the state represents the structure of the HAM hierarchy. The internal action in this environment leads to changes the hierarchy of HAMs. We propose the classical Q-learning procedure in the internal environment which allows the agent to obtain an optimal hierarchy. We extends the HAM framework by adding on-model approach to select the appropriate sub-machine to execute action sequences for certain class of external environment states. Preliminary experiments demonstrated the prospects of the method.
△ Less
Submitted 13 June, 2018;
originally announced June 2018.