Search | arXiv e-print repository

Interactive Semantic Map Representation for Skill-based Visual Object Navigation

Authors: Tatiana Zemskova, Aleksei Staroverov, Kirill Muravyev, Dmitry Yudin, Aleksandr Panov

Abstract: Visual object navigation using learning methods is one of the key tasks in mobile robotics. This paper introduces a new representation of a scene semantic map formed during the embodied agent interaction with the indoor environment. It is based on a neural network method that adjusts the weights of the segmentation model with backpropagation of the predicted fusion loss values during inference on… ▽ More Visual object navigation using learning methods is one of the key tasks in mobile robotics. This paper introduces a new representation of a scene semantic map formed during the embodied agent interaction with the indoor environment. It is based on a neural network method that adjusts the weights of the segmentation model with backpropagation of the predicted fusion loss values during inference on a regular (backward) or delayed (forward) image sequence. We have implemented this representation into a full-fledged navigation approach called SkillTron, which can select robot skills from end-to-end policies based on reinforcement learning and classic map-based planning methods. The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation. We conducted intensive experiments with the proposed approach in the Habitat environment, which showed a significant superiority in navigation quality metrics compared to state-of-the-art approaches. The developed code and used custom datasets are publicly available at github.com/AIRI-Institute/skill-fusion. △ Less

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.16362 [pdf, other]

Neural Potential Field for Obstacle-Aware Local Motion Planning

Authors: Muhammad Alhaddad, Konstantin Mironov, Aleksey Staroverov, Aleksandr Panov

Abstract: Model predictive control (MPC) may provide local motion planning for mobile robotic platforms. The challenging aspect is the analytic representation of collision cost for the case when both the obstacle map and robot footprint are arbitrary. We propose a Neural Potential Field: a neural network model that returns a differentiable collision cost based on robot pose, obstacle map, and robot footprin… ▽ More Model predictive control (MPC) may provide local motion planning for mobile robotic platforms. The challenging aspect is the analytic representation of collision cost for the case when both the obstacle map and robot footprint are arbitrary. We propose a Neural Potential Field: a neural network model that returns a differentiable collision cost based on robot pose, obstacle map, and robot footprint. The differentiability of our model allows its usage within the MPC solver. It is computationally hard to solve problems with a very high number of parameters. Therefore, our architecture includes neural image encoders, which transform obstacle maps and robot footprints into embeddings, which reduce problem dimensionality by two orders of magnitude. The reference data for network training are generated based on algorithmic calculation of a signed distance function. Comparative experiments showed that the proposed approach is comparable with existing local planners: it provides trajectories with outperforming smoothness, comparable path length, and safe distance from obstacles. Experiment on Husky UGV mobile robot showed that our approach allows real-time and safe local planning. The code for our approach is presented at https://github.com/cog-isa/NPField together with demo video. △ Less

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.12031 [pdf, other]

SegmATRon: Embodied Adaptive Semantic Segmentation for Indoor Environment

Authors: Tatiana Zemskova, Margarita Kichik, Dmitry Yudin, Aleksei Staroverov, Aleksandr Panov

Abstract: This paper presents an adaptive transformer model named SegmATRon for embodied image semantic segmentation. Its distinctive feature is the adaptation of model weights during inference on several images using a hybrid multicomponent loss function. We studied this model on datasets collected in the photorealistic Habitat and the synthetic AI2-THOR Simulators. We showed that obtaining additional imag… ▽ More This paper presents an adaptive transformer model named SegmATRon for embodied image semantic segmentation. Its distinctive feature is the adaptation of model weights during inference on several images using a hybrid multicomponent loss function. We studied this model on datasets collected in the photorealistic Habitat and the synthetic AI2-THOR Simulators. We showed that obtaining additional images using the agent's actions in an indoor environment can improve the quality of semantic segmentation. The code of the proposed approach and datasets are publicly available at https://github.com/wingrune/SegmATRon. △ Less

Submitted 18 October, 2023; originally announced October 2023.

Comments: 14 pages, 6 figures

arXiv:2306.09459 [pdf, other]

Recurrent Action Transformer with Memory

Authors: Alexey Staroverov, Egor Cherepanov, Dmitry Yudin, Alexey K. Kovalev, Aleksandr I. Panov

Abstract: Recently, the use of transformers in offline reinforcement learning has become a rapidly develo** area. This is due to their ability to treat the agent's trajectory in the environment as a sequence, thereby reducing the policy learning problem to sequence modeling. In environments where the agent's decisions depend on past events, it is essential to capture both the event itself and the decision… ▽ More Recently, the use of transformers in offline reinforcement learning has become a rapidly develo** area. This is due to their ability to treat the agent's trajectory in the environment as a sequence, thereby reducing the policy learning problem to sequence modeling. In environments where the agent's decisions depend on past events, it is essential to capture both the event itself and the decision point in the context of the model. However, the quadratic complexity of the attention mechanism limits the potential for context expansion. One solution to this problem is to enhance transformers with memory mechanisms. In this paper, we propose the Recurrent Action Transformer with Memory (RATE) - a model that incorporates recurrent memory. To evaluate our model, we conducted extensive experiments on both memory-intensive environments (VizDoom-Two-Color, T-Maze) and classic Atari games and MuJoCo control environments. The results show that the use of memory can significantly improve performance in memory-intensive environments while maintaining or improving results in classic environments. We hope that our findings will stimulate research on memory mechanisms for transformers applicable to offline reinforcement learning. △ Less

Submitted 27 March, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: 15 pages, 11 figures

arXiv:2212.14649 [pdf, other]

HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images

Authors: Dmitry Yudin, Yaroslav Solomentsev, Ruslan Musaev, Aleksei Staroverov, Aleksandr I. Panov

Abstract: We present a novel dataset named as HPointLoc, specially designed for exploring capabilities of visual place recognition in indoor environment and loop detection in simultaneous localization and map**. The loop detection sub-task is especially relevant when a robot with an on-board RGB-D camera can drive past the same place (``Point") at different angles. The dataset is based on the popular Habi… ▽ More We present a novel dataset named as HPointLoc, specially designed for exploring capabilities of visual place recognition in indoor environment and loop detection in simultaneous localization and map**. The loop detection sub-task is especially relevant when a robot with an on-board RGB-D camera can drive past the same place (``Point") at different angles. The dataset is based on the popular Habitat simulator, in which it is possible to generate photorealistic indoor scenes using both own sensor data and open datasets, such as Matterport3D. To study the main stages of solving the place recognition problem on the HPointLoc dataset, we proposed a new modular approach named as PNTR. It first performs an image retrieval with the Patch-NetVLAD method, then extracts keypoints and matches them using R2D2, LoFTR or SuperPoint with SuperGlue, and finally performs a camera pose optimization step with TEASER++. Such a solution to the place recognition problem has not been previously studied in existing publications. The PNTR approach has shown the best quality metrics on the HPointLoc dataset and has a high potential for real use in localization systems for unmanned vehicles. The proposed dataset and framework are publicly available: https://github.com/metra4ok/HPointLoc. △ Less

Submitted 30 December, 2022; originally announced December 2022.

Comments: Accepted for publishing in proceedings of the 29th International Conference on Neural Information Processing (ICONIP 2022)

arXiv:2109.09512 [pdf, other]

Landmark Policy Optimization for Object Navigation Task

Authors: Aleksey Staroverov, Aleksandr I. Panov

Abstract: This work studies object goal navigation task, which involves navigating to the closest object related to the given semantic category in unseen environments. Recent works have shown significant achievements both in the end-to-end Reinforcement Learning approach and modular systems, but need a big step forward to be robust and optimal. We propose a hierarchical method that incorporates standard tas… ▽ More This work studies object goal navigation task, which involves navigating to the closest object related to the given semantic category in unseen environments. Recent works have shown significant achievements both in the end-to-end Reinforcement Learning approach and modular systems, but need a big step forward to be robust and optimal. We propose a hierarchical method that incorporates standard task formulation and additional area knowledge as landmarks, with a way to extract these landmarks. In a hierarchy, a low level consists of separately trained algorithms to the most intuitive skills, and a high level decides which skill is needed at this moment. With all proposed solutions, we achieve a 0.75 success rate in a realistic Habitat simulator. After a small stage of additional model training in a reconstructed virtual area at a simulator, we successfully confirmed our results in a real-world case. △ Less

Submitted 17 September, 2021; originally announced September 2021.

arXiv:2006.09939 [pdf, other]

Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations

Authors: Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, Kirill Aksenov, Vasilii Davydov, Aleksandr I. Panov

Abstract: Currently, deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. Often these results are achieved at the expense of huge computational costs and require an incredible number of episodes of interaction between the agent and the environment. There are two main approaches to improving the sample efficiency of reinforcement learning methods - using hiera… ▽ More Currently, deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. Often these results are achieved at the expense of huge computational costs and require an incredible number of episodes of interaction between the agent and the environment. There are two main approaches to improving the sample efficiency of reinforcement learning methods - using hierarchical methods and expert demonstrations. In this paper, we propose a combination of these approaches that allow the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our forgetful experience replay (ForgER) algorithm effectively handles errors in expert data and reduces quality losses when adapting the action space and states representation to the agent's capabilities. Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations. Our method is universal and can be integrated into various off-policy methods. It surpasses all known existing state-of-the-art RL methods using expert demonstrations on various model environments. The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment. △ Less

Submitted 17 June, 2020; originally announced June 2020.

arXiv:1912.08664 [pdf, other]

Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft

Authors: Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, Kirill Aksenov, Vasilii Davydov, Aleksandr I. Panov

Abstract: We present Hierarchical Deep Q-Network (HDQfD) that took first place in the MineRL competition. HDQfD works on imperfect demonstrations and utilizes the hierarchical structure of expert trajectories. We introduce the procedure of extracting an effective sequence of meta-actions and subgoals from demonstration data. We present a structured task-dependent replay buffer and adaptive prioritizing tech… ▽ More We present Hierarchical Deep Q-Network (HDQfD) that took first place in the MineRL competition. HDQfD works on imperfect demonstrations and utilizes the hierarchical structure of expert trajectories. We introduce the procedure of extracting an effective sequence of meta-actions and subgoals from demonstration data. We present a structured task-dependent replay buffer and adaptive prioritizing technique that allow the HDQfD agent to gradually erase poor-quality expert data from the buffer. In this paper, we present the details of the HDQfD algorithm and give the experimental results in the Minecraft domain. △ Less

Submitted 13 July, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

Showing 1–8 of 8 results for author: Staroverov, A