Search | arXiv e-print repository

Data-Driven Short-Term Daily Operational Sea Ice Regional Forecasting

Authors: Timofey Grigoryev, Polina Verezemskaya, Mikhail Krinitskiy, Nikita Anikin, Alexander Gavrikov, Ilya Trofimov, Nikita Balabin, Aleksei Shpilman, Andrei Eremchenko, Sergey Gulev, Evgeny Burnaev, Vladimir Vanovskiy

Abstract: Global warming made the Arctic available for marine operations and created demand for reliable operational sea ice forecasts to make them safe. While ocean-ice numerical models are highly computationally intensive, relatively lightweight ML-based methods may be more efficient in this task. Many works have exploited different deep learning models alongside classical approaches for predicting sea ic… ▽ More Global warming made the Arctic available for marine operations and created demand for reliable operational sea ice forecasts to make them safe. While ocean-ice numerical models are highly computationally intensive, relatively lightweight ML-based methods may be more efficient in this task. Many works have exploited different deep learning models alongside classical approaches for predicting sea ice concentration in the Arctic. However, only a few focus on daily operational forecasts and consider the real-time availability of data they need for operation. In this work, we aim to close this gap and investigate the performance of the U-Net model trained in two regimes for predicting sea ice for up to the next 10 days. We show that this deep learning model can outperform simple baselines by a significant margin and improve its quality by using additional weather data and training on multiple regions, ensuring its generalization abilities. As a practical outcome, we build a fast and flexible tool that produces operational sea ice forecasts in the Barents Sea, the Labrador Sea, and the Laptev Sea regions. △ Less

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2205.15023 [pdf, other]

Scalable Multi-Agent Model-Based Reinforcement Learning

Authors: Vladimir Egorov, Aleksei Shpilman

Abstract: Recent Multi-Agent Reinforcement Learning (MARL) literature has been largely focused on Centralized Training with Decentralized Execution (CTDE) paradigm. CTDE has been a dominant approach for both cooperative and mixed environments due to its capability to efficiently train decentralized policies. While in mixed environments full autonomy of the agents can be a desirable outcome, cooperative envi… ▽ More Recent Multi-Agent Reinforcement Learning (MARL) literature has been largely focused on Centralized Training with Decentralized Execution (CTDE) paradigm. CTDE has been a dominant approach for both cooperative and mixed environments due to its capability to efficiently train decentralized policies. While in mixed environments full autonomy of the agents can be a desirable outcome, cooperative environments allow agents to share information to facilitate coordination. Approaches that leverage this technique are usually referred as communication methods, as full autonomy of agents is compromised for better performance. Although communication approaches have shown impressive results, they do not fully leverage this additional information during training phase. In this paper, we propose a new method called MAMBA which utilizes Model-Based Reinforcement Learning (MBRL) to further leverage centralized training in cooperative environments. We argue that communication between agents is enough to sustain a world model for each agent during execution phase while imaginary rollouts can be used for training, removing the necessity to interact with the environment. These properties yield sample efficient algorithm that can scale gracefully with the number of agents. We empirically confirm that MAMBA achieves good performance while reducing the number of interactions with the environment up to an orders of magnitude compared to Model-Free state-of-the-art approaches in challenging domains of SMAC and Flatland. △ Less

Submitted 25 May, 2022; originally announced May 2022.

Comments: AAMAS'2022, cite https://dl.acm.org/doi/abs/10.5555/3535850.3535894

arXiv:2203.17070 [pdf, other]

Traffic4cast at NeurIPS 2021 -- Temporal and Spatial Few-Shot Transfer Learning in Gridded Geo-Spatial Processes

Authors: Christian Eichenberger, Moritz Neun, Henry Martin, Pedro Herruzo, Markus Spanring, Yichao Lu, Sungbin Choi, Vsevolod Konyakhin, Nina Lukashina, Aleksei Shpilman, Nina Wiedemann, Martin Raubal, Bo Wang, Hai L. Vu, Reza Mohajerpoor, Chen Cai, Inhi Kim, Luca Hermes, Andrew Melnik, Riza Velioglu, Markus Vieth, Malte Schilling, Alabi Bojesomo, Hasan Al Marzouqi, Panos Liatsis , et al. (12 additional authors not shown)

Abstract: The IARAI Traffic4cast competitions at NeurIPS 2019 and 2020 showed that neural networks can successfully predict future traffic conditions 1 hour into the future on simply aggregated GPS probe data in time and space bins. We thus reinterpreted the challenge of forecasting traffic conditions as a movie completion task. U-Nets proved to be the winning architecture, demonstrating an ability to extra… ▽ More The IARAI Traffic4cast competitions at NeurIPS 2019 and 2020 showed that neural networks can successfully predict future traffic conditions 1 hour into the future on simply aggregated GPS probe data in time and space bins. We thus reinterpreted the challenge of forecasting traffic conditions as a movie completion task. U-Nets proved to be the winning architecture, demonstrating an ability to extract relevant features in this complex real-world geo-spatial process. Building on the previous competitions, Traffic4cast 2021 now focuses on the question of model robustness and generalizability across time and space. Moving from one city to an entirely different city, or moving from pre-COVID times to times after COVID hit the world thus introduces a clear domain shift. We thus, for the first time, release data featuring such domain shifts. The competition now covers ten cities over 2 years, providing data compiled from over 10^12 GPS probe data. Winning solutions captured traffic dynamics sufficiently well to even cope with these complex domain shifts. Surprisingly, this seemed to require only the previous 1h traffic dynamic history and static road graph as input. △ Less

Submitted 1 April, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: Pre-print under review, submitted to Proceedings of Machine Learning Research

arXiv:2203.10905 [pdf, other]

Self-Imitation Learning from Demonstrations

Authors: Georgiy Pshikhachev, Dmitry Ivanov, Vladimir Egorov, Aleksei Shpilman

Abstract: Despite the numerous breakthroughs achieved with Reinforcement Learning (RL), solving environments with sparse rewards remains a challenging task that requires sophisticated exploration. Learning from Demonstrations (LfD) remedies this issue by guiding the agent's exploration towards states experienced by an expert. Naturally, the benefits of this approach hinge on the quality of demonstrations, w… ▽ More Despite the numerous breakthroughs achieved with Reinforcement Learning (RL), solving environments with sparse rewards remains a challenging task that requires sophisticated exploration. Learning from Demonstrations (LfD) remedies this issue by guiding the agent's exploration towards states experienced by an expert. Naturally, the benefits of this approach hinge on the quality of demonstrations, which are rarely optimal in realistic scenarios. Modern LfD algorithms require meticulous tuning of hyperparameters that control the influence of demonstrations and, as we show in the paper, struggle with learning from suboptimal demonstrations. To address these issues, we extend Self-Imitation Learning (SIL), a recent RL algorithm that exploits the agent's past good experience, to the LfD setup by initializing its replay buffer with demonstrations. We denote our algorithm as SIL from Demonstrations (SILfD). We empirically show that SILfD can learn from demonstrations that are noisy or far from optimal and can automatically adjust the influence of demonstrations throughout the training without additional hyperparameters or handcrafted schedules. We also find SILfD superior to the existing state-of-the-art LfD algorithms in sparse environments, especially when demonstrations are highly suboptimal. △ Less

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2203.07206 [pdf, other]

Improving State-of-the-Art in One-Class Classification by Leveraging Unlabeled Data

Authors: Farid Bagirov, Dmitry Ivanov, Aleksei Shpilman

Abstract: When dealing with binary classification of data with only one labeled class data scientists employ two main approaches, namely One-Class (OC) classification and Positive Unlabeled (PU) learning. The former only learns from labeled positive data, whereas the latter also utilizes unlabeled data to improve the overall performance. Since PU learning utilizes more data, we might be prone to think that… ▽ More When dealing with binary classification of data with only one labeled class data scientists employ two main approaches, namely One-Class (OC) classification and Positive Unlabeled (PU) learning. The former only learns from labeled positive data, whereas the latter also utilizes unlabeled data to improve the overall performance. Since PU learning utilizes more data, we might be prone to think that when unlabeled data is available, the go-to algorithms should always come from the PU group. However, we find that this is not always the case if unlabeled data is unreliable, i.e. contains limited or biased latent negative data. We perform an extensive experimental study of a wide list of state-of-the-art OC and PU algorithms in various scenarios as far as unlabeled data reliability is concerned. Furthermore, we propose PU modifications of state-of-the-art OC algorithms that are robust to unreliable unlabeled data, as well as a guideline to similarly modify other OC algorithms. Our main practical recommendation is to use state-of-the-art PU algorithms when unlabeled data is reliable and to use the proposed modifications of state-of-the-art OC algorithms otherwise. Additionally, we outline procedures to distinguish the cases of reliable and unreliable unlabeled data using statistical tests. △ Less

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2202.10583 [pdf, other]

MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned

Authors: Anssi Kanervisto, Stephanie Milani, Karolis Ramanauskas, Nicholay Topin, Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang, Weijun Hong, Zhongyue Huang, Haicheng Chen, Guangjun Zeng, Yue Lin, Vincent Micheli, Eloi Alonso, François Fleuret, Alexander Nikulin, Yury Belousov, Oleg Svidchenko, Aleksei Shpilman

Abstract: Reinforcement learning competitions advance the field by providing appropriate scope and support to develop solutions toward a specific problem. To promote the development of more broadly applicable methods, organizers need to enforce the use of general techniques, the use of sample-efficient methods, and the reproducibility of the results. While beneficial for the research community, these restri… ▽ More Reinforcement learning competitions advance the field by providing appropriate scope and support to develop solutions toward a specific problem. To promote the development of more broadly applicable methods, organizers need to enforce the use of general techniques, the use of sample-efficient methods, and the reproducibility of the results. While beneficial for the research community, these restrictions come at a cost -- increased difficulty. If the barrier for entry is too high, many potential participants are demoralized. With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track in which we permitted any solution to promote the participation of newcomers. With this track and more extensive tutorials and support, we saw an increased number of submissions. The participants of this easier track were able to obtain a diamond, and the participants of the harder track progressed the generalizable solutions in the same task. △ Less

Submitted 17 February, 2022; originally announced February 2022.

Comments: Under review for PMLR volume on NeurIPS 2021 competitions

arXiv:2112.01195 [pdf, other]

Maximum Entropy Model-based Reinforcement Learning

Authors: Oleg Svidchenko, Aleksei Shpilman

Abstract: Recent advances in reinforcement learning have demonstrated its ability to solve hard agent-environment interaction tasks on a super-human level. However, the application of reinforcement learning methods to practical and real-world tasks is currently limited due to most RL state-of-art algorithms' sample inefficiency, i.e., the need for a vast number of training episodes. For example, OpenAI Five… ▽ More Recent advances in reinforcement learning have demonstrated its ability to solve hard agent-environment interaction tasks on a super-human level. However, the application of reinforcement learning methods to practical and real-world tasks is currently limited due to most RL state-of-art algorithms' sample inefficiency, i.e., the need for a vast number of training episodes. For example, OpenAI Five algorithm that has beaten human players in Dota 2 has trained for thousands of years of game time. Several approaches exist that tackle the issue of sample inefficiency, that either offers a more efficient usage of already gathered experience or aim to gain a more relevant and diverse experience via a better exploration of an environment. However, to our knowledge, no such approach exists for model-based algorithms, that showed their high sample efficiency in solving hard control tasks with high-dimensional state space. This work connects exploration techniques and model-based reinforcement learning. We have designed a novel exploration method that takes into account features of the model-based approach. We also demonstrate through experiments that our method significantly improves the performance of the model-based algorithm Dreamer. △ Less

Submitted 2 December, 2021; originally announced December 2021.

Comments: NeurIPS'2021 Deep Reinforcement Learning Workshop

arXiv:2111.10656 [pdf, other]

Simple End-to-end Deep Learning Model for CDR-H3 Loop Structure Prediction

Authors: Natalia Zenkova, Ekaterina Sedykh, Tatiana Shugaeva, Vladislav Strashko, Timofei Ermak, Aleksei Shpilman

Abstract: Predicting a structure of an antibody from its sequence is important since it allows for a better design process of synthetic antibodies that play a vital role in the health industry. Most of the structure of an antibody is conservative. The most variable and hard-to-predict part is the third complementarity-determining region of the antibody heavy chain (CDR H3). Lately, deep learning has been em… ▽ More Predicting a structure of an antibody from its sequence is important since it allows for a better design process of synthetic antibodies that play a vital role in the health industry. Most of the structure of an antibody is conservative. The most variable and hard-to-predict part is the third complementarity-determining region of the antibody heavy chain (CDR H3). Lately, deep learning has been employed to solve the task of CDR H3 prediction. However, current state-of-the-art methods are not end-to-end, but rather they output inter-residue distances and orientations to the RosettaAntibody package that uses this additional information alongside statistical and physics-based methods to predict the 3D structure. This does not allow a fast screening process and, therefore, inhibits the development of targeted synthetic antibodies. In this work, we present an end-to-end model to predict CDR H3 loop structure, that performs on par with state-of-the-art methods in terms of accuracy but an order of magnitude faster. We also raise an issue with a commonly used RosettaAntibody benchmark that leads to data leaks, i.e., the presence of identical sequences in the train and test datasets. △ Less

Submitted 22 December, 2021; v1 submitted 20 November, 2021; originally announced November 2021.

Comments: NeurIPS 2021 Machine Learning for Structural Biology Workshop

arXiv:2111.03421 [pdf, other]

Solving Traffic4Cast Competition with U-Net and Temporal Domain Adaptation

Authors: Vsevolod Konyakhin, Nina Lukashina, Aleksei Shpilman

Abstract: In this technical report, we present our solution to the Traffic4Cast 2021 Core Challenge, in which participants were asked to develop algorithms for predicting a traffic state 60 minutes ahead, based on the information from the previous hour, in 4 different cities. In contrast to the previously held competitions, this year's challenge focuses on the temporal domain shift in traffic due to the COV… ▽ More In this technical report, we present our solution to the Traffic4Cast 2021 Core Challenge, in which participants were asked to develop algorithms for predicting a traffic state 60 minutes ahead, based on the information from the previous hour, in 4 different cities. In contrast to the previously held competitions, this year's challenge focuses on the temporal domain shift in traffic due to the COVID-19 pandemic. Following the past success of U-Net, we utilize it for predicting future traffic maps. Additionally, we explore the usage of pre-trained encoders such as DenseNet and EfficientNet and employ multiple domain adaptation techniques to fight the domain shift. Our solution has ranked third in the final competition. The code is available at https://github.com/jbr-ai-labs/traffic4cast-2021. △ Less

Submitted 5 November, 2021; originally announced November 2021.

Comments: Conference on Neural Information Processing Systems (NeurIPS 2021) Traffic4cast Competition

arXiv:2107.06009 [pdf, other]

Automatic Classification of Error Types in Solutions to Programming Assignments at Online Learning Platform

Authors: Artyom Lobanov, Timofey Bryksin, Alexey Shpilman

Abstract: Online programming courses are becoming more and more popular, but they still have significant drawbacks when compared to the traditional education system, e.g., the lack of feedback. In this study, we apply machine learning methods to improve the feedback of automated verification systems for programming assignments. We propose an approach that provides an insight on how to fix the code for a giv… ▽ More Online programming courses are becoming more and more popular, but they still have significant drawbacks when compared to the traditional education system, e.g., the lack of feedback. In this study, we apply machine learning methods to improve the feedback of automated verification systems for programming assignments. We propose an approach that provides an insight on how to fix the code for a given incorrect submission. To achieve this, we detect frequent error types by clustering previously submitted incorrect solutions, label these clusters and use this labeled dataset to identify the type of an error in a new submission. We examine and compare several approaches to the detection of frequent error types and to the assignment of clusters to new submissions. The proposed method is evaluated on a dataset provided by a popular online learning platform. △ Less

Submitted 13 July, 2021; originally announced July 2021.

Comments: 5 pages, 2 figures

arXiv:2103.16511 [pdf, other]

Flatland Competition 2020: MAPF and MARL for Efficient Train Coordination on a Grid World

Authors: Florian Laurent, Manuel Schneider, Christian Scheller, Jeremy Watson, Jiaoyang Li, Zhe Chen, Yi Zheng, Shao-Hung Chan, Konstantin Makhnev, Oleg Svidchenko, Vladimir Egorov, Dmitry Ivanov, Aleksei Shpilman, Evgenija Spirovska, Oliver Tanevski, Aleksandar Nikov, Ramon Grunder, David Galevski, Jakov Mitrovski, Guillaume Sartoretti, Zhiyao Luo, Mehul Damani, Nilabha Bhattacharya, Shivam Agarwal, Adrian Egli , et al. (2 additional authors not shown)

Abstract: The Flatland competition aimed at finding novel approaches to solve the vehicle re-scheduling problem (VRSP). The VRSP is concerned with scheduling trips in traffic networks and the re-scheduling of vehicles when disruptions occur, for example the breakdown of a vehicle. While solving the VRSP in various settings has been an active area in operations research (OR) for decades, the ever-growing com… ▽ More The Flatland competition aimed at finding novel approaches to solve the vehicle re-scheduling problem (VRSP). The VRSP is concerned with scheduling trips in traffic networks and the re-scheduling of vehicles when disruptions occur, for example the breakdown of a vehicle. While solving the VRSP in various settings has been an active area in operations research (OR) for decades, the ever-growing complexity of modern railway networks makes dynamic real-time scheduling of traffic virtually impossible. Recently, multi-agent reinforcement learning (MARL) has successfully tackled challenging tasks where many agents need to be coordinated, such as multiplayer video games. However, the coordination of hundreds of agents in a real-life setting like a railway network remains challenging and the Flatland environment used for the competition models these real-world properties in a simplified manner. Submissions had to bring as many trains (agents) to their target stations in as little time as possible. While the best submissions were in the OR category, participants found many promising MARL approaches. Using both centralized and decentralized learning based approaches, top submissions used graph representations of the environment to construct tree-based observations. Further, different coordination mechanisms were implemented, such as communication and prioritization between agents. This paper presents the competition setup, four outstanding solutions to the competition, and a cross-comparison between them. △ Less

Submitted 30 March, 2021; originally announced March 2021.

Comments: 28 pages, 8 figures

arXiv:2102.12307 [pdf, other]

Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments

Authors: Dmitry Ivanov, Vladimir Egorov, Aleksei Shpilman

Abstract: Recent reinforcement learning studies extensively explore the interplay between cooperative and competitive behaviour in mixed environments. Unlike cooperative environments where agents strive towards a common goal, mixed environments are notorious for the conflicts of selfish and social interests. As a consequence, purely rational agents often struggle to achieve and maintain cooperation. A preva… ▽ More Recent reinforcement learning studies extensively explore the interplay between cooperative and competitive behaviour in mixed environments. Unlike cooperative environments where agents strive towards a common goal, mixed environments are notorious for the conflicts of selfish and social interests. As a consequence, purely rational agents often struggle to achieve and maintain cooperation. A prevalent approach to induce cooperative behaviour is to assign additional rewards based on other agents' well-being. However, this approach suffers from the issue of multi-agent credit assignment, which can hinder performance. This issue is efficiently alleviated in cooperative setting with such state-of-the-art algorithms as QMIX and COMA. Still, when applied to mixed environments, these algorithms may result in unfair allocation of rewards. We propose BAROCCO, an extension of these algorithms capable to balance individual and social incentives. The mechanism behind BAROCCO is to train two distinct but interwoven components that jointly affect each agent's decisions. Our meta-algorithm is compatible with both Q-learning and Actor-Critic frameworks. We experimentally confirm the advantages over the existing methods and explore the behavioural aspects of BAROCCO in two mixed multi-agent setups. △ Less

Submitted 24 February, 2021; originally announced February 2021.

Comments: Short version of this paper is accepted to AAMAS 2021

arXiv:2012.12125 [pdf, other]

doi 10.1109/ICMLA.2017.0-186

Deep Learning of Cell Classification using Microscope Images of Intracellular Microtubule Networks

Authors: Aleksei Shpilman, Dmitry Boikiy, Marina Polyakova, Daniel Kudenko, Anton Burakov, Elena Nadezhdina

Abstract: Microtubule networks (MTs) are a component of a cell that may indicate the presence of various chemical compounds and can be used to recognize properties such as treatment resistance. Therefore, the classification of MT images is of great relevance for cell diagnostics. Human experts find it particularly difficult to recognize the levels of chemical compound exposure of a cell. Improving the accur… ▽ More Microtubule networks (MTs) are a component of a cell that may indicate the presence of various chemical compounds and can be used to recognize properties such as treatment resistance. Therefore, the classification of MT images is of great relevance for cell diagnostics. Human experts find it particularly difficult to recognize the levels of chemical compound exposure of a cell. Improving the accuracy with automated techniques would have a significant impact on cell therapy. In this paper we present the application of Deep Learning to MT image classification and evaluate it on a large MT image dataset of animal cells with three degrees of exposure to a chemical agent. The results demonstrate that the learned deep network performs on par or better at the corresponding cell classification task than human experts. Specifically, we show that the task of recognizing different levels of chemical agent exposure can be handled significantly better by the neural network than by human experts. △ Less

Submitted 16 December, 2020; originally announced December 2020.

arXiv:2012.10335 [pdf, other]

Solving Black-Box Optimization Challenge via Learning Search Space Partition for Local Bayesian Optimization

Authors: Mikita Sazanovich, Anastasiya Nikolskaya, Yury Belousov, Aleksei Shpilman

Abstract: Black-box optimization is one of the vital tasks in machine learning, since it approximates real-world conditions, in that we do not always know all the properties of a given system, up to knowing almost nothing but the results. This paper describes our approach to solving the black-box optimization challenge at NeurIPS 2020 through learning search space partition for local Bayesian optimization.… ▽ More Black-box optimization is one of the vital tasks in machine learning, since it approximates real-world conditions, in that we do not always know all the properties of a given system, up to knowing almost nothing but the results. This paper describes our approach to solving the black-box optimization challenge at NeurIPS 2020 through learning search space partition for local Bayesian optimization. We describe the task of the challenge as well as our algorithm for low budget optimization that we named \texttt{SPBOpt}. We optimize the hyper-parameters of our algorithm for the competition finals using multi-task Bayesian optimization on results from the first two evaluation settings. Our approach has ranked third in the competition finals. △ Less

Submitted 25 May, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

Comments: Accepted to Proceedings of Machine Learning Research: NeurIPS 2020 Competition and Demonstration Track

arXiv:2012.09771 [pdf, other]

doi 10.1109/REDUNDANCY48165.2019.9003330

End-to-end Deep Object Tracking with Circular Loss Function for Rotated Bounding Box

Authors: Vladislav Belyaev, Aleksandra Malysheva, Aleksei Shpilman

Abstract: The task object tracking is vital in numerous applications such as autonomous driving, intelligent surveillance, robotics, etc. This task entails the assigning of a bounding box to an object in a video stream, given only the bounding box for that object on the first frame. In 2015, a new type of video object tracking (VOT) dataset was created that introduced rotated bounding boxes as an extension… ▽ More The task object tracking is vital in numerous applications such as autonomous driving, intelligent surveillance, robotics, etc. This task entails the assigning of a bounding box to an object in a video stream, given only the bounding box for that object on the first frame. In 2015, a new type of video object tracking (VOT) dataset was created that introduced rotated bounding boxes as an extension of axis-aligned ones. In this work, we introduce a novel end-to-end deep learning method based on the Transformer Multi-Head Attention architecture. We also present a new type of loss function, which takes into account the bounding box overlap and orientation. Our Deep Object Tracking model with Circular Loss Function (DOTCL) shows an considerable improvement in terms of robustness over current state-of-the-art end-to-end deep learning models. It also outperforms state-of-the-art object tracking methods on VOT2018 dataset in terms of expected average overlap (EAO) metric. △ Less

Submitted 17 December, 2020; originally announced December 2020.

arXiv:2012.09762 [pdf, other]

doi 10.1109/REDUNDANCY48165.2019.9003345

MAGNet: Multi-agent Graph Network for Deep Multi-agent Reinforcement Learning

Authors: Aleksandra Malysheva, Daniel Kudenko, Aleksei Shpilman

Abstract: Over recent years, deep reinforcement learning has shown strong successes in complex single-agent tasks, and more recently this approach has also been applied to multi-agent domains. In this paper, we propose a novel approach, called MAGNet, to multi-agent reinforcement learning that utilizes a relevance graph representation of the environment obtained by a self-attention mechanism, and a message-… ▽ More Over recent years, deep reinforcement learning has shown strong successes in complex single-agent tasks, and more recently this approach has also been applied to multi-agent domains. In this paper, we propose a novel approach, called MAGNet, to multi-agent reinforcement learning that utilizes a relevance graph representation of the environment obtained by a self-attention mechanism, and a message-generation technique. We applied our MAGnet approach to the synthetic predator-prey multi-agent environment and the Pommerman game and the results show that it significantly outperforms state-of-the-art MARL solutions, including Multi-agent Deep Q-Networks (MADQN), Multi-agent Deep Deterministic Policy Gradient (MADDPG), and QMIX △ Less

Submitted 17 December, 2020; originally announced December 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1811.12557

arXiv:2012.08824 [pdf, other]

doi 10.1109/ICARCV.2018.8581310

Learning to Run with Potential-Based Reward Sha** and Demonstrations from Video Data

Authors: Aleksandra Malysheva, Daniel Kudenko, Aleksei Shpilman

Abstract: Learning to produce efficient movement behaviour for humanoid robots from scratch is a hard problem, as has been illustrated by the "Learning to run" competition at NIPS 2017. The goal of this competition was to train a two-legged model of a humanoid body to run in a simulated race course with maximum speed. All submissions took a tabula rasa approach to reinforcement learning (RL) and were able t… ▽ More Learning to produce efficient movement behaviour for humanoid robots from scratch is a hard problem, as has been illustrated by the "Learning to run" competition at NIPS 2017. The goal of this competition was to train a two-legged model of a humanoid body to run in a simulated race course with maximum speed. All submissions took a tabula rasa approach to reinforcement learning (RL) and were able to produce relatively fast, but not optimal running behaviour. In this paper, we demonstrate how data from videos of human running (e.g. taken from YouTube) can be used to shape the reward of the humanoid learning agent to speed up the learning and produce a better result. Specifically, we are using the positions of key body parts at regular time intervals to define a potential function for potential-based reward sha** (PBRS). Since PBRS does not change the optimal policy, this approach allows the RL agent to overcome sub-optimalities in the human movements that are shown in the videos. We present experiments in which we combine selected techniques from the top ten approaches from the NIPS competition with further optimizations to create an high-performing agent as a baseline. We then demonstrate how video-based reward sha** improves the performance further, resulting in an RL agent that runs twice as fast as the baseline in 12 hours of training. We furthermore show that our approach can overcome sub-optimal running behaviour in videos, with the learned policy significantly outperforming that of the running agent from the video. △ Less

Submitted 16 December, 2020; originally announced December 2020.

arXiv:2012.08822 [pdf, other]

doi 10.1109/ICMLA.2018.00089

A comparative evaluation of machine learning methods for robot navigation through human crowds

Authors: Anastasia Gaydashenko, Daniel Kudenko, Aleksei Shpilman

Abstract: Robot navigation through crowds poses a difficult challenge to AI systems, since the methods should result in fast and efficient movement but at the same time are not allowed to compromise safety. Most approaches to date were focused on the combination of pathfinding algorithms with machine learning for pedestrian walking prediction. More recently, reinforcement learning techniques have been propo… ▽ More Robot navigation through crowds poses a difficult challenge to AI systems, since the methods should result in fast and efficient movement but at the same time are not allowed to compromise safety. Most approaches to date were focused on the combination of pathfinding algorithms with machine learning for pedestrian walking prediction. More recently, reinforcement learning techniques have been proposed in the research literature. In this paper, we perform a comparative evaluation of pathfinding/prediction and reinforcement learning approaches on a crowd movement dataset collected from surveillance videos taken at Grand Central Station in New York. The results demonstrate the strong superiority of state-of-the-art reinforcement learning approaches over pathfinding with state-of-the-art behaviour prediction techniques. △ Less

Submitted 16 December, 2020; originally announced December 2020.

arXiv:2012.08816 [pdf, other]

doi 10.1109/ICARCV.2018.8581206

Continuous Gesture Recognition from sEMG Sensor Data with Recurrent Neural Networks and Adversarial Domain Adaptation

Authors: Ivan Sosin, Daniel Kudenko, Aleksei Shpilman

Abstract: Movement control of artificial limbs has made big advances in recent years. New sensor and control technology enhanced the functionality and usefulness of artificial limbs to the point that complex movements, such as gras**, can be performed to a limited extent. To date, the most successful results were achieved by applying recurrent neural networks (RNNs). However, in the domain of artificial h… ▽ More Movement control of artificial limbs has made big advances in recent years. New sensor and control technology enhanced the functionality and usefulness of artificial limbs to the point that complex movements, such as gras**, can be performed to a limited extent. To date, the most successful results were achieved by applying recurrent neural networks (RNNs). However, in the domain of artificial hands, experiments so far were limited to non-mobile wrists, which significantly reduces the functionality of such prostheses. In this paper, for the first time, we present empirical results on gesture recognition with both mobile and non-mobile wrists. Furthermore, we demonstrate that recurrent neural networks with simple recurrent units (SRU) outperform regular RNNs in both cases in terms of gesture recognition accuracy, on data acquired by an arm band sensing electromagnetic signals from arm muscles (via surface electromyography or sEMG). Finally, we show that adding domain adaptation techniques to continuous gesture recognition with RNN improves the transfer ability between subjects, where a limb controller trained on data from one person is used for another person. △ Less

Submitted 16 December, 2020; originally announced December 2020.

arXiv:2011.12117 [pdf, other]

Lipophilicity Prediction with Multitask Learning and Molecular Substructures Representation

Authors: Nina Lukashina, Alisa Alenicheva, Elizaveta Vlasova, Artem Kondiukov, Aigul Khakimova, Emil Magerramov, Nikita Churikov, Aleksei Shpilman

Abstract: Lipophilicity is one of the factors determining the permeability of the cell membrane to a drug molecule. Hence, accurate lipophilicity prediction is an essential step in the development of new drugs. In this paper, we introduce a novel approach to encoding additional graph information by extracting molecular substructures. By adding a set of generalized atomic features of these substructures to a… ▽ More Lipophilicity is one of the factors determining the permeability of the cell membrane to a drug molecule. Hence, accurate lipophilicity prediction is an essential step in the development of new drugs. In this paper, we introduce a novel approach to encoding additional graph information by extracting molecular substructures. By adding a set of generalized atomic features of these substructures to an established Direct Message Passing Neural Network (D-MPNN) we were able to achieve a new state-of-the-art result at the task of prediction of two main lipophilicity coefficients, namely logP and logD descriptors. We further improve our approach by employing a multitask approach to predict logP and logD values simultaneously. Additionally, we present a study of the model performance on symmetric and asymmetric molecules, that may yield insight for further research. △ Less

Submitted 24 November, 2020; originally announced November 2020.

Comments: Accepted to Machine Learning for Molecules Workshop at NeurIPS'2020

arXiv:2010.04147 [pdf, other]

doi 10.1109/ICMLA51294.2020.00058

Automatic generation of reviews of scientific papers

Authors: Anna Nikiforovskaya, Nikolai Kapralov, Anna Vlasova, Oleg Shpynov, Aleksei Shpilman

Abstract: With an ever-increasing number of scientific papers published each year, it becomes more difficult for researchers to explore a field that they are not closely familiar with already. This greatly inhibits the potential for cross-disciplinary research. A traditional introduction into an area may come in the form of a review paper. However, not all areas and sub-areas have a current review. In this… ▽ More With an ever-increasing number of scientific papers published each year, it becomes more difficult for researchers to explore a field that they are not closely familiar with already. This greatly inhibits the potential for cross-disciplinary research. A traditional introduction into an area may come in the form of a review paper. However, not all areas and sub-areas have a current review. In this paper, we present a method for the automatic generation of a review paper corresponding to a user-defined query. This method consists of two main parts. The first part identifies key papers in the area by their bibliometric parameters, such as a graph of co-citations. The second stage uses a BERT based architecture that we train on existing reviews for extractive summarization of these key papers. We describe the general pipeline of our method and some implementation details and present both automatic and expert evaluations on the PubMed dataset. △ Less

Submitted 8 October, 2020; originally announced October 2020.

Comments: Accepted as a full paper at ICMLA2020

arXiv:2007.03514 [pdf, other]

Imitation Learning Approach for AI Driving Olympics Trained on Real-world and Simulation Data Simultaneously

Authors: Mikita Sazanovich, Konstantin Chaika, Kirill Krinkin, Aleksei Shpilman

Abstract: In this paper, we describe our winning approach to solving the Lane Following Challenge at the AI Driving Olympics Competition through imitation learning on a mixed set of simulation and real-world data. AI Driving Olympics is a two-stage competition: at stage one, algorithms compete in a simulated environment with the best ones advancing to a real-world final. One of the main problems that partic… ▽ More In this paper, we describe our winning approach to solving the Lane Following Challenge at the AI Driving Olympics Competition through imitation learning on a mixed set of simulation and real-world data. AI Driving Olympics is a two-stage competition: at stage one, algorithms compete in a simulated environment with the best ones advancing to a real-world final. One of the main problems that participants encounter during the competition is that algorithms trained for the best performance in simulated environments do not hold up in a real-world environment and vice versa. Classic control algorithms also do not translate well between tasks since most of them have to be tuned to specific driving conditions such as lighting, road type, camera position, etc. To overcome this problem, we employed the imitation learning algorithm and trained it on a dataset collected from sources both from simulation and real-world, forcing our model to perform equally well in all environments. △ Less

Submitted 7 July, 2020; originally announced July 2020.

Comments: Accepted to the Workshop on AI for Autonomous Driving (AIAD), the 37th International Conference on Machine Learning (ICML2020)

arXiv:2004.01618 [pdf, other]

doi 10.1145/3379597.3387447

Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler

Authors: Timofey Bryksin, Victor Petukhov, Ilya Alexin, Stanislav Prikhodko, Alexey Shpilman, Vladimir Kovalenko, Nikita Povarov

Abstract: In this work, we apply anomaly detection to source code and bytecode to facilitate the development of a programming language and its compiler. We define anomaly as a code fragment that is different from typical code written in a particular programming language. Identifying such code fragments is beneficial to both language developers and end users, since anomalies may indicate potential issues wit… ▽ More In this work, we apply anomaly detection to source code and bytecode to facilitate the development of a programming language and its compiler. We define anomaly as a code fragment that is different from typical code written in a particular programming language. Identifying such code fragments is beneficial to both language developers and end users, since anomalies may indicate potential issues with the compiler or with runtime performance. Moreover, anomalies could correspond to problems in language design. For this study, we choose Kotlin as the target programming language. We outline and discuss approaches to obtaining vector representations of source code and bytecode and to the detection of anomalies across vectorized code snippets. The paper presents a method that aims to detect two types of anomalies: syntax tree anomalies and so-called compiler-induced anomalies that arise only in the compiled bytecode. We describe several experiments that employ different combinations of vectorization and anomaly detection techniques and discuss types of detected anomalies and their usefulness for language developers. We demonstrate that the extracted anomalies and the underlying extraction technique provide additional value for language development. △ Less

Submitted 3 April, 2020; originally announced April 2020.

arXiv:1902.02441 [pdf, other]

Artificial Intelligence for Prosthetics - challenge solutions

Authors: Łukasz Kidziński, Carmichael Ong, Sharada Prasanna Mohanty, Jennifer Hicks, Sean F. Carroll, Bo Zhou, Hongsheng Zeng, Fan Wang, Rongzhong Lian, Hao Tian, Wojciech Jaśkowski, Garrett Andersen, Odd Rune Lykkebø, Nihat Engin Toklu, Pranav Shyam, Rupesh Kumar Srivastava, Sergey Kolesnikov, Oleksii Hrinchuk, Anton Pechenko, Mattias Ljungström, Zhen Wang, Xu Hu, Zehong Hu, Minghui Qiu, Jun Huang , et al. (25 additional authors not shown)

Abstract: In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many s… ▽ More In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many solutions use similar relaxations and heuristics, such as reward sha**, frame skip**, discretization of the action space, symmetry, and policy blending. However, each team implemented different modifications of the known algorithms by, for example, dividing the task into subtasks, learning low-level control, or by incorporating expert knowledge and using imitation learning. △ Less

Submitted 6 February, 2019; originally announced February 2019.

arXiv:1811.12557 [pdf, other]

Deep Multi-Agent Reinforcement Learning with Relevance Graphs

Authors: Aleksandra Malysheva, Tegg Taekyong Sung, Chae-Bong Sohn, Daniel Kudenko, Aleksei Shpilman

Abstract: Over recent years, deep reinforcement learning has shown strong successes in complex single-agent tasks, and more recently this approach has also been applied to multi-agent domains. In this paper, we propose a novel approach, called MAGnet, to multi-agent reinforcement learning (MARL) that utilizes a relevance graph representation of the environment obtained by a self-attention mechanism, and a m… ▽ More Over recent years, deep reinforcement learning has shown strong successes in complex single-agent tasks, and more recently this approach has also been applied to multi-agent domains. In this paper, we propose a novel approach, called MAGnet, to multi-agent reinforcement learning (MARL) that utilizes a relevance graph representation of the environment obtained by a self-attention mechanism, and a message-generation technique inspired by the NerveNet architecture. We applied our MAGnet approach to the Pommerman game and the results show that it significantly outperforms state-of-the-art MARL solutions, including DQN, MADDPG, and MCTS. △ Less

Submitted 29 November, 2018; originally announced November 2018.

Comments: The first two authors contributed equally. Author ordering determined by coin flip over a Google Hangout. Accepted at NIPS 2018 Deep RL Workshop

Showing 1–25 of 25 results for author: Shpilman, A