Search | arXiv e-print repository

iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning

Authors: Aidan Scannell, Kalle Kujanpää, Yi Zhao, Mohammadreza Nakhaei, Arno Solin, Joni Pajarinen

Abstract: Learning representations for reinforcement learning (RL) has shown much promise for continuous control. We propose an efficient representation learning method using only a self-supervised latent-state consistency loss. Our approach employs an encoder and a dynamics model to map observations to latent states and predict future latent states, respectively. We achieve high performance and prevent rep… ▽ More Learning representations for reinforcement learning (RL) has shown much promise for continuous control. We propose an efficient representation learning method using only a self-supervised latent-state consistency loss. Our approach employs an encoder and a dynamics model to map observations to latent states and predict future latent states, respectively. We achieve high performance and prevent representation collapse by quantizing the latent representation such that the rank of the representation is empirically preserved. Our method, named iQRL: implicitly Quantized Reinforcement Learning, is straightforward, compatible with any model-free RL algorithm, and demonstrates excellent performance by outperforming other recently proposed representation learning methods in continuous control benchmarks from DeepMind Control Suite. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: 9 pages, 11 figures

arXiv:2401.03236 [pdf, other]

Challenges of Data-Driven Simulation of Diverse and Consistent Human Driving Behaviors

Authors: Kalle Kujanpää, Daulet Baimukashev, Shibei Zhu, Shoaib Azam, Farzeen Munir, Gokhan Alcan, Ville Kyrki

Abstract: Building simulation environments for develo** and testing autonomous vehicles necessitates that the simulators accurately model the statistical realism of the real-world environment, including the interaction with other vehicles driven by human drivers. To address this requirement, an accurate human behavior model is essential to incorporate the diversity and consistency of human driving behavio… ▽ More Building simulation environments for develo** and testing autonomous vehicles necessitates that the simulators accurately model the statistical realism of the real-world environment, including the interaction with other vehicles driven by human drivers. To address this requirement, an accurate human behavior model is essential to incorporate the diversity and consistency of human driving behavior. We propose a mathematical framework for designing a data-driven simulation model that simulates human driving behavior more realistically than the currently used physics-based simulation models. Experiments conducted using the NGSIM dataset validate our hypothesis regarding the necessity of considering the complexity, diversity, and consistency of human driving behavior when aiming to develop realistic simulators. △ Less

Submitted 6 January, 2024; originally announced January 2024.

arXiv:2310.12819 [pdf, other]

Hybrid Search for Efficient Planning with Completeness Guarantees

Authors: Kalle Kujanpää, Joni Pajarinen, Alexander Ilin

Abstract: Solving complex planning problems has been a long-standing challenge in computer science. Learning-based subgoal search methods have shown promise in tackling these problems, but they often suffer from a lack of completeness guarantees, meaning that they may fail to find a solution even if one exists. In this paper, we propose an efficient approach to augment a subgoal search method to achieve com… ▽ More Solving complex planning problems has been a long-standing challenge in computer science. Learning-based subgoal search methods have shown promise in tackling these problems, but they often suffer from a lack of completeness guarantees, meaning that they may fail to find a solution even if one exists. In this paper, we propose an efficient approach to augment a subgoal search method to achieve completeness in discrete action spaces. Specifically, we augment the high-level search with low-level actions to execute a multi-level (hybrid) search, which we call complete subgoal search. This solution achieves the best of both worlds: the practical efficiency of high-level search and the completeness of low-level search. We apply the proposed search method to a recently proposed subgoal search algorithm and evaluate the algorithm trained on offline data on complex planning problems. We demonstrate that our complete subgoal search not only guarantees completeness but can even improve performance in terms of search expansions for instances that the high-level could solve without low-level augmentations. Our approach makes it possible to apply subgoal-level planning for systems where completeness is a critical requirement. △ Less

Submitted 28 November, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023 Poster

arXiv:2309.00249 [pdf, other]

Suicidal Pedestrian: Generation of Safety-Critical Scenarios for Autonomous Vehicles

Authors: Yuhang Yang, Kalle Kujanpaa, Amin Babadi, Joni Pajarinen, Alexander Ilin

Abstract: Develo** reliable autonomous driving algorithms poses challenges in testing, particularly when it comes to safety-critical traffic scenarios involving pedestrians. An open question is how to simulate rare events, not necessarily found in autonomous driving datasets or scripted simulations, but which can occur in testing, and, in the end may lead to severe pedestrian related accidents. This paper… ▽ More Develo** reliable autonomous driving algorithms poses challenges in testing, particularly when it comes to safety-critical traffic scenarios involving pedestrians. An open question is how to simulate rare events, not necessarily found in autonomous driving datasets or scripted simulations, but which can occur in testing, and, in the end may lead to severe pedestrian related accidents. This paper presents a method for designing a suicidal pedestrian agent within the CARLA simulator, enabling the automatic generation of traffic scenarios for testing safety of autonomous vehicles (AVs) in dangerous situations with pedestrians. The pedestrian is modeled as a reinforcement learning (RL) agent with two custom reward functions that allow the agent to either arbitrarily or with high velocity to collide with the AV. Instead of significantly constraining the initial locations and the pedestrian behavior, we allow the pedestrian and autonomous car to be placed anywhere in the environment and the pedestrian to roam freely to generate diverse scenarios. To assess the performance of the suicidal pedestrian and the target vehicle during testing, we propose three collision-oriented evaluation metrics. Experimental results involving two state-of-the-art autonomous driving algorithms trained end-to-end with imitation learning from sensor data demonstrate the effectiveness of the suicidal pedestrian in identifying decision errors made by autonomous vehicles controlled by the algorithms. △ Less

Submitted 1 September, 2023; originally announced September 2023.

Comments: 6 pages; 5 figures; 2 tables

arXiv:2301.12962 [pdf, other]

Hierarchical Imitation Learning with Vector Quantized Models

Authors: Kalle Kujanpää, Joni Pajarinen, Alexander Ilin

Abstract: The ability to plan actions on multiple levels of abstraction enables intelligent agents to solve complex tasks effectively. However, learning the models for both low and high-level planning from demonstrations has proven challenging, especially with higher-dimensional inputs. To address this issue, we propose to use reinforcement learning to identify subgoals in expert trajectories by associating… ▽ More The ability to plan actions on multiple levels of abstraction enables intelligent agents to solve complex tasks effectively. However, learning the models for both low and high-level planning from demonstrations has proven challenging, especially with higher-dimensional inputs. To address this issue, we propose to use reinforcement learning to identify subgoals in expert trajectories by associating the magnitude of the rewards with the predictability of low-level actions given the state and the chosen subgoal. We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning. In experiments, the algorithm excels at solving complex, long-horizon decision-making problems outperforming state-of-the-art. Because of its ability to plan, our algorithm can find better trajectories than the ones in the training set △ Less

Submitted 29 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: To appear at ICML 2023

arXiv:2210.01426 [pdf, other]

Continuous Monte Carlo Graph Search

Authors: Kalle Kujanpää, Amin Babadi, Yi Zhao, Juho Kannala, Alexander Ilin, Joni Pajarinen

Abstract: Online planning is crucial for high performance in many complex sequential decision-making tasks. Monte Carlo Tree Search (MCTS) employs a principled mechanism for trading off exploration for exploitation for efficient online planning, and it outperforms comparison methods in many discrete decision-making domains such as Go, Chess, and Shogi. Subsequently, extensions of MCTS to continuous domains… ▽ More Online planning is crucial for high performance in many complex sequential decision-making tasks. Monte Carlo Tree Search (MCTS) employs a principled mechanism for trading off exploration for exploitation for efficient online planning, and it outperforms comparison methods in many discrete decision-making domains such as Go, Chess, and Shogi. Subsequently, extensions of MCTS to continuous domains have been developed. However, the inherent high branching factor and the resulting explosion of the search tree size are limiting the existing methods. To address this problem, we propose Continuous Monte Carlo Graph Search (CMCGS), an extension of MCTS to online planning in environments with continuous state and action spaces. CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance. To implement this idea, at each time step, CMCGS clusters similar states into a limited number of stochastic action bandit nodes, which produce a layered directed graph instead of an MCTS search tree. Experimental evaluation shows that CMCGS outperforms comparable planning methods in several complex continuous DeepMind Control Suite benchmarks and 2D navigation and exploration tasks with limited sample budgets. Furthermore, CMCGS can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models. △ Less

Submitted 7 February, 2024; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: Accepted at AAMAS 2024 (full paper & oral)

arXiv:2110.01362 [pdf, other]

doi 10.1145/3474369.3486877

Automating Privilege Escalation with Deep Reinforcement Learning

Authors: Kalle Kujanpää, Willie Victor, Alexander Ilin

Abstract: AI-based defensive solutions are necessary to defend networks and information assets against intelligent automated attacks. Gathering enough realistic data for training machine learning-based defenses is a significant practical challenge. An intelligent red teaming agent capable of performing realistic attacks can alleviate this problem. However, there is little scientific evidence demonstrating t… ▽ More AI-based defensive solutions are necessary to defend networks and information assets against intelligent automated attacks. Gathering enough realistic data for training machine learning-based defenses is a significant practical challenge. An intelligent red teaming agent capable of performing realistic attacks can alleviate this problem. However, there is little scientific evidence demonstrating the feasibility of fully automated attacks using machine learning. In this work, we exemplify the potential threat of malicious actors using deep reinforcement learning to train automated agents. We present an agent that uses a state-of-the-art reinforcement learning algorithm to perform local privilege escalation. Our results show that the autonomous agent can escalate privileges in a Windows 7 environment using a wide variety of different techniques depending on the environment configuration it encounters. Hence, our agent is usable for generating realistic attack sensor data for training and evaluating intrusion detection systems. △ Less

Submitted 4 October, 2021; originally announced October 2021.

Comments: To appear at AISec'21 (aisec.cc)

arXiv:2006.09763 [pdf, other]

Longitudinal Variational Autoencoder

Authors: Siddharth Ramchandran, Gleb Tikhonov, Kalle Kujanpää, Miika Koskinen, Harri Lähdesmäki

Abstract: Longitudinal datasets measured repeatedly over time from individual subjects, arise in many biomedical, psychological, social, and other studies. A common approach to analyse high-dimensional data that contains missing values is to learn a low-dimensional representation using variational autoencoders (VAEs). However, standard VAEs assume that the learnt representations are i.i.d., and fail to capt… ▽ More Longitudinal datasets measured repeatedly over time from individual subjects, arise in many biomedical, psychological, social, and other studies. A common approach to analyse high-dimensional data that contains missing values is to learn a low-dimensional representation using variational autoencoders (VAEs). However, standard VAEs assume that the learnt representations are i.i.d., and fail to capture the correlations between the data samples. We propose the Longitudinal VAE (L-VAE), that uses a multi-output additive Gaussian process (GP) prior to extend the VAE's capability to learn structured low-dimensional representations imposed by auxiliary covariate information, and derive a new KL divergence upper bound for such GPs. Our approach can simultaneously accommodate both time-varying shared and random effects, produce structured low-dimensional representations, disentangle effects of individual covariates or their interactions, and achieve highly accurate predictive performance. We compare our model against previous methods on synthetic as well as clinical datasets, and demonstrate the state-of-the-art performance in data imputation, reconstruction, and long-term prediction tasks. △ Less

Submitted 20 April, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS-2021), pp. 3898-3906. PMLR, 2021

Showing 1–8 of 8 results for author: Kujanpää, K