Search | arXiv e-print repository

Self-organized arrival system for urban air mobility

Authors: Martin Waltz, Ostap Okhrin, Michael Schultz

Abstract: Urban air mobility is an innovative mode of transportation in which electric vertical takeoff and landing (eVTOL) vehicles operate between nodes called vertiports. We outline a self-organized vertiport arrival system based on deep reinforcement learning. The airspace around the vertiport is assumed to be circular, and the vehicles can freely operate inside. Each aircraft is considered an individual agent and follows a shared policy, resulting in decentralized actions that are based on local information. We investigate the development of the reinforcement learning policy during training and illustrate how the algorithm moves from suboptimal local holding patterns to a safe and efficient final policy. The latter is validated in simulation-based scenarios and also deployed on small-scale unmanned aerial vehicles to showcase its real-world usability.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2311.16841

Two-step dynamic obstacle avoidance

Authors: Fabian Hart, Martin Waltz, Ostap Okhrin

Abstract: Dynamic obstacle avoidance (DOA) is a fundamental challenge for any autonomous vehicle, regardless of whether it operates at sea, in the air, or on land. This paper proposes a two-step architecture for handling DOA tasks by combining supervised and reinforcement learning (RL). In the first step, we introduce a data-driven approach to estimate the collision risk of an obstacle using a recurrent neural network, which is trained in a supervised fashion and offers robustness to non-linear obstacle movements. In the second step, we include these collision risk estimates in the observation space of an RL agent to increase its situational awareness. We illustrate the power of our two-step approach by training different RL agents in a challenging environment that requires navigating amid multiple obstacles. The non-linear movements of obstacles are modeled, by way of example, using stochastic processes and periodic patterns, although our architecture is suitable for any obstacle dynamics. The experiments reveal that integrating our collision risk metrics into the observation space doubles the performance in terms of reward, which is equivalent to halving the number of collisions in the considered environment. Furthermore, we show that the architecture's performance improvement is independent of the applied RL algorithm.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2307.16769

2-Level Reinforcement Learning for Ships on Inland Waterways

Authors: Martin Waltz, Niklas Paulig, Ostap Okhrin

Abstract: This paper proposes a realistic modularized framework for controlling autonomous surface vehicles (ASVs) on inland waterways (IWs) based on deep reinforcement learning (DRL). The framework comprises two levels: a high-level local path planning (LPP) unit and a low-level path following (PF) unit, each consisting of a DRL agent. The LPP agent is responsible for planning a path while taking into account nearby vessels, traffic rules, and the geometry of the waterway. In doing so, we transfer a recently proposed spatial-temporal recurrent neural network architecture to continuous action spaces. The LPP agent improves operational safety in comparison to a state-of-the-art artificial potential field method by increasing the minimum distance to other vessels by 65% on average. The PF agent performs low-level actuator control while accounting for shallow water influences and the environmental forces of wind, waves, and currents. Compared with a proportional-integral-derivative (PID) controller, the PF agent yields only 61% of the mean cross-track error while significantly reducing control effort in terms of the required absolute rudder angle. Lastly, both agents are jointly validated in simulation, employing the lower Elbe in northern Germany as an example case and using real automatic identification system (AIS) trajectories to model the behavior of other ships.

Submitted 28 November, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

arXiv:2211.01004

Spatial-temporal recurrent reinforcement learning for autonomous ships

Authors: Martin Waltz, Ostap Okhrin

Abstract: This paper proposes a spatial-temporal recurrent neural network architecture for deep $Q$-networks that can be used to steer an autonomous ship. The network design makes it possible to handle an arbitrary number of surrounding target ships while offering robustness to partial observability. Furthermore, a state-of-the-art collision risk metric is proposed to enable an easier assessment of different situations by the agent. The COLREG rules of maritime traffic are explicitly considered in the design of the reward function. The final policy is validated on a custom set of newly created single-ship encounters called 'Around the Clock' problems and the commonly used Imazu (1987) problems, which include 18 multi-ship scenarios. Performance comparisons with artificial potential field and velocity obstacle methods demonstrate the potential of the proposed approach for maritime path planning. Furthermore, the new architecture exhibits robustness when it is deployed in multi-agent scenarios and it is compatible with other deep reinforcement learning algorithms, including actor-critic frameworks.

Submitted 15 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

arXiv:2203.10777

Vulnerability-CoVaR: Investigating the Crypto-market

Authors: Martin Waltz, Abhay Kumar Singh, Ostap Okhrin

Abstract: This paper proposes an important extension to Conditional Value-at-Risk (CoVaR), the popular systemic risk measure, and investigates its properties on the cryptocurrency market. The proposed Vulnerability-CoVaR (VCoVaR) is defined as the Value-at-Risk (VaR) of a financial system or institution, given that at least one other institution is at or below its VaR. The VCoVaR relaxes normality assumptions and is estimated via copula. While important theoretical findings of the measure are detailed, the empirical study analyzes how different distress events among the cryptocurrencies affect each other's risk level. The results show that Litecoin displays the largest impact on Bitcoin and that each cryptocurrency is significantly affected if an event of joint distress among the remaining market participants occurs. The VCoVaR is shown to capture domino effects better than other CoVaR extensions.

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2201.08078

Addressing Maximization Bias in Reinforcement Learning with Two-Sample Testing

Authors: Martin Waltz, Ostap Okhrin

Abstract: Value-based reinforcement-learning algorithms have shown strong results in games, robotics, and other real-world applications. Overestimation bias is a known threat to those algorithms and can lead to dramatic performance decreases or even complete algorithmic failure. We frame the bias problem statistically and consider it an instance of estimating the maximum expected value (MEV) of a set of random variables. We propose the $T$-Estimator (TE) based on two-sample testing for the mean, which flexibly interpolates between over- and underestimation by adjusting the significance level of the underlying hypothesis tests. A generalization, termed $K$-Estimator (KE), obeys the same bias and variance bounds as the TE while relying on a nearly arbitrary kernel function. We introduce modifications of $Q$-Learning and the Bootstrapped Deep $Q$-Network (BDQN) using the TE and the KE, and prove convergence in the tabular setting. Furthermore, we propose an adaptive variant of the TE-based BDQN that dynamically adjusts the significance level to minimize the absolute estimation bias. All proposed estimators and algorithms are thoroughly tested and validated on diverse tasks and environments, illustrating the bias control and performance potential of the TE and KE.

Submitted 18 October, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2112.12465

Missing Velocity in Dynamic Obstacle Avoidance based on Deep Reinforcement Learning

Authors: Fabian Hart, Martin Waltz, Ostap Okhrin

Abstract: We introduce a novel approach to dynamic obstacle avoidance based on Deep Reinforcement Learning by defining a traffic-type-independent environment with variable complexity. Filling a gap in the current literature, we thoroughly investigate the effect of missing velocity information on an agent's performance in obstacle avoidance tasks. This is a crucial issue in practice since several sensors yield only positional information of objects or vehicles. We evaluate frequently applied approaches in scenarios of partial observability, namely the incorporation of recurrency in the deep neural networks and simple frame-stacking. For our analysis, we rely on state-of-the-art model-free deep RL algorithms. The lack of velocity information is found to significantly impact the performance of an agent. Neither approach, recurrency nor frame-stacking, can consistently replace missing velocity information in the observation space. However, in simplified scenarios, they can significantly boost performance and stabilize the overall training procedure.

Submitted 28 December, 2021; v1 submitted 23 December, 2021; originally announced December 2021.

Showing 1–7 of 7 results for author: Waltz, M