
Optimal Transport-Assisted Risk-Sensitive Q-Learning

Abstract: The primary goal of reinforcement learning is to develop decision-making policies that prioritize optimal performance without considering risk or safety. In contrast, safe reinforcement learning aims to mitigate or avoid unsafe states. This paper presents a risk-sensitive Q-learning algorithm that leverages optimal transport theory to enhance the agent safety. By integrating optimal transport into the Q-learning framework, our approach seeks to optimize the policy's expected return while minimizing the Wasserstein distance between the policy's stationary distribution and a predefined risk distribution, which encapsulates safety preferences from domain experts. We validate the proposed algorithm in a Gridworld environment. The results indicate that our method significantly reduces the frequency of visits to risky states and achieves faster convergence to a stable policy compared to the traditional Q-learning algorithm.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2403.05925 [pdf, other]

BEACON: A Bayesian Evolutionary Approach for Counterexample Generation of Control Systems

Authors: Joshua Yancosek, Ali Baheri

Abstract: The rigorous safety verification of control systems in critical applications is essential, given their increasing complexity and integration into everyday life. Simulation-based falsification approaches play a pivotal role in the safety verification of control systems, particularly within critical applications. These methods systematically explore the operational space of systems to identify configurations that result in violations of safety specifications. However, the effectiveness of traditional simulation-based falsification is frequently limited by the high dimensionality of the search space and the substantial computational resources required for exhaustive exploration. This paper presents BEACON, a novel framework that enhances the falsification process through a combination of Bayesian optimization and covariance matrix adaptation evolutionary strategy. By exploiting quantitative metrics to evaluate how closely a system adheres to safety specifications, BEACON advances the state-of-the-art in testing methodologies. It employs a model-based test point selection approach, designed to facilitate exploration across dynamically evolving search zones to efficiently uncover safety violations. Our findings demonstrate that BEACON not only locates a higher percentage of counterexamples compared to standalone BO but also achieves this with significantly fewer simulations than required by CMA-ES, highlighting its potential to optimize the verification process of control systems. This framework offers a promising direction for achieving thorough and resource-efficient safety evaluations, ensuring the reliability of control systems in critical applications.

Submitted 12 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2402.15893 [pdf, other]

Concurrent Learning of Policy and Unknown Safety Constraints in Reinforcement Learning

Authors: Lunet Yifru, Ali Baheri

Abstract: Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades. Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety. Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process. However, this reliance on predefined safety constraints poses limitations in dynamic and unpredictable real-world settings where such constraints may not be available or sufficiently adaptable. Bridging this gap, we propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment. Initializing with a parametric signal temporal logic (pSTL) safety specification and a small initial labeled dataset, we frame the problem as a bilevel optimization task, intricately integrating constrained policy optimization, using a Lagrangian-variant of the twin delayed deep deterministic policy gradient (TD3) algorithm, with Bayesian optimization for optimizing parameters for the given pSTL safety specification. Through experimentation in comprehensive case studies, we validate the efficacy of this approach across varying forms of environmental constraints, consistently yielding safe RL policies with high returns. Furthermore, our findings indicate successful learning of STL safety constraint parameters, exhibiting a high degree of conformity with true environmental safety constraints. The performance of our model closely mirrors that of an ideal scenario that possesses complete prior knowledge of safety constraints, demonstrating its proficiency in accurately identifying environmental safety constraints and learning safe policies that adhere to those constraints.

Submitted 24 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2401.10949 [pdf, ps, other]

The Synergy Between Optimal Transport Theory and Multi-Agent Reinforcement Learning

Authors: Ali Baheri, Mykel J. Kochenderfer

Abstract: This paper explores the integration of optimal transport (OT) theory with multi-agent reinforcement learning (MARL). This integration uses OT to handle distributions and transportation problems to enhance the efficiency, coordination, and adaptability of MARL. There are five key areas where OT can impact MARL: (1) policy alignment, where OT's Wasserstein metric is used to align divergent agent strategies towards unified goals; (2) distributed resource management, employing OT to optimize resource allocation among agents; (3) addressing non-stationarity, using OT to adapt to dynamic environmental shifts; (4) scalable multi-agent learning, harnessing OT for decomposing large-scale learning objectives into manageable tasks; and (5) enhancing energy efficiency, applying OT principles to develop sustainable MARL systems. This paper articulates how the synergy between OT and MARL can address scalability issues, optimize resource distribution, align agent policies in cooperative environments, and ensure adaptability in dynamically changing conditions.

Submitted 24 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

arXiv:2310.12055 [pdf, ps, other]

Understanding Reward Ambiguity Through Optimal Transport Theory in Inverse Reinforcement Learning

Authors: Ali Baheri

Abstract: In inverse reinforcement learning (IRL), the central objective is to infer underlying reward functions from observed expert behaviors in a way that not only explains the given data but also generalizes to unseen scenarios. This ensures robustness against reward ambiguity where multiple reward functions can equally explain the same expert behaviors. While significant efforts have been made in addressing this issue, current methods often face challenges with high-dimensional problems and lack a geometric foundation. This paper harnesses the optimal transport (OT) theory to provide a fresh perspective on these challenges. By utilizing the Wasserstein distance from OT, we establish a geometric framework that allows for quantifying reward ambiguity and identifying a central representation or centroid of reward functions. These insights pave the way for robust IRL methodologies anchored in geometric interpretations, offering a structured approach to tackle reward ambiguity in high-dimensional settings.

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2309.06239 [pdf, ps, other]

Risk-Aware Reinforcement Learning through Optimal Transport Theory

Authors: Ali Baheri

Abstract: In the dynamic and uncertain environments where reinforcement learning (RL) operates, risk management becomes a crucial factor in ensuring reliable decision-making. Traditional RL approaches, while effective in reward optimization, often overlook the landscape of potential risks. In response, this paper pioneers the integration of Optimal Transport (OT) theory with RL to create a risk-aware framework. Our approach modifies the objective function, ensuring that the resulting policy not only maximizes expected rewards but also respects risk constraints dictated by OT distances between state visitation distributions and the desired risk profiles. By leveraging the mathematical precision of OT, we offer a formulation that elevates risk considerations alongside conventional RL objectives. Our contributions are substantiated with a series of theorems, mapping the relationships between risk distributions, optimal value functions, and policy behaviors. Through the lens of OT, this work illuminates a promising direction for RL, ensuring a balanced fusion of reward pursuit and risk awareness.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2305.06796 [pdf, ps, other]

Towards Theoretical Understanding of Data-Driven Policy Refinement

Authors: Ali Baheri

Abstract: This paper presents an approach for data-driven policy refinement in reinforcement learning, specifically designed for safety-critical applications. Our methodology leverages the strengths of data-driven optimization and reinforcement learning to enhance policy safety and optimality through iterative refinement. Our principal contribution lies in the mathematical formulation of this data-driven policy refinement concept. This framework systematically improves reinforcement learning policies by learning from counterexamples identified during data-driven verification. Furthermore, we present a series of theorems elucidating key theoretical properties of our approach, including convergence, robustness bounds, generalization error, and resilience to model mismatch. These results not only validate the effectiveness of our methodology but also contribute to a deeper understanding of its behavior in different environments and scenarios.

Submitted 15 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: Accepted at the "Bridging the Gap Between AI Planning and Reinforcement Learning (PRL)" workshop at ICAPS 2023

arXiv:2305.06111 [pdf, ps, other]

Joint Falsification and Fidelity Settings Optimization for Validation of Safety-Critical Systems: A Theoretical Analysis

Authors: Ali Baheri, Mykel J. Kochenderfer

Abstract: Safety validation is a crucial component in the development and deployment of autonomous systems, such as self-driving vehicles and robotic systems. Ensuring safe operation necessitates extensive testing and verification of control policies, typically conducted in simulation environments. High-fidelity simulators accurately model real-world dynamics but entail high computational costs, limiting their scalability for exhaustive testing. Conversely, low-fidelity simulators offer efficiency but may not capture the intricacies of high-fidelity simulators, potentially yielding false conclusions. We propose a joint falsification and fidelity optimization framework for safety validation of autonomous systems. Our mathematical formulation combines counterexample searches with simulator fidelity improvement, facilitating more efficient exploration of the critical environmental configurations challenging the control system. Our contributions encompass a set of theorems addressing counterexample sensitivity analysis, sample complexity, convergence, the interplay between the outer and inner optimization loops, and regret bound analysis. The proposed joint optimization approach enables a more targeted and efficient testing process, optimizes the use of available computational resources, and enhances confidence in autonomous system safety validation.

Submitted 10 May, 2023; originally announced May 2023.

Comments: Submitted to the 20th International Conference on Quantitative Evaluation of Systems (QEST 2023)

arXiv:2305.00576 [pdf, ps, other]

Joint Learning of Policy with Unknown Temporal Constraints for Safe Reinforcement Learning

Authors: Lunet Yifru, Ali Baheri

Abstract: In many real-world applications, safety constraints for reinforcement learning (RL) algorithms are either unknown or not explicitly defined. We propose a framework that concurrently learns safety constraints and optimal RL policies in such environments, supported by theoretical guarantees. Our approach merges a logically-constrained RL algorithm with an evolutionary algorithm to synthesize signal temporal logic (STL) specifications. The framework is underpinned by theorems that establish the convergence of our joint learning process and provide error bounds between the discovered policy and the true optimal policy. We showcased our framework in grid-world environments, successfully identifying both acceptable safety constraints and RL policies while demonstrating the effectiveness of our theorems in practice.

Submitted 30 April, 2023; originally announced May 2023.

Comments: Accepted at the "Bridging the Gap Between AI Planning and Reinforcement Learning (PRL)" workshop at ICAPS 2023

arXiv:2212.14118 [pdf, other]

Falsification of Learning-Based Controllers through Multi-Fidelity Bayesian Optimization

Authors: Zahra Shahrooei, Mykel J. Kochenderfer, Ali Baheri

Abstract: Simulation-based falsification is a practical testing method to increase confidence that the system will meet safety requirements. Because full-fidelity simulations can be computationally demanding, we investigate the use of simulators with different levels of fidelity. As a first step, we express the overall safety specification in terms of environmental parameters and structure this safety specification as an optimization problem. We propose a multi-fidelity falsification framework using Bayesian optimization, which is able to determine at which level of fidelity we should conduct a safety evaluation in addition to finding possible instances from the environment that cause the system to fail. This method allows us to automatically switch between inexpensive, inaccurate information from a low-fidelity simulator and expensive, accurate information from a high-fidelity simulator in a cost-effective way. Our experiments on various environments in simulation demonstrate that multi-fidelity Bayesian optimization has falsification performance comparable to single-fidelity Bayesian optimization but with much lower cost.

Submitted 28 April, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: 7 pages, 8 figures, Accepted for the 2023 European Control Conference (ECC)

arXiv:2211.02147 [pdf, other]

A Survey on Reinforcement Learning in Aviation Applications

Authors: Pouria Razzaghi, Amin Tabrizian, Wei Guo, Shulu Chen, Abenezer Taye, Ellis Thompson, Alexis Bregeon, Ali Baheri, Peng Wei

Abstract: Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to largely improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential decision-making problems. Some of them are offline planning problems, while others need to be solved online and are safety-critical. In this survey paper, we first describe standard RL formulations and solutions. Then we survey the landscape of existing RL-based applications in aviation. Finally, we summarize the paper, identify the technical gaps, and suggest future directions of RL research in aviation.

Submitted 22 November, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

arXiv:2205.04590 [pdf, other]

A Verification Framework for Certifying Learning-Based Safety-Critical Aviation Systems

Authors: Ali Baheri, Hao Ren, Benjamin Johnson, Pouria Razzaghi, Peng Wei

Abstract: We present a safety verification framework for design-time and run-time assurance of learning-based components in aviation systems. Our proposed framework integrates two novel methodologies. From the design-time assurance perspective, we propose offline mixed-fidelity verification tools that incorporate knowledge from different levels of granularity in simulated environments. From the run-time assurance perspective, we propose reachability- and statistics-based online monitoring and safety guards for a learning-based decision-making model to complement the offline verification methods. This framework is designed to be loosely coupled among modules, allowing the individual modules to be developed using independent methodologies and techniques, under varying circumstances and with different tool access. The proposed framework offers feasible solutions for meeting system safety requirements at different stages throughout the system development and deployment cycle, enabling the continuous learning and assessment of the system product.

Submitted 14 May, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: 12 pages, 9 figures

arXiv:2203.12416 [pdf, other]

A Framework for Controlling Multi-Robot Systems Using Bayesian Optimization and Linear Combination of Vectors

Authors: Stephen Jacobs, R. Michael Butts, Yu Gu, Ali Baheri, Guilherme A. S. Pereira

Abstract: We propose a general framework for creating parameterized control schemes for decentralized multi-robot systems. A variety of tasks can be seen in the decentralized multi-robot literature, each with many possible control schemes. For several of them, the agents choose control velocities using algorithms that extract information from the environment and combine that information in meaningful ways. From this basic formation, a framework is proposed that classifies each robot's measurement information as sets of relevant scalars and vectors and creates a linear combination of the measured vector sets. Along with an optimizable parameter set, the scalar measurements are used to generate the coefficients for the linear combination. With this framework and Bayesian optimization, we can create effective control systems for several multi-robot tasks, including cohesion and segregation, pattern formation, and searching/foraging.

Submitted 23 March, 2022; originally announced March 2022.

Comments: 7 pages, 8 figures

arXiv:2203.03451 [pdf, other]

Black-Box Safety Validation of Autonomous Systems: A Multi-Fidelity Reinforcement Learning Approach

Authors: Jared J. Beard, Ali Baheri

Abstract: The increasing use of autonomous and semi-autonomous agents in society has made it crucial to validate their safety. However, the complex scenarios in which they are used may make formal verification impossible. To address this challenge, simulation-based safety validation is employed to test the complex system. Recent approaches using reinforcement learning are prone to excessive exploitation of known failures and a lack of coverage in the space of failures. To address this limitation, a type of Markov decision process called the "knowledge MDP" has been defined. This approach takes into account both the learned model and its metadata, such as sample counts, in estimating the system's knowledge through the "knows what it knows" framework. A novel algorithm that extends bidirectional learning to multiple fidelities of simulators has been developed to solve the safety validation problem. The effectiveness of this approach is demonstrated through a case study in which an adversary is trained to intercept a test model in a grid-world environment. Monte Carlo trials compare the sample efficiency of the proposed algorithm to learning with a single-fidelity simulator and show the importance of incorporating knowledge about learned models into the decision-making process.

Submitted 1 March, 2023; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: 8 pages, 6 figures, 2 Algorithms, submitted to the 2023 7th IEEE Conference on Control Technology and Applications, Bridgetown, Barbados

arXiv:2007.01698 [pdf, other]

Safe Reinforcement Learning with Mixture Density Network: A Case Study in Autonomous Highway Driving

Authors: Ali Baheri

Abstract: This paper presents a safe reinforcement learning system for automated driving that benefits from multimodal future trajectory predictions. We propose a safety system that consists of two safety components: a heuristic safety and a learning-based safety. The heuristic safety module is based on common driving rules. On the other hand, the learning-based safety module is a data-driven safety rule that learns safety patterns from driving data. Specifically, it utilizes mixture density recurrent neural networks (MD-RNN) for multimodal future trajectory predictions to accelerate the learning progress. Our simulation results demonstrate that the proposed safety system outperforms previously reported results in terms of average reward and number of collisions.

Submitted 17 November, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1910.12905

arXiv:2003.08300 [pdf, other]

Vision-Based Autonomous Driving: A Model Learning Approach

Authors: Ali Baheri, Ilya Kolmanovsky, Anouck Girard, H. Eric Tseng, Dimitar Filev

Abstract: We present an integrated approach for perception and control for an autonomous vehicle and demonstrate this approach in a high-fidelity urban driving simulator. Our approach first builds a model for the environment, then trains a policy exploiting the learned model to identify the action to take at each time-step. To build a model for the environment, we leverage several deep learning algorithms. To that end, first we train a variational autoencoder to encode the input image into an abstract latent representation. We then utilize a recurrent neural network to predict the latent representation of the next frame and handle temporal information. Finally, we utilize an evolutionary-based reinforcement learning algorithm to train a controller based on these latent representations to identify the action to take. We evaluate our approach in CARLA, a high-fidelity urban driving simulator, and conduct an extensive generalization study. Our results demonstrate that our approach outperforms several previously reported approaches in terms of the percentage of successfully completed episodes for a lane keeping task.

Submitted 18 March, 2020; originally announced March 2020.

Comments: 6

arXiv:1910.12905 [pdf, other]

Deep Reinforcement Learning with Enhanced Safety for Autonomous Highway Driving

Authors: Ali Baheri, Subramanya Nageshrao, H. Eric Tseng, Ilya Kolmanovsky, Anouck Girard, Dimitar Filev

Abstract: In this paper, we present a safe deep reinforcement learning system for automated driving. The proposed framework leverages merits of both rule-based and learning-based approaches for safety assurance. Our safety system consists of two modules, namely handcrafted safety and dynamically-learned safety. The handcrafted safety module is a heuristic safety rule based on common driving practice that ensures a minimum relative gap to a traffic vehicle. On the other hand, the dynamically-learned safety module is a data-driven safety rule that learns safety patterns from driving data. Specifically, the dynamically-learned safety module incorporates a model lookahead beyond the immediate reward of reinforcement learning to predict safety longer into the future. If one of the future states leads to a near-miss or collision, then a negative reward will be assigned to the reward function to avoid collision and accelerate the learning process. We demonstrate the capability of the proposed framework in a simulation environment with varying traffic density. Our results show the superior capabilities of the policy enhanced with dynamically-learned safety module.

Submitted 23 April, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

arXiv:1910.12901 [pdf, other]

Waypoint Optimization Using Bayesian Optimization: A Case Study in Airborne Wind Energy Systems

Authors: Ali Baheri, Chris Vermillion

Abstract: We present a data-driven optimization framework that aims to address online adaptation of the flight path shape for an airborne wind energy system (AWE) that follows a repetitive path to generate power. Specifically, Bayesian optimization, which is a data-driven algorithm for finding the optimum of an unknown objective function, is utilized to solve the waypoint adaptation. To form a computationally efficient optimization framework, we describe each figure-$8$ flight via a compact set of parameters, termed as basis parameters. We model the underlying objective function by a Gaussian Process (GP). Bayesian optimization utilizes the predictive uncertainty information from the GP to determine the best subsequent basis parameters. Once a path is generated using Bayesian optimization, a path following mechanism is used to track the generated figure-$8$ flight. The proposed framework is validated on a simplified $2$-dimensional model that mimics the key behaviors of a $3$-dimensional AWE system. We demonstrate the capability of the proposed framework in a simulation environment for a simplified $2$-dimensional AWE system model.

Submitted 16 November, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

arXiv:1901.07521 [pdf, other]

Economically Efficient Combined Plant and Controller Design Using Batch Bayesian Optimization: Mathematical Framework and Airborne Wind Energy Case Study

Authors: Ali Baheri, Chris Vermillion

Abstract: We present a novel data-driven nested optimization framework that addresses the problem of coupling between plant and controller optimization. This optimization strategy is tailored towards instances where a closed-form expression for the system dynamic response is unobtainable and simulations or experiments are necessary. Specifically, Bayesian Optimization, which is a data-driven technique for finding the optimum of an unknown and expensive-to-evaluate objective function, is employed to solve a nested optimization problem. The underlying objective function is modeled by a Gaussian Process (GP); then, Bayesian Optimization utilizes the predictive uncertainty information from the GP to determine the best subsequent control or plant parameters. The proposed framework differs from the majority of co-design literature where there exists a closed-form model of the system dynamics. Furthermore, we utilize the idea of Batch Bayesian Optimization at the plant optimization level to generate a set of plant designs at each iteration of the overall optimization process, recognizing that there will exist economies of scale in running multiple experiments in each iteration of the plant design process. We validate the proposed framework for a Buoyant Airborne Turbine (BAT). We choose the horizontal stabilizer area, longitudinal center of mass relative to center of buoyancy (plant parameters), and the pitch angle set-point (controller parameter) as our decision variables. Our results demonstrate that these plant and control parameters converge to their respective optimal values within only a few iterations.

Submitted 22 January, 2019; originally announced January 2019.

Showing 1–19 of 19 results for author: Baheri, A