arXiv:2008.04712 [pdf, other]

doi 10.1016/j.ifacsc.2021.100144

Learning Event-triggered Control from Data through Joint Optimization

Authors: Niklas Funk, Dominik Baumann, Vincent Berenz, Sebastian Trimpe

Abstract: We present a framework for model-free learning of event-triggered control strategies. Event-triggered methods aim to achieve high control performance while only closing the feedback loop when needed. This enables resource savings, e.g., network bandwidth if control commands are sent via communication networks, as in networked control systems. Event-triggered controllers consist of a communication policy, determining when to communicate, and a control policy, deciding what to communicate. It is essential to jointly optimize the two policies since individual optimization does not necessarily yield the overall optimal solution. To address this need for joint optimization, we propose a novel algorithm based on hierarchical reinforcement learning. The resulting algorithm is shown to accomplish high-performance control in line with resource savings and scales seamlessly to nonlinear and high-dimensional systems. The method's applicability to real-world scenarios is demonstrated through experiments on a six degrees of freedom real-time controlled manipulator. Further, we propose an approach towards evaluating the stability of the learned neural network policies.

Submitted 23 April, 2021; v1 submitted 11 August, 2020; originally announced August 2020.

arXiv:2006.03906 [pdf, other]

Identifying Causal Structure in Dynamical Systems

Authors: Dominik Baumann, Friedrich Solowjow, Karl H. Johansson, Sebastian Trimpe

Abstract: Mathematical models are fundamental building blocks in the design of dynamical control systems. As control systems are becoming increasingly complex and networked, approaches for obtaining such models based on first principles reach their limits. Data-driven methods provide an alternative. However, without structural knowledge, these methods are prone to finding spurious correlations in the training data, which can hamper generalization capabilities of the obtained models. This can significantly lower control and prediction performance when the system is exposed to unknown situations. A preceding causal identification can prevent this pitfall. In this paper, we propose a method that identifies the causal structure of control systems. We design experiments based on the concept of controllability, which provides a systematic way to compute input trajectories that steer the system to specific regions in its state space. We then analyze the resulting data leveraging powerful techniques from causal inference and extend them to control systems. Further, we derive conditions that guarantee the discovery of the true causal structure of the system. Experiments on a robot arm demonstrate reliable causal identification from real-world data and enhanced generalization capabilities.

Submitted 18 July, 2022; v1 submitted 6 June, 2020; originally announced June 2020.

Comments: Accepted final versions to appear in the Transactions on Machine Learning Research

arXiv:2005.07443 [pdf, other]

Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures

Authors: Alonso Marco, Alexander von Rohr, Dominik Baumann, José Miguel Hernández-Lobato, Sebastian Trimpe

Abstract: When learning to ride a bike, a child falls down a number of times before achieving the first success. As falling down usually has only mild consequences, it can be seen as a tolerable failure in exchange for a faster learning process, as it provides rich information about an undesired behavior. In the context of Bayesian optimization under unknown constraints (BOC), typical strategies for safe learning explore conservatively and avoid failures by all means. On the other side of the spectrum, non conservative BOC algorithms that allow failing may fail an unbounded number of times before reaching the optimum. In this work, we propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures. Empirical validation shows that our algorithm uses the failures budget more efficiently in a variety of optimization experiments, and generally achieves lower regret, than state-of-the-art methods. In addition, we propose an original algorithm for unconstrained Bayesian optimization inspired by the notion of excursion sets in stochastic processes, upon which the failures-aware algorithm is built.

Submitted 15 May, 2020; originally announced May 2020.

Comments: 14 pages, 4 figures, submitted

arXiv:2004.11238 [pdf, other]

Learning Constrained Dynamics with Gauss Principle adhering Gaussian Processes

Authors: A. Rene Geist, Sebastian Trimpe

Abstract: The identification of the constrained dynamics of mechanical systems is often challenging. Learning methods promise to ease an analytical analysis, but require considerable amounts of data for training. We propose to combine insights from analytical mechanics with Gaussian process regression to improve the model's data efficiency and constraint integrity. The result is a Gaussian process model that incorporates a priori constraint knowledge such that its predictions adhere to Gauss' principle of least constraint. In return, predictions of the system's acceleration naturally respect potentially non-ideal (non-)holonomic equality constraints. As corollary results, our model enables to infer the acceleration of the unconstrained system from data of the constrained system and enables knowledge transfer between differing constraint configurations.

Submitted 23 April, 2020; originally announced April 2020.

Comments: To be published in 2nd Annual Conference on Learning for Dynamics and Control (L4DC), Proceedings of Machine Learning Research 2020

Journal ref: Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:225-234, 2020

arXiv:2004.11098 [pdf, other]

A Kernel Two-sample Test for Dynamical Systems

Authors: Friedrich Solowjow, Dominik Baumann, Christian Fiedler, Andreas Jocham, Thomas Seel, Sebastian Trimpe

Abstract: Evaluating whether data streams are drawn from the same distribution is at the heart of various machine learning problems. This is particularly relevant for data generated by dynamical systems since such systems are essential for many real-world processes in biomedical, economic, or engineering systems. While kernel two-sample tests are powerful for comparing independent and identically distributed random variables, no established method exists for comparing dynamical systems. The main problem is the inherently violated independence assumption. We propose a two-sample test for dynamical systems by addressing three core challenges: we (i) introduce a novel notion of mixing that captures autocorrelations in a relevant metric, (ii) propose an efficient way to estimate the speed of mixing relying purely on data, and (iii) integrate these into established kernel two-sample tests. The result is a data-driven method that is straightforward to use in practice and comes with sound theoretical guarantees. In an example application to anomaly detection from human walking data, we show that the test is readily applicable without any human expert knowledge and feature engineering.

Submitted 4 September, 2022; v1 submitted 23 April, 2020; originally announced April 2020.

arXiv:2003.08613 [pdf, ps, other]

doi 10.1109/LCSYS.2020.3004506

Controller Design via Experimental Exploration with Robustness Guarantees

Authors: Tobias Holicki, Carsten W. Scherer, Sebastian Trimpe

Abstract: For partially unknown linear systems, we present a systematic control design approach based on generated data from measurements of closed-loop experiments with suitable test controllers. These experiments are used to improve the achieved performance and to reduce the uncertainty about the unknown parts of the system. This is achieved through a parametrization of auspicious controllers with convex relaxation techniques from robust control, which guarantees that their implementation on the unknown plant is safe. This approach permits to systematically incorporate available prior knowledge about the system by employing the framework of linear fractional representations.

Submitted 26 June, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

Journal ref: IEEE Control Systems Letters, Volume 5, Issue 2, Pages 641-646, 2020

arXiv:1912.10360 [pdf, other]

doi 10.1109/LRA.2020.2975727

Safe and Fast Tracking on a Robot Manipulator: Robust MPC and Neural Network Control

Authors: Julian Nubert, Johannes Köhler, Vincent Berenz, Frank Allgöwer, Sebastian Trimpe

Abstract: Fast feedback control and safety guarantees are essential in modern robotics. We present an approach that achieves both by combining novel robust model predictive control (MPC) with function approximation via (deep) neural networks (NNs). The result is a new approach for complex tasks with nonlinear, uncertain, and constrained dynamics as are common in robotics. Specifically, we leverage recent results in MPC research to propose a new robust setpoint tracking MPC algorithm, which achieves reliable and safe tracking of a dynamic setpoint while guaranteeing stability and constraint satisfaction. The presented robust MPC scheme constitutes a one-layer approach that unifies the often separated planning and control layers, by directly computing the control command based on a reference and possibly obstacle positions. As a separate contribution, we show how the computation time of the MPC can be drastically reduced by approximating the MPC law with a NN controller. The NN is trained and validated from offline samples of the MPC, yielding statistical guarantees, and used in lieu thereof at run time. Our experiments on a state-of-the-art robot manipulator are the first to show that both the proposed robust and approximate MPC schemes scale to real-world robotic systems.

Submitted 2 March, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

Comments: 8 pages, 4 figures

Journal ref: Robotics and Automation Letters, 2020

arXiv:1911.09946 [pdf, other]

Actively Learning Gaussian Process Dynamics

Authors: Mona Buisson-Fenet, Friedrich Solowjow, Sebastian Trimpe

Abstract: Despite the availability of ever more data enabled through modern sensor and computer technology, it still remains an open problem to learn dynamical systems in a sample-efficient way. We propose active learning strategies that leverage information-theoretical properties arising naturally during Gaussian process regression, while respecting constraints on the sampling process imposed by the system dynamics. Sample points are selected in regions with high uncertainty, leading to exploratory behavior and data-efficient training of the model. All results are finally verified in an extensive numerical benchmark.

Submitted 26 April, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

Journal ref: Actively Learning Gaussian Process Dynamics, Proceedings of the 2nd Conference on Learning for Dynamics and Control, Proceedings of Machine Learning Research vol 120, pp. 5-15, 2020

arXiv:1910.13399 [pdf, other]

Robust Model-free Reinforcement Learning with Multi-objective Bayesian Optimization

Authors: Matteo Turchetta, Andreas Krause, Sebastian Trimpe

Abstract: In reinforcement learning (RL), an autonomous agent learns to perform complex tasks by maximizing an exogenous reward signal while interacting with its environment. In real-world applications, test conditions may differ substantially from the training scenario and, therefore, focusing on pure reward maximization during training may lead to poor results at test time. In these cases, it is important to trade-off between performance and robustness while learning a policy. While several results exist for robust, model-based RL, the model-free case has not been widely investigated. In this paper, we cast the robust, model-free RL problem as a multi-objective optimization problem. To quantify the robustness of a policy, we use delay margin and gain margin, two robustness indicators that are common in control theory. We show how these metrics can be estimated from data in the model-free setting. We use multi-objective Bayesian optimization (MOBO) to solve efficiently this expensive-to-evaluate, multi-objective optimization problem. We show the benefits of our robust formulation both in sim-to-real and pure hardware experiments to balance a Furuta pendulum.

Submitted 29 October, 2019; originally announced October 2019.

Comments: Submitted to IEEE Conference on Robotics and Automation 2020 (ICRA)

arXiv:1910.07732 [pdf, other]

doi 10.1109/TAC.2020.3030877

Event-triggered Learning for Linear Quadratic Control

Authors: Henning Schlüter, Friedrich Solowjow, Sebastian Trimpe

Abstract: When models are inaccurate, the performance of model-based control will degrade. For linear quadratic control, an event-triggered learning framework is proposed that automatically detects inaccurate models and triggers the learning of a new process model when needed. This is achieved by analyzing the probability distribution of the linear quadratic cost and designing a learning trigger that leverages Chernoff bounds. In particular, whenever empirically observed cost signals are located outside the derived confidence intervals, we can provably guarantee that this is with high probability due to a model mismatch. With the aid of numerical and hardware experiments, we demonstrate that the proposed bounds are tight and that the event-triggered learning algorithm effectively distinguishes between inaccurate models and probabilistic effects such as process noise. Thus, a structured approach is obtained that decides when model learning is beneficial.

Submitted 5 October, 2020; v1 submitted 17 October, 2019; originally announced October 2019.

Comments: 13 pages, 8 figures, accepted for publication in IEEE Transactions on Automatic Control

Journal ref: IEEE Transactions on Automatic Control, vol. 66, no. 10, pp. 4485-4498, Oct. 2021

arXiv:1910.02835 [pdf, other]

A Learnable Safety Measure

Authors: Steve Heim, Alexander von Rohr, Sebastian Trimpe, Alexander Badri-Spröwitz

Abstract: Failures are challenging for learning to control physical systems since they risk damage, time-consuming resets, and often provide little gradient information. Adding safety constraints to exploration typically requires a lot of prior knowledge and domain expertise. We present a safety measure which implicitly captures how the system dynamics relate to a set of failure states. Not only can this measure be used as a safety function, but also to directly compute the set of safe state-action pairs. Further, we show a model-free approach to learn this measure by active sampling using Gaussian processes. While safety can only be guaranteed after learning the safety measure, we show that failures can already be greatly reduced by using the estimated measure during learning.

Submitted 7 October, 2019; originally announced October 2019.

Comments: 10 pages, Conference on Robot Learning CoRL 2019, 3 figures

arXiv:1909.10873 [pdf, other]

Fast Feedback Control over Multi-hop Wireless Networks with Mode Changes and Stability Guarantees

Authors: Dominik Baumann, Fabian Mager, Romain Jacob, Lothar Thiele, Marco Zimmerling, Sebastian Trimpe

Abstract: Closing feedback loops fast and over long distances is key to emerging cyber-physical applications; for example, robot motion control and swarm coordination require update intervals of tens of milliseconds. Low-power wireless communication technology is preferred for its low cost, small form factor, and flexibility, especially if the devices support multi-hop communication. Thus far, however, feedback control over multi-hop low-power wireless networks has only been demonstrated for update intervals on the order of seconds. To fill this gap, this paper presents a wireless embedded system that supports dynamic mode changes and tames imperfections impairing control performance (e.g., jitter and message loss), and a control design that exploits the essential properties of this system to provably guarantee closed-loop stability for physical processes with linear time-invariant dynamics in the presence of mode changes. Using experiments on a cyber-physical testbed with 20 wireless devices and multiple cart-pole systems, we are the first to demonstrate and evaluate feedback control and coordination with mode changes over multi-hop networks for update intervals of 20 to 50 milliseconds.

Submitted 19 September, 2019; originally announced September 2019.

Comments: Accepted for publication in ACM Transactions on Cyber-Physical Systems. arXiv admin note: text overlap with arXiv:1804.08986

arXiv:1907.12300 [pdf, other]

Predictive Triggering for Distributed Control of Resource Constrained Multi-agent Systems

Authors: José Mario Mastrangelo, Dominik Baumann, Sebastian Trimpe

Abstract: A predictive triggering (PT) framework for the distributed control of resource constrained multi-agent systems is proposed. By predicting future communication demands and deriving a probabilistic priority measure, the PT framework is able to allocate limited communication resources in advance. The framework is evaluated through simulations of a cooperative adaptive cruise control system and experiments on multi-agent cart-pole systems. The results of these studies show its effectiveness over other event-triggered designs at reducing network utilization, while also improving the control error of the system.

Submitted 29 July, 2019; originally announced July 2019.

Comments: 6 pages, 3 figures, to appear in Proc. of the 8th IFAC Workshop on Distributed Estimation and Control in Networked Systems, 2019

arXiv:1907.10383 [pdf, other]

Classified Regression for Bayesian Optimization: Robot Learning with Unknown Penalties

Authors: Alonso Marco, Dominik Baumann, Philipp Hennig, Sebastian Trimpe

Abstract: Learning robot controllers by minimizing a black-box objective cost using Bayesian optimization (BO) can be time-consuming and challenging. It is very often the case that some roll-outs result in failure behaviors, causing premature experiment detention. In such cases, the designer is forced to decide on heuristic cost penalties because the acquired data is often scarce, or not comparable with that of the stable policies. To overcome this, we propose a Bayesian model that captures exactly what we know about the cost of unstable controllers prior to data collection: Nothing, except that it should be a somewhat large number. The resulting Bayesian model, approximated with a Gaussian process, predicts high cost values in regions where failures are likely to occur. In this way, the model guides the BO exploration toward regions of stability. We demonstrate the benefits of the proposed model in several illustrative and statistical synthetic benchmarks, and also in experiments on a real robotic platform. In addition, we propose and experimentally validate a new BO method to account for unknown constraints. Such method is an extension of Max-Value Entropy Search, a recent information-theoretic method, to solve unconstrained global optimization problems.

Submitted 9 November, 2020; v1 submitted 24 July, 2019; originally announced July 2019.

Comments: This paper was submitted to JMLR in 2018 and rejected. Currently, it is not published, nor under review in any conference or journal venue

arXiv:1906.05554 [pdf, other]

doi 10.1145/3302506.3312483

Demo Abstract: Fast Feedback Control and Coordination with Mode Changes for Wireless Cyber-Physical Systems

Authors: Fabian Mager, Dominik Baumann, Romain Jacob, Lothar Thiele, Sebastian Trimpe, Marco Zimmerling

Abstract: This abstract describes the first public demonstration of feedback control and coordination of multiple physical systems over a dynamic multi-hop low-power wireless network with update intervals of tens of milliseconds. Our running system can dynamically change between different sets of application tasks (e.g., sensing, actuation, control) executing on the spatially distributed embedded devices, while closed-loop stability is provably guaranteed even across those so-called mode changes. Moreover, any subset of the devices can move freely, which does not affect closed-loop stability and control performance as long as the wireless network remains connected.

Submitted 13 June, 2019; originally announced June 2019.

Comments: Proceedings of the 18th International Conference on Information Processing in Sensor Networks (IPSN'19), April 16-18, 2019, Montreal, QC, Canada

arXiv:1906.03458 [pdf, other]

doi 10.1109/LCSYS.2019.2922188

Control-guided Communication: Efficient Resource Arbitration and Allocation in Multi-hop Wireless Control Systems

Authors: Dominik Baumann, Fabian Mager, Marco Zimmerling, Sebastian Trimpe

Abstract: In future autonomous systems, wireless multi-hop communication is key to enable collaboration among distributed agents at low cost and high flexibility. When many agents need to transmit information over the same wireless network, communication becomes a shared and contested resource. Event-triggered and self-triggered control account for this by transmitting data only when needed, enabling significant energy savings. However, a solution that brings those benefits to multi-hop networks and can reallocate freed up bandwidth to additional agents or data sources is still missing. To fill this gap, we propose control-guided communication, a novel co-design approach for distributed self-triggered control over wireless multi-hop networks. The control system informs the communication system of its transmission demands ahead of time, and the communication system allocates resources accordingly. Experiments on a cyber-physical testbed show that multiple cart-poles can be synchronized over wireless, while serving other traffic when resources are available, or saving energy. These experiments are the first to demonstrate and evaluate distributed self-triggered control over low-power multi-hop wireless networks at update rates of tens of milliseconds.

Submitted 8 June, 2019; originally announced June 2019.

Comments: Accepted final version to appear in: IEEE Control Systems Letters

arXiv:1906.03211 [pdf, other]

doi 10.1109/LCSYS.2019.2922005

Hierarchical Event-triggered Learning for Cyclically Excited Systems with Application to Wireless Sensor Networks

Authors: Jonas Beuchert, Friedrich Solowjow, Jörg Raisch, Sebastian Trimpe, Thomas Seel

Abstract: Communication load is a limiting factor in many real-time systems. Event-triggered state estimation and event-triggered learning methods reduce network communication by sending information only when it cannot be adequately predicted based on previously transmitted data. This paper proposes an event-triggered learning approach for nonlinear discrete-time systems with cyclic excitation. The method automatically recognizes cyclic patterns in data - even when they change repeatedly - and reduces communication load whenever the current data can be accurately predicted from previous cycles. Nonetheless, a bounded error between original and received signal is guaranteed. The cyclic excitation model, which is used for predictions, is updated hierarchically, i.e., a full model update is only performed if updating a small number of model parameters is not sufficient. A nonparametric statistical test enforces that model updates happen only if the cyclic excitation changed with high probability. The effectiveness of the proposed methods is demonstrated using the application example of wireless real-time pitch angle measurements of a human foot in a feedback-controlled neuroprosthesis. The experimental results show that communication load can be reduced by 70 % while the root-mean-square error between measured and received angle is less than 1°.

Submitted 7 June, 2019; originally announced June 2019.

Comments: 6 pages and 6 figures; to appear in IEEE Control Systems Letters

Journal ref: IEEE Control Systems Letters, vol. 4, no. 1, pp. 103-108, Jan. 2020

arXiv:1905.05710 [pdf, other]

Trajectory-Based Off-Policy Deep Reinforcement Learning

Authors: Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel

Abstract: Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.

Submitted 14 May, 2019; originally announced May 2019.

Comments: Includes appendix. Accepted for ICML 2019

arXiv:1904.03042 [pdf, other]

doi 10.1016/j.automatica.2020.109009

Event-triggered Learning

Authors: Friedrich Solowjow, Sebastian Trimpe

Abstract: The efficient exchange of information is an essential aspect of intelligent collective behavior. Event-triggered control and estimation achieve some efficiency by replacing continuous data exchange between agents with intermittent, or event-triggered communication. Typically, model-based predictions are used at times of no data transmission, and updates are sent only when the prediction error grows too large. The effectiveness in reducing communication thus strongly depends on the quality of the prediction model. In this article, we propose event-triggered learning as a novel concept to reduce communication even further and to also adapt to changing dynamics. By monitoring the actual communication rate and comparing it to the one that is induced by the model, we detect a mismatch between model and reality and trigger model learning when needed. Specifically, for linear Gaussian dynamics, we derive different classes of learning triggers solely based on a statistical analysis of inter-communication times and formally prove their effectiveness with the aid of concentration inequalities.

Submitted 23 March, 2020; v1 submitted 5 April, 2019; originally announced April 2019.

arXiv:1903.08046 [pdf, ps, other]

Event-triggered Pulse Control with Model Learning (if Necessary)

Authors: Dominik Baumann, Friedrich Solowjow, Karl Henrik Johansson, Sebastian Trimpe

Abstract: In networked control systems, communication is a shared and therefore scarce resource. Event-triggered control (ETC) can achieve high performance control with a significantly reduced amount of samples compared to classical, periodic control schemes. However, ETC methods usually rely on the availability of an accurate dynamics model, which is oftentimes not readily available. In this paper, we propose a novel event-triggered pulse control strategy that learns dynamics models if necessary. In addition to adapting to changing dynamics, the method also represents a suitable replacement for the integral part typically used in periodic control.

Submitted 19 March, 2019; originally announced March 2019.

Comments: Accepted final version to appear in: Proc. of the American Control Conference, 2019

arXiv:1901.07531 [pdf, other]

doi 10.1109/JIOT.2019.2894628

Resource-aware IoT Control: Saving Communication through Predictive Triggering

Authors: Sebastian Trimpe, Dominik Baumann

Abstract: The Internet of Things (IoT) interconnects multiple physical devices in large-scale networks. When the 'things' coordinate decisions and act collectively on shared information, feedback is introduced between them. Multiple feedback loops are thus closed over a shared, general-purpose network. Traditional feedback control is unsuitable for design of IoT control because it relies on high-rate periodic communication and is ignorant of the shared network resource. Therefore, recent event-based estimation methods are applied herein for resource-aware IoT control allowing agents to decide online whether communication with other agents is needed, or not. While this can reduce network traffic significantly, a severe limitation of typical event-based approaches is the need for instantaneous triggering decisions that leave no time to reallocate freed resources (e.g., communication slots), which hence remain unused. To address this problem, novel predictive and self triggering protocols are proposed herein. From a unified Bayesian decision framework, two schemes are developed: self triggers that predict, at the current triggering instant, the next one; and predictive triggers that check at every time step, whether communication will be needed at a given prediction horizon. The suitability of these triggers for feedback control is demonstrated in hardware experiments on a cart-pole, and scalability is discussed with a multi-vehicle simulation.

Submitted 19 January, 2019; originally announced January 2019.

Comments: 16 pages, 15 figures, accepted article to appear in IEEE Internet of Things Journal. arXiv admin note: text overlap with arXiv:1609.07534

arXiv:1812.09582 [pdf, other]

doi 10.1016/j.automatica.2020.109247

Online learning with stability guarantees: A memory-based real-time model predictive controller

Authors: Lukas Schwenkel, Meriem Gharbi, Sebastian Trimpe, Christian Ebenbauer

Abstract: We propose and analyze a real-time model predictive control (MPC) scheme that utilizes stored data to improve its performance by learning the value function online with stability guarantees. For linear and nonlinear systems, a learning method is presented that makes use of basic analytic properties of the cost function and is proven to learn the MPC control law and the value function on the limit set of the closed-loop state trajectory. The main idea is to generate a smart warm start based on historical data that improves future data points and thus future warm starts. We show that these warm starts are asymptotically exact and converge to the solution of the MPC optimization problem. Thereby, the suboptimality of the applied control input resulting from the real-time requirements vanishes over time. Simulative examples show that existing real-time MPC schemes can be improved by storing data and the proposed learning scheme.

Submitted 22 September, 2020; v1 submitted 22 December, 2018; originally announced December 2018.

Comments: This article is an extended version of the paper "Online learning with stability guarantees: A memory-based warm starting for real-time MPC" published in Automatica, Volume 122, 109247, 2020, including all proofs, an application example, and a detailed description of the used algorithm

Journal ref: Automatica, Volume 122, 109247, 2020

arXiv:1812.06325 [pdf, other]

doi 10.1109/TCST.2018.2886159

Data-efficient Auto-tuning with Bayesian Optimization: An Industrial Control Study

Authors: Matthias Neumann-Brosig, Alonso Marco, Dieter Schwarzmann, Sebastian Trimpe

Abstract: Bayesian optimization is proposed for automatic learning of optimal controller parameters from experimental data. A probabilistic description (a Gaussian process) is used to model the unknown function from controller parameters to a user-defined cost. The probabilistic model is updated with data, which is obtained by testing a set of parameters on the physical system and evaluating the cost. In order to learn fast, the Bayesian optimization algorithm selects the next parameters to evaluate in a systematic way, for example, by maximizing information gain about the optimum. The algorithm thus iteratively finds the globally optimal parameters with only few experiments. Taking throttle valve control as a representative industrial control example, the proposed auto-tuning method is shown to outperform manual calibration: it consistently achieves better performance with a low number of experiments. The proposed auto-tuning framework is flexible and can handle different control structures and objectives.

Submitted 17 December, 2018; v1 submitted 15 December, 2018; originally announced December 2018.

Comments: 11 pages, 7 figures and 4 tables. To appear in IEEE Transactions on Control Systems Technology

arXiv:1809.05152 [pdf, other]

Deep Reinforcement Learning for Event-Triggered Control

Authors: Dominik Baumann, Jia-Jie Zhu, Georg Martius, Sebastian Trimpe

Abstract: Event-triggered control (ETC) methods can achieve high-performance control with a significantly lower number of samples compared to usual, time-triggered methods. These frameworks are often based on a mathematical model of the system and specific designs of controller and event trigger. In this paper, we show how deep reinforcement learning (DRL) algorithms can be leveraged to simultaneously learn control and communication behavior from scratch, and present a DRL approach that is particularly suitable for ETC. To our knowledge, this is the first work to apply DRL to ETC. We validate the approach on multiple control tasks and compare it to model-based event-triggering frameworks. In particular, we demonstrate that it can, other than many model-based ETC designs, be straightforwardly applied to nonlinear systems.

Submitted 13 September, 2018; originally announced September 2018.

arXiv:1809.03225 [pdf, other]

doi 10.1109/IROS.2018.8594092

Gait learning for soft microrobots controlled by light fields

Authors: Alexander von Rohr, Sebastian Trimpe, Alonso Marco, Peer Fischer, Stefano Palagi

Abstract: Soft microrobots based on photoresponsive materials and controlled by light fields can generate a variety of different gaits. This inherent flexibility can be exploited to maximize their locomotion performance in a given environment and used to adapt them to changing conditions. Albeit, because of the lack of accurate locomotion models, and given the intrinsic variability among microrobots, analytical control design is not possible. Common data-driven approaches, on the other hand, require running prohibitive numbers of experiments and lead to very sample-specific results. Here we propose a probabilistic learning approach for light-controlled soft microrobots based on Bayesian Optimization (BO) and Gaussian Processes (GPs). The proposed approach results in a learning scheme that is data-efficient, enabling gait optimization with a limited experimental budget, and robust against differences among microrobot samples. These features are obtained by designing the learning scheme through the comparison of different GP priors and BO settings on a semi-synthetic data set. The developed learning scheme is validated in microrobot experiments, resulting in a 115% improvement in a microrobot's locomotion performance with an experimental budget of only 20 tests. These encouraging results lead the way toward self-adaptive microrobotic systems based on light-controlled soft microrobots and probabilistic learning control.

Submitted 10 September, 2018; originally announced September 2018.

Comments: 8 pages, 7 figures, to appear in the proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems 2018

arXiv:1806.04167 [pdf, other]

doi 10.1109/LCSYS.2018.2843682

Learning an Approximate Model Predictive Controller with Guarantees

Authors: Michael Hertneck, Johannes Köhler, Sebastian Trimpe, Frank Allgöwer

Abstract: A supervised learning framework is proposed to approximate a model predictive controller (MPC) with reduced computational complexity and guarantees on stability and constraint satisfaction. The framework can be used for a wide class of nonlinear systems. Any standard supervised learning technique (e.g. neural networks) can be employed to approximate the MPC from samples. In order to obtain closed-loop guarantees for the learned MPC, a robust MPC design is combined with statistical learning bounds. The MPC design ensures robustness to inaccurate inputs within given bounds, and Hoeffding's Inequality is used to validate that the learned MPC satisfies these bounds with high confidence. The result is a closed-loop statistical guarantee on stability and constraint satisfaction for the learned MPC. The proposed learning-based MPC framework is illustrated on a nonlinear benchmark problem, for which we learn a neural network controller with guarantees.

Submitted 11 June, 2018; originally announced June 2018.

Comments: 6 pages, 3 figures, to appear in IEEE Control Systems Letters
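
The abstract above mentions Hoeffding's Inequality as the tool for validating a learned controller with high confidence. As a rough illustration of how such a check can work, the sketch below applies the one-sided Hoeffding bound to i.i.d. pass/fail samples. It is a minimal sketch under stated assumptions, not the validation procedure from the paper: the helper names `hoeffding_epsilon` and `validated`, the pass/fail encoding, and the numbers in the usage lines are hypothetical.

```python
import math


def hoeffding_epsilon(n_samples, confidence):
    """One-sided Hoeffding margin for i.i.d. samples in [0, 1].

    With probability at least `confidence`, the true mean is no smaller
    than the empirical mean minus the returned margin.
    """
    delta = 1.0 - confidence
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n_samples))


def validated(indicators, required_rate, confidence):
    """Check whether the lower confidence bound on the pass rate meets a target.

    `indicators` holds 1 for each sample where the approximation error stayed
    within the admissible bound and 0 otherwise (hypothetical encoding).
    """
    empirical = sum(indicators) / len(indicators)
    margin = hoeffding_epsilon(len(indicators), confidence)
    return empirical - margin >= required_rate


# Hypothetical usage: all validation samples pass, 99 % confidence requested.
print(hoeffding_epsilon(1000, 0.99))
print(validated([1] * 1000, required_rate=0.95, confidence=0.99))
print(validated([1] * 100, required_rate=0.95, confidence=0.99))
```

The margin shrinks as one over the square root of the sample count, so the same empirical pass rate can fail validation with 100 samples and succeed with 1000.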

arXiv:1805.10615 [pdf, other]

A Local Information Criterion for Dynamical Systems

Authors: Arash Mehrjou, Friedrich Solowjow, Sebastian Trimpe, Bernhard Schölkopf

Abstract: Encoding a sequence of observations is an essential task with many applications. The encoding can become highly efficient when the observations are generated by a dynamical system. A dynamical system imposes regularities on the observations that can be leveraged to achieve a more efficient code. We propose a method to encode a given or learned dynamical system. Apart from its application for encoding a sequence of observations, we propose to use the compression achieved by this encoding as a criterion for model selection. Given a dataset, different learning algorithms result in different models. But not all learned models are equally good. We show that the proposed encoding approach can be used to choose the learned model which is closer to the true underlying dynamics. We provide experiments for both encoding and model selection, and theoretical results that shed light on why the approach works.

Submitted 27 May, 2018; originally announced May 2018.

arXiv:1805.09714 [pdf, other]

Efficient Encoding of Dynamical Systems through Local Approximations

Authors: Friedrich Solowjow, Arash Mehrjou, Bernhard Schölkopf, Sebastian Trimpe

Abstract: An efficient representation of observed data has many benefits in various domains of engineering and science. Representing static data sets, such as images, is a living branch in machine learning and eases downstream tasks, such as classification, regression, or decision making. However, the representation of dynamical systems has received less attention. In this work, we develop a method to represent a dynamical system efficiently as a combination of a state and a local model, which fulfills a criterion inspired by the minimum description length (MDL) principle. The MDL principle is used in machine learning and statistics to quantify the trade-off between the ability to explain seen data and the model complexity. Networked control systems are a prominent example, where such a representation is beneficial. When many agents share a network, information exchange is costly and should thus happen only when necessary. We empirically show the efficiency of the proposed encoding for several dynamical systems and demonstrate reduced communication for event-triggered state estimation problems.

Submitted 27 September, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

Comments: 7 pages, 5 figures, to appear in 57th IEEE Conference on Decision and Control (CDC 2018)

arXiv:1804.09582 [pdf, other]

doi 10.1109/CPSBench.2018.00009

Evaluating Low-Power Wireless Cyber-Physical Systems

Authors: Dominik Baumann, Fabian Mager, Harsoveet Singh, Marco Zimmerling, Sebastian Trimpe

Abstract: Simulation tools and testbeds have been proposed to assess the performance of control designs and wireless protocols in isolation. A cyber-physical system (CPS), however, integrates control with network elements, which must be evaluated together under real-world conditions to assess control performance, stability, and associated costs. We present an approach to evaluate CPS relying on embedded devices and low-power wireless technology. Using one or multiple inverted pendulums as physical system, our approach supports a spectrum of realistic CPS scenarios that impose different requirements onto the control and networking elements. Moreover, our approach allows one to flexibly combine simulated and real pendulums, promoting adoption, scalability, repeatability, and integration with existing wireless testbed infrastructures. A case study demonstrates implementation, execution, and measurements using the proposed evaluation approach.

Submitted 25 April, 2018; originally announced April 2018.

Comments: Accepted final version to appear in: Proceedings of the 1st Workshop on Benchmarking Cyber-Physical Networks and Systems

arXiv:1804.08986 [pdf, other]

doi 10.1145/3302509.3311046

Feedback Control Goes Wireless: Guaranteed Stability over Low-power Multi-hop Networks

Authors: Fabian Mager, Dominik Baumann, Romain Jacob, Lothar Thiele, Sebastian Trimpe, Marco Zimmerling

Abstract: Closing feedback loops fast and over long distances is key to emerging applications; for example, robot motion control and swarm coordination require update intervals of tens of milliseconds. Low-power wireless technology is preferred for its low cost, small form factor, and flexibility, especially if the devices support multi-hop communication. So far, however, feedback control over wireless multi-hop networks has only been shown for update intervals on the order of seconds. This paper presents a wireless embedded system that tames imperfections impairing control performance (e.g., jitter and message loss), and a control design that exploits the essential properties of this system to provably guarantee closed-loop stability for physical processes with linear time-invariant dynamics. Using experiments on a cyber-physical testbed with 20 wireless nodes and multiple cart-pole systems, we are the first to demonstrate and evaluate feedback control and coordination over wireless multi-hop networks for update intervals of 20 to 50 milliseconds.

Submitted 19 February, 2019; v1 submitted 24 April, 2018; originally announced April 2018.

Comments: Accepted final version to appear in: 10th ACM/IEEE International Conference on Cyber-Physical Systems (with CPS-IoT Week 2019) (ICCPS '19), April 16-18, 2019, Montreal, QC, Canada

arXiv:1803.01802 [pdf, other]

doi 10.23919/ACC.2018.8431102

Event-triggered Learning for Resource-efficient Networked Control

Authors: Friedrich Solowjow, Dominik Baumann, Jochen Garcke, Sebastian Trimpe

Abstract: Common event-triggered state estimation (ETSE) algorithms save communication in networked control systems by predicting agents' behavior, and transmitting updates only when the predictions deviate significantly. The effectiveness in reducing communication thus heavily depends on the quality of the dynamics models used to predict the agents' states or measurements. Event-triggered learning is proposed herein as a novel concept to further reduce communication: whenever poor communication performance is detected, an identification experiment is triggered and an improved prediction model learned from data. Effective learning triggers are obtained by comparing the actual communication rate with the one that is expected based on the current model. By analyzing statistical properties of the inter-communication times and leveraging powerful convergence results, the proposed trigger is proven to limit learning experiments to the necessary instants. Numerical and physical experiments demonstrate that event-triggered learning improves robustness toward changing environments and yields lower communication rates than common ETSE.

Submitted 27 September, 2018; v1 submitted 5 March, 2018; originally announced March 2018.

Comments: 7 pages, 4 figures, to appear in the 2018 American Control Conference (ACC)

arXiv:1801.10395 [pdf, other]

Probabilistic Recurrent State-Space Models

Authors: Andreas Doerr, Christian Daniel, Martin Schiegg, Duy Nguyen-Tuong, Stefan Schaal, Marc Toussaint, Sebastian Trimpe

Abstract: State-space models (SSMs) are a highly expressive model class for learning patterns in time series data and for system identification. Deterministic versions of SSMs (e.g. LSTMs) proved extremely successful in modeling complex time series data. Fully probabilistic SSMs, however, are often found hard to train, even for smaller problems. To overcome this limitation, we propose a novel model formulation and a scalable training algorithm based on doubly stochastic variational inference and Gaussian processes. In contrast to existing work, the proposed variational approximation allows one to fully capture the latent state temporal correlations. These correlations are the key to robust training. The effectiveness of the proposed PR-SSM is evaluated on a set of real-world benchmark datasets in comparison to state-of-the-art probabilistic model learning methods. Scalability and robustness are demonstrated on a high dimensional problem.

Submitted 10 February, 2018; v1 submitted 31 January, 2018; originally announced January 2018.

arXiv:1709.07089 [pdf, other]

doi 10.1109/CDC.2017.8264429

On the Design of LQR Kernels for Efficient Controller Learning

Authors: Alonso Marco, Philipp Hennig, Stefan Schaal, Sebastian Trimpe

Abstract: Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.

Submitted 20 September, 2017; originally announced September 2017.

Comments: 8 pages, 5 figures, to appear in 56th IEEE Conference on Decision and Control (CDC 2017)

arXiv:1707.01659 [pdf, other]

doi 10.1109/TAC.2017.2726002

Distributed Event-Based State Estimation for Networked Systems: An LMI-Approach

Authors: Michael Muehlebach, Sebastian Trimpe

Abstract: In this work, a dynamic system is controlled by multiple sensor-actuator agents, each of them commanding and observing parts of the system's input and output. The different agents sporadically exchange data with each other via a common bus network according to local event-triggering protocols. From these data, each agent estimates the complete dynamic state of the system and uses its estimate for feedback control. We propose a synthesis procedure for designing the agents' state estimators and the event triggering thresholds. The resulting distributed and event-based control system is guaranteed to be stable and to satisfy a predefined estimation performance criterion. The approach is applied to the control of a vehicle platoon, where the method's trade-off between performance and communication, and the scalability in the number of agents is demonstrated.

Submitted 6 July, 2017; originally announced July 2017.

Comments: This is an extended version of an article to appear in the IEEE Transactions on Automatic Control (additional parts in the Appendix)

arXiv:1703.08342 [pdf, other]

doi 10.1049/iet-cta.2016.1021

Event-based State Estimation: An Emulation-based Approach

Authors: Sebastian Trimpe

Abstract: An event-based state estimation approach for reducing communication in a networked control system is proposed. Multiple distributed sensor agents observe a dynamic process and sporadically transmit their measurements to estimator agents over a shared bus network. Local event-triggering protocols ensure that data is transmitted only when necessary to meet a desired estimation accuracy. The event-based design is shown to emulate the performance of a centralised state observer design up to guaranteed bounds, but with reduced communication. The stability results for state estimation are extended to the distributed control system that results when the local estimates are used for feedback control. Results from numerical simulations and hardware experiments illustrate the effectiveness of the proposed approach in reducing network communication.

Submitted 24 March, 2017; originally announced March 2017.

Comments: 21 pages, 8 figures, this article is based on the technical report arXiv:1511.05223 and is accepted for publication in IET Control Theory & Applications

arXiv:1703.02899 [pdf, other]

Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers

Authors: Andreas Doerr, Duy Nguyen-Tuong, Alonso Marco, Stefan Schaal, Sebastian Trimpe

Abstract: PID control architectures are widely used in industrial applications. Despite their low number of open parameters, tuning multiple, coupled PID controllers can become tedious in practice. In this paper, we extend PILCO, a model-based policy search framework, to automatically tune multivariate PID controllers purely based on data observed on an otherwise unknown system. The system's state is extended appropriately to frame the PID policy as a static state feedback policy. This renders PID tuning possible as the solution of a finite horizon optimal control problem without further a priori knowledge. The framework is applied to the task of balancing an inverted pendulum on a seven degree-of-freedom robotic arm, thereby demonstrating its capabilities of fast and data-efficient policy learning, even on complex real world problems.

Submitted 8 March, 2017; originally announced March 2017.

Comments: Accepted final version to appear in 2017 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:1703.01250 [pdf, other]

doi 10.1109/ICRA.2017.7989186

Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization

Authors: Alonso Marco, Felix Berkenkamp, Philipp Hennig, Angela P. Schoellig, Andreas Krause, Stefan Schaal, Sebastian Trimpe

Abstract: In practice, the parameters of control policies are often tuned manually. This is time-consuming and frustrating. Reinforcement learning is a promising alternative that aims to automate this process, yet often requires too many experiments to be practical. In this paper, we propose a solution to this problem by exploiting prior knowledge from simulations, which are readily available for most robotic platforms. Specifically, we extend Entropy Search, a Bayesian optimization algorithm that maximizes information gain from each experiment, to the case of multiple information sources. The result is a principled way to automatically combine cheap, but inaccurate information from simulations with expensive and accurate physical experiments in a cost-effective manner. We apply the resulting method to a cart-pole system, which confirms that the algorithm can find good control policies with fewer experiments than standard Bayesian optimization on the physical system only.

Submitted 3 March, 2017; originally announced March 2017.

Comments: 7 pages, 6 figures, to appear in IEEE 2017 International Conference on Robotics and Automation (ICRA)

arXiv:1609.07534 [pdf, other]

Predictive and Self Triggering for Event-based State Estimation

Authors: Sebastian Trimpe

Abstract: Event-based state estimation can achieve estimation quality comparable to traditional time-triggered methods, but with a significantly lower number of samples. In networked estimation problems, this reduction in sampling instants does, however, not necessarily translate into better usage of the shared communication resource. Because typical event-based approaches decide instantaneously whether communication is needed or not, free slots cannot be reallocated immediately, and hence remain unused. In this paper, novel predictive and self triggering protocols are proposed, which give the communication system time to adapt and reallocate freed resources. From a unified Bayesian decision framework, two schemes are developed: self-triggers that predict, at the current triggering instant, the next one; and predictive triggers that indicate, at every time step, whether communication will be needed at a given prediction horizon. The effectiveness of the proposed triggers in trading off estimation quality for communication reduction is compared in numerical simulations.

Submitted 23 September, 2016; originally announced September 2016.

Comments: 8 pages, 6 figures, accepted at 55th IEEE Conference on Decision and Control 2016

arXiv:1605.01950 [pdf, ps, other]

doi 10.1109/ICRA.2016.7487144

Automatic LQR Tuning Based on Gaussian Process Global Optimization

Authors: Alonso Marco, Philipp Hennig, Jeannette Bohg, Stefan Schaal, Sebastian Trimpe

Abstract: This paper proposes an automatic controller tuning framework based on linear optimal control combined with Bayesian optimization. With this framework, an initial set of controller gains is automatically improved according to a pre-defined performance objective evaluated from experimental data. The underlying Bayesian optimization algorithm is Entropy Search, which represents the latent objective as a Gaussian process and constructs an explicit belief over the location of the objective minimum. This is used to maximize the information gain from each experimental evaluation. Thus, this framework shall yield improved controllers with fewer evaluations compared to alternative approaches. A seven-degree-of-freedom robot arm balancing an inverted pole is used as the experimental demonstrator. Results of a two- and four-dimensional tuning problems highlight the method's potential for automatic controller tuning on robotic platforms.

Submitted 6 May, 2016; originally announced May 2016.

Comments: 8 pages, 5 figures, to appear in IEEE 2016 International Conference on Robotics and Automation. Video demonstration of the experiments available at https://am.is.tuebingen.mpg.de/publications/marco_icra_2016

arXiv:1602.06157 [pdf, other]

doi 10.1109/ICRA.2016.7487184

Depth-Based Object Tracking Using a Robust Gaussian Filter

Authors: Jan Issac, Manuel Wüthrich, Cristina Garcia Cifuentes, Jeannette Bohg, Sebastian Trimpe, Stefan Schaal

Abstract: We consider the problem of model-based 3D-tracking of objects given dense depth images as input. Two difficulties preclude the application of a standard Gaussian filter to this problem. First of all, depth sensors are characterized by fat-tailed measurement noise. To address this issue, we show how a recently published robustification method for Gaussian filters can be applied to the problem at hand. Thereby, we avoid using heuristic outlier detection methods that simply reject measurements if they do not match the model. Secondly, the computational cost of the standard Gaussian filter is prohibitive due to the high-dimensional measurement, i.e. the depth image. To address this problem, we propose an approximation to reduce the computational complexity of the filter. In quantitative experiments on real data we show how our method clearly outperforms the standard Gaussian filter. Furthermore, we compare its performance to a particle-filter-based tracking method, and observe comparable computational efficiency and improved accuracy and smoothness of the estimates.

Submitted 19 February, 2016; originally announced February 2016.

arXiv:1511.05223 [pdf, other]

Distributed Event-based State Estimation

Authors: Sebastian Trimpe

Abstract: An event-based state estimation approach for reducing communication in a networked control system is proposed. Multiple distributed sensor-actuator-agents observe a dynamic process and sporadically exchange their measurements and inputs over a bus network. Based on these data, each agent estimates the full state of the dynamic system, which may exhibit arbitrary inter-agent couplings. Local event-based protocols ensure that data is transmitted only when necessary to meet a desired estimation accuracy. This event-based scheme is shown to mimic a centralized Luenberger observer design up to guaranteed bounds, and stability is proven in the sense of bounded estimation errors for bounded disturbances. The stability result extends to the distributed control system that results when the local state estimates are used for distributed feedback control. Simulation results highlight the benefit of the event-based approach over classical periodic ones in reducing communication requirements.

Submitted 26 January, 2017; v1 submitted 16 November, 2015; originally announced November 2015.

Comments: Technical report, 16 pages, 10 figures, minor updates

arXiv:1509.04072 [pdf, other]

Robust Gaussian Filtering using a Pseudo Measurement

Authors: Manuel Wüthrich, Cristina Garcia Cifuentes, Sebastian Trimpe, Franziska Meier, Jeannette Bohg, Jan Issac, Stefan Schaal

Abstract: Many sensors, such as range, sonar, radar, GPS and visual devices, produce measurements which are contaminated by outliers. This problem can be addressed by using fat-tailed sensor models, which account for the possibility of outliers. Unfortunately, all estimation algorithms belonging to the family of Gaussian filters (such as the widely-used extended Kalman filter and unscented Kalman filter) are inherently incompatible with such fat-tailed sensor models. The contribution of this paper is to show that any Gaussian filter can be made compatible with fat-tailed sensor models by applying one simple change: Instead of filtering with the physical measurement, we propose to filter with a pseudo measurement obtained by applying a feature function to the physical measurement. We derive such a feature function which is optimal under some conditions. Simulation results show that the proposed method can effectively handle measurement outliers and allows for robust filtering in both linear and nonlinear systems.

Submitted 30 May, 2016; v1 submitted 14 September, 2015; originally announced September 2015.

arXiv:1504.07941 [pdf, other]

A New Perspective and Extension of the Gaussian Filter

Authors: Manuel Wüthrich, Sebastian Trimpe, Daniel Kappler, Stefan Schaal

Abstract: The Gaussian Filter (GF) is one of the most widely used filtering algorithms; instances are the Extended Kalman Filter, the Unscented Kalman Filter and the Divided Difference Filter. GFs represent the belief of the current state by a Gaussian with the mean being an affine function of the measurement. We show that this representation can be too restrictive to accurately capture the dependences in systems with nonlinear observation models, and we investigate how the GF can be generalized to alleviate this problem. To this end, we view the GF from a variational-inference perspective. We analyse how restrictions on the form of the belief can be relaxed while maintaining simplicity and efficiency. This analysis provides a basis for generalizations of the GF. We propose one such generalization which coincides with a GF using a virtual measurement, obtained by applying a nonlinear function to the actual measurement. Numerical experiments show that the proposed Feature Gaussian Filter (FGF) can have a substantial performance advantage over the standard GF for systems with nonlinear observation models.

Submitted 5 June, 2015; v1 submitted 29 April, 2015; originally announced April 2015.

Comments: Will appear in Robotics: Science and Systems (R:SS) 2015

Showing 51–93 of 93 results for author: Trimpe, S