
Sub-linear Regret in Adaptive Model Predictive Control

Authors: Damianos Tranos, Alexandre Proutiere

Abstract: We consider the problem of adaptive Model Predictive Control (MPC) for uncertain linear systems with additive disturbances and with state and input constraints. We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online algorithm that combines the certainty-equivalence principle and polytopic tubes. Specifically, at any given step, STT-MPC infers the system dynamics using the Least Squares Estimator (LSE), and applies a controller obtained by solving an MPC problem using these estimates. The polytopic tubes ensure that, despite the uncertainties, state and input constraints are satisfied, and recursive feasibility and asymptotic stability hold. In this work, we analyze the regret of the algorithm, when compared to an oracle algorithm initially aware of the system dynamics. We establish that the expected regret of STT-MPC does not exceed $O(T^{1/2 + ε})$, where $ε \in (0,1)$ is a design parameter tuning the persistent excitation component of the algorithm. Our result relies on a recently proposed exponential decay of sensitivity property and, to the best of our knowledge, is the first of its kind in this setting. We illustrate the performance of our algorithm using a simple numerical example.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2304.02574 [pdf, other]

Conformal Off-Policy Evaluation in Markov Decision Processes

Authors: Daniele Foffano, Alessio Russo, Alexandre Proutiere

Abstract: Reinforcement Learning aims at identifying and evaluating efficient control policies from data. In many real-world applications, the learner is not allowed to experiment and cannot gather data in an online manner (this is the case when experimenting is expensive, risky or unethical). For such applications, the reward of a given policy (the target policy) must be estimated using historical data gathered under a different policy (the behavior policy). Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees. We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty. The main challenge in OPE stems from the distribution shift due to the discrepancies between the target and the behavior policies. We propose and empirically evaluate different ways to deal with this shift. Some of these methods yield conformalized intervals with reduced length compared to existing approaches, while maintaining the same certainty level.

Submitted 19 September, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

Journal ref: 2023 62nd IEEE Conference on Decision and Control (CDC). IEEE, 2023

arXiv:2210.00502 [pdf, other]

Self-Tuning Tube-based Model Predictive Control

Authors: Damianos Tranos, Alessio Russo, Alexandre Proutiere

Abstract: We present Self-Tuning Tube-based Model Predictive Control (STT-MPC), an adaptive robust control algorithm for uncertain linear systems with additive disturbances based on the least-squares estimator and polytopic tubes. Our algorithm leverages concentration results to bound the system uncertainty set with prescribed confidence, and guarantees robust constraint satisfaction for this set, along with recursive feasibility and input-to-state stability. Persistence of excitation is ensured without compromising the algorithm's asymptotic performance or increasing its computational complexity. We demonstrate the performance of our algorithm using numerical experiments.

Submitted 2 October, 2022; originally announced October 2022.

arXiv:2209.03500 [pdf, other]

Tube-Based Zonotopic Data-Driven Predictive Control

Authors: Alessio Russo, Alexandre Proutiere

Abstract: We present a novel tube-based data-driven predictive control method for linear systems affected by a bounded additive disturbance. Our method leverages recent results in the reachability analysis of unknown linear systems to formulate and solve a robust tube-based predictive control problem. More precisely, our approach consists in deriving, from the collected data, a zonotope that includes the true state error set. We show how to guarantee the stability of the resulting error zonotope, which can be exploited to increase the computational efficiency of existing zonotopic data-driven MPC formulations. Results on a double-integrator affected by strong adversarial noise demonstrate the effectiveness of the proposed control approach.

Submitted 24 November, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

arXiv:2109.14429 [pdf, ps, other]

Minimal Expected Regret in Linear Quadratic Control

Authors: Yassir Jedra, Alexandre Proutiere

Abstract: We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices $A$ and $B$ may be initially unknown. We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time $T$ is upper bounded (i) by $\widetilde{O}((d_u+d_x)\sqrt{d_xT})$ when $A$ and $B$ are unknown, (ii) by $\widetilde{O}(d_x^2\log(T))$ if only $A$ is unknown, and (iii) by $\widetilde{O}(d_x(d_u+d_x)\log(T))$ if only $B$ is unknown and under some mild non-degeneracy condition ($d_x$ and $d_u$ denote the dimensions of the state and of the control input, respectively). These regret scalings are minimal in $T$, $d_x$ and $d_u$ as they match existing lower bounds in scenario (i) when $d_x\le d_u$ [SF20], and in scenario (ii) [lai1986]. We conjecture that our upper bounds are also optimal in scenario (iii) (there is no known lower bound in this setting). Existing online algorithms proceed in epochs of (typically exponentially) growing durations. The control policy is fixed within each epoch, which considerably simplifies the analysis of the estimation error on $A$ and $B$ and hence of the regret. Our algorithm departs from this design choice: it is a simple variant of certainty-equivalence regulators, where the estimates of $A$ and $B$ and the resulting control policy can be updated as frequently as we wish, possibly at every step. Quantifying the impact of such a constantly-varying control policy on the performance of these estimates and on the regret constitutes one of the technical challenges tackled in this paper.

Submitted 29 September, 2021; originally announced September 2021.

arXiv:2109.07171 [pdf, other]

Balancing detectability and performance of attacks on the control channel of Markov Decision Processes

Authors: Alessio Russo, Alexandre Proutiere

Abstract: We investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes (MDPs). This research is motivated by the recent interest of the research community in adversarial and poisoning attacks applied to MDPs and to reinforcement learning (RL) methods. The policies resulting from these methods have been shown to be vulnerable to attacks perturbing the observations of the decision-maker. In such an attack, drawing inspiration from adversarial examples used in supervised learning, the amplitude of the adversarial perturbation is limited according to some norm, with the hope that this constraint will make the attack imperceptible. However, such constraints do not grant any level of undetectability and do not take into account the dynamic nature of the underlying Markov process. In this paper, we propose a new attack formulation, based on information-theoretical quantities, that considers the objective of minimizing the detectability of the attack as well as the performance of the controlled process. We analyze the trade-off between the efficiency of the attack and its detectability. We conclude with examples and numerical simulations illustrating this trade-off.

Submitted 15 September, 2021; originally announced September 2021.

arXiv:2103.06208 [pdf, other]

Data-Driven Control and Data-Poisoning attacks in Buildings: the KTH Live-In Lab case study

Authors: Alessio Russo, Marco Molinari, Alexandre Proutiere

Abstract: This work investigates the feasibility of using input-output data-driven control techniques for building control and their susceptibility to data-poisoning techniques. The analysis is performed on a digital replica of the KTH Live-In Lab, a non-linear validated model representing one of the KTH Live-In Lab building testbeds. This work is motivated by recent trends showing a surge of interest in using data-based techniques to control cyber-physical systems. We also analyze the susceptibility of these controllers to data-poisoning methods, a particular type of machine learning threat geared towards finding imperceptible attacks that can undermine the performance of the system under consideration. We consider the Virtual Reference Feedback Tuning (VRFT), a popular data-driven control technique, and show its performance on the KTH Live-In Lab digital replica. We then demonstrate how poisoning attacks can be crafted and illustrate the impact of such attacks. Numerical experiments reveal the feasibility of using data-driven control methods for finding efficient control laws. However, a subtle change in the datasets can significantly deteriorate the performance of VRFT.

Submitted 10 March, 2021; originally announced March 2021.

arXiv:2103.06199 [pdf, other]

Poisoning Attacks against Data-Driven Control Methods

Authors: Alessio Russo, Alexandre Proutiere

Abstract: This paper investigates poisoning attacks against data-driven control methods. This work is motivated by recent trends showing that, in supervised learning, slightly modifying the data in a malicious manner can drastically deteriorate the prediction ability of the trained model. We extend these analyses to the case of data-driven control methods. Specifically, we investigate how a malicious adversary can poison the data so as to minimize the performance of a controller trained using this data. We show that identifying the most impactful attack boils down to solving a bi-level non-convex optimization problem, and provide theoretical insights on the attack. We present a generic algorithm finding a local optimum of this problem and illustrate our analysis in the case of a model-reference based approach, the Virtual Reference Feedback Tuning technique, and on data-driven methods based on Willems et al.'s lemma. Numerical experiments reveal that minimal but well-crafted changes in the dataset are sufficient to deteriorate the performance of data-driven control methods significantly, and even make the closed-loop system unstable.

Submitted 10 March, 2021; originally announced March 2021.

arXiv:2103.01658 [pdf, other]

Minimizing Information Leakage of Abrupt Changes in Stochastic Systems

Authors: Alessio Russo, Alexandre Proutiere

Abstract: This work investigates the problem of analyzing privacy of abrupt changes for general Markov processes. These processes may be affected by changes, or exogenous signals, that need to remain private. Privacy refers to the disclosure of information of these changes through observations of the underlying Markov chain. In contrast to previous work on privacy, we study the problem for an online sequence of data. We use theoretical tools from optimal detection theory to motivate a definition of online privacy based on the average amount of information per observation of the stochastic system under consideration. Two cases are considered: the full-information case, where the eavesdropper measures all but the signals that indicate a change, and the limited-information case, where the eavesdropper only measures the state of the Markov process. For both cases, we provide ways to derive privacy upper bounds and compute policies that attain a higher privacy level. It turns out that the problem of computing privacy-aware policies is concave, and we conclude with some examples and numerical simulations for both cases.

Submitted 30 September, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

arXiv:2003.07937 [pdf, ps, other]

Finite-time Identification of Stable Linear Systems: Optimality of the Least-Squares Estimator

Authors: Yassir Jedra, Alexandre Proutiere

Abstract: We present a new finite-time analysis of the estimation error of the Ordinary Least Squares (OLS) estimator for stable linear time-invariant systems. We characterize the number of observed samples (the length of the observed trajectory) sufficient for the OLS estimator to be $(\varepsilon,δ)$-PAC, i.e., to yield an estimation error less than $\varepsilon$ with probability at least $1-δ$. We show that this number matches existing sample complexity lower bounds [1,2] up to universal multiplicative factors (independent of $(\varepsilon,δ)$ and of the system). This paper hence establishes the optimality of the OLS estimator for stable systems, a result conjectured in [1]. Our analysis of the performance of the OLS estimator is simpler, sharper, and easier to interpret than existing analyses. It relies on new concentration results for the covariates matrix.

Submitted 26 March, 2020; v1 submitted 17 March, 2020; originally announced March 2020.
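
As an illustration of the OLS estimator discussed in the abstract above, here is a minimal sketch for a scalar stable system $x_{t+1} = a x_t + w_t$ with Gaussian noise. The scalar setting, the function names, and the parameter values ($a = 0.8$, 5000 samples) are assumptions chosen for brevity; this is not the paper's code, and it does not reproduce the paper's sample complexity analysis.

```python
import random


def simulate_scalar_system(a, num_steps, noise_std=1.0, seed=0):
    """Simulate x_{t+1} = a * x_t + w_t with Gaussian noise, starting from x_0 = 0.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(num_steps):
        x.append(a * x[-1] + rng.gauss(0.0, noise_std))
    return x


def ols_estimate(trajectory):
    """Ordinary least squares estimate of a from a single trajectory.

    Minimizes sum_t (x_{t+1} - a * x_t)^2, which gives
    a_hat = sum_t x_{t+1} x_t / sum_t x_t^2.
    """
    num = sum(x_next * x for x, x_next in zip(trajectory[:-1], trajectory[1:]))
    den = sum(x * x for x in trajectory[:-1])
    return num / den


trajectory = simulate_scalar_system(a=0.8, num_steps=5000)
a_hat = ols_estimate(trajectory)
print(f"true a = 0.8, OLS estimate = {a_hat:.3f}")
```

Rerunning with a shorter trajectory gives a rough feel for how the estimation error shrinks as more samples are observed.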

arXiv:1903.10343 [pdf, ps, other]

Sample Complexity Lower Bounds for Linear System Identification

Authors: Yassir Jedra, Alexandre Proutiere

Abstract: This paper establishes problem-specific sample complexity lower bounds for linear system identification problems. The sample complexity is defined in the PAC framework: it corresponds to the time it takes to identify the system parameters with prescribed accuracy and confidence levels. By problem-specific, we mean that the lower bound explicitly depends on the system to be identified (which contrasts with minimax lower bounds), and hence really captures the identification hardness specific to the system. We consider both uncontrolled and controlled systems. For uncontrolled systems, the lower bounds are valid for any linear system, stable or not, and only depend on the system's finite-time controllability gramian. A simplified lower bound depending on the spectrum of the system only is also derived. In view of recent finite-time analyses of classical estimation methods (e.g. ordinary least squares), our sample complexity lower bounds are tight for many systems. For controlled systems, our lower bounds are not as explicit as in the case of uncontrolled systems, but could well provide interesting insights into the design of control policies with minimal sample complexity.

Submitted 25 March, 2019; originally announced March 2019.

arXiv:1412.7011 [pdf, ps, other]

Network Synchronization with Convexity

Authors: Guodong Shi, Alexandre Proutiere, Karl Henrik Johansson

Abstract: In this paper, we establish a few new synchronization conditions for complex networks with nonlinear and nonidentical self-dynamics with switching directed communication graphs. In light of the recent works on distributed sub-gradient methods, we impose integral convexity for the nonlinear node self-dynamics in the sense that the self-dynamics of a given node is the gradient of some concave function corresponding to that node. The node couplings are assumed to be linear but with switching directed communication graphs. Several sufficient and/or necessary conditions are established for exact or approximate synchronization over the considered complex networks. These results show when and how nonlinear node self-dynamics may cooperate with the linear diffusive coupling, which eventually leads to network synchronization conditions under relaxed connectivity requirements.

Submitted 16 October, 2015; v1 submitted 22 December, 2014; originally announced December 2014.

Comments: Based on our previous manuscript arXiv:1210.6685. SIAM Journal on Control and Optimization, in press 2016

arXiv:1411.0074 [pdf, other]

doi 10.1109/TCNS.2014.2378915

Emergent Behaviors over Signed Random Dynamical Networks: State-Flipping Model

Authors: Guodong Shi, Alexandre Proutiere, Mikael Johansson, John S. Baras, Karl H. Johansson

Abstract: Recent studies from social, biological, and engineering network systems have drawn attention to the dynamics over signed networks, where each link is associated with a positive/negative sign indicating trustful/mistrustful, activator/inhibitor, or secure/malicious interactions. We study asymptotic dynamical patterns that emerge among a set of nodes that interact in a dynamically evolving signed random network. Node interactions take place at random on a sequence of deterministic signed graphs. Each node receives positive or negative recommendations from its neighbors depending on the sign of the interaction arcs, and updates its state accordingly. Recommendations along a positive arc follow the standard consensus update. As in the work by Altafini, negative recommendations use an update where the sign of the neighbor state is flipped. Nodes may weight positive and negative recommendations differently, and random processes are introduced to model the time-varying attention that nodes pay to these recommendations. Conditions for almost sure convergence and divergence of the node states are established. We show that under this so-called state-flipping model, all links contribute to a consensus of the absolute values of the nodes, even under switching sign patterns and dynamically changing environment. A no-survivor property is established, indicating that every node state diverges almost surely if the maximum network state diverges.

Submitted 1 November, 2014; originally announced November 2014.

Comments: IEEE Transactions on Control of Network Systems, in press. arXiv admin note: substantial text overlap with arXiv:1309.5488

arXiv:1309.2574 [pdf, ps, other]

Randomized Consensus with Attractive and Repulsive Links

Authors: Guodong Shi, Alexandre Proutiere, Mikael Johansson, Karl H. Johansson

Abstract: We study convergence properties of a randomized consensus algorithm over a graph with both attractive and repulsive links. At each time instant, a node is randomly selected to interact with a random neighbor. Depending on whether the link between the two nodes belongs to a given subgraph of attractive or repulsive links, the node update follows a standard attractive weighted average or a repulsive weighted average, respectively. The repulsive update has the opposite sign of the standard consensus update. In this way, it counteracts the consensus formation and can be seen as a model of link faults or malicious attacks in a communication network, or the impact of trust and antagonism in a social network. Various probabilistic convergence and divergence conditions are established. A threshold condition for the strength of the repulsive action is given for convergence in expectation: when the repulsive weight crosses this threshold value, the algorithm transits from convergence to divergence. An explicit value of the threshold is derived for classes of attractive and repulsive graphs. The results show that a single repulsive link can sometimes drastically change the behavior of the consensus algorithm. They also explicitly show how the robustness of the consensus algorithm depends on the size and other properties of the graphs.

Submitted 9 September, 2013; originally announced September 2013.
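
The update rule described in the abstract above can be sketched roughly as follows. The exact update form (step sizes `alpha` and `beta` applied to the state difference), the uniform node and neighbor selection, and all names are assumptions made for illustration; the paper's precise model and weights may differ.

```python
import random


def randomized_consensus(states, neighbors, repulsive_links, alpha, beta,
                         num_steps, seed=0):
    """Run a randomized consensus iteration with attractive and repulsive links.

    states: initial node values.
    neighbors: neighbors[i] is the list of neighbors of node i.
    repulsive_links: set of frozenset({i, j}) pairs; all other links are attractive.
    alpha, beta: assumed attractive and repulsive weights.
    """
    rng = random.Random(seed)
    x = list(states)
    for _ in range(num_steps):
        i = rng.randrange(len(x))      # node selected uniformly at random
        j = rng.choice(neighbors[i])   # random neighbor of that node
        if frozenset((i, j)) in repulsive_links:
            # Repulsive update: opposite sign of the consensus update,
            # so node i moves away from node j.
            x[i] = x[i] + beta * (x[i] - x[j])
        else:
            # Standard attractive update: weighted average of x[i] and x[j].
            x[i] = x[i] - alpha * (x[i] - x[j])
    return x


# Complete graph on five nodes with only attractive links.
complete = [[j for j in range(5) if j != i] for i in range(5)]
final = randomized_consensus([0.0, 1.0, 2.0, 3.0, 4.0], complete, set(),
                             alpha=0.5, beta=0.5, num_steps=2000)
print("spread with attractive links only:", max(final) - min(final))
```

Passing a non-empty `repulsive_links` set, for example `{frozenset((0, 1))}`, lets one experiment with how a single repulsive link affects the spread of the node states.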

arXiv:1210.6685 [pdf, ps, other]

Distributed Optimization: Convergence Conditions from a Dynamical System Perspective

Authors: Guodong Shi, Alexandre Proutiere, Karl Henrik Johansson

Abstract: This paper explores the fundamental properties of distributed minimization of a sum of functions with each function only known to one node, and a pre-specified level of node knowledge and computational capacity. We define the optimization information each node receives from its objective function, the neighboring information each node receives from its neighbors, and the computational capacity each node can take advantage of in controlling its state. It is proven that there exist a neighboring information way and a control law that guarantee global optimal consensus if and only if the solution sets of the local objective functions admit a nonempty intersection set for fixed strongly connected graphs. Then we show that for any tolerated error, we can find a control law that guarantees global optimal consensus within this error for fixed, bidirectional, and connected graphs under mild conditions. For time-varying graphs, we show that optimal consensus can always be achieved as long as the graph is uniformly jointly strongly connected and the nonempty intersection condition holds. The results illustrate that nonempty intersection for the local optimal solution sets is a critical condition for successful distributed optimization for a large class of algorithms.

Submitted 24 October, 2012; originally announced October 2012.

Showing 1–15 of 15 results for author: Proutiere, A