-
Data-Enabled Policy Optimization for Direct Adaptive Learning of the LQR
Authors:
Feiran Zhao,
Florian Dörfler,
Alessandro Chiuso,
Keyou You
Abstract:
Direct data-driven design methods for the linear quadratic regulator (LQR) mainly use offline or episodic data batches, and their online adaptation has been acknowledged as an open problem. In this paper, we propose a direct adaptive method to learn the LQR from online closed-loop data. First, we propose a new policy parameterization based on the sample covariance to formulate a direct data-driven LQR problem, which is shown to be equivalent to the certainty-equivalence LQR with optimal non-asymptotic guarantees. Second, we design a novel data-enabled policy optimization (DeePO) method to directly update the policy, where the gradient is explicitly computed using only a batch of persistently exciting (PE) data. Third, we establish its global convergence via a projected gradient dominance property. Importantly, we efficiently use DeePO to adaptively learn the LQR by performing only one-step projected gradient descent per sample of the closed-loop system, which also leads to an explicit recursive update of the policy. Under PE inputs and for bounded noise, we show that the average regret of the LQR cost is upper-bounded by two terms signifying a sublinear decrease in time $\mathcal{O}(1/\sqrt{T})$ plus a bias scaling inversely with signal-to-noise ratio (SNR), which are independent of the noise statistics. Finally, we perform simulations to validate the theoretical results and demonstrate the computational and sample efficiency of our method.
Submitted 19 April, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Harnessing the Final Control Error for Optimal Data-Driven Predictive Control
Authors:
Alessandro Chiuso,
Marco Fabris,
Valentina Breschi,
Simone Formentin
Abstract:
Model Predictive Control (MPC) is a powerful method for complex system regulation, but its reliance on accurate models poses many limitations in real-world applications. Data-driven predictive control (DDPC) offers a valid alternative, eliminating the need for model identification. However, it may falter in the presence of noisy data. In response, in this work, we present a unified stochastic framework for direct DDPC where control actions are obtained by optimizing the Final Control Error, directly computed from available data only, that automatically weighs the impact of uncertainty on the control objective. Our approach generalizes existing DDPC methods, like regularized Data-enabled Predictive Control (DeePC) and $γ$-DDPC, and thus provides a path toward noise-tolerant data-based control, with rigorous optimality guarantees. The theoretical investigation is complemented by a series of numerical case studies, revealing that the proposed method consistently outperforms or, at worst, matches existing techniques without requiring the tuning of regularization parameters that those methods need.
Submitted 22 December, 2023;
originally announced December 2023.
-
Dynamic Brain Networks with Prescribed Functional Connectivity
Authors:
Umberto Casti,
Giacomo Baggio,
Danilo Benozzo,
Sandro Zampieri,
Alessandra Bertoldo,
Alessandro Chiuso
Abstract:
In this paper, we consider stable stochastic linear systems modeling whole-brain resting-state dynamics. We parametrize the state matrix of the system (effective connectivity) in terms of its steady-state covariance matrix (functional connectivity) and a skew-symmetric matrix $S$. We examine how the matrix $S$ influences some relevant dynamic properties of the system. Specifically, we show that a large $S$ enhances the degree of stability and excitability of the system, and makes the latter more responsive to high-frequency inputs.
Submitted 11 October, 2023;
originally announced October 2023.
-
Simulation of Nonlinear Systems Trajectories: between Models and Behaviors
Authors:
Antonio Fazzi,
Alessandro Chiuso
Abstract:
In this paper, we study connections between the classical model-based approach to nonlinear system theory, where systems are represented by equations, and the nonlinear behavioral approach, where systems are defined as sets of trajectories. In particular, we focus on equivalent representations of the systems in the two frameworks for the problem of simulating a future nonlinear system trajectory starting from a given set of noisy data. The goal also includes extending some existing results from the deterministic to the stochastic setting.
Submitted 29 May, 2024; v1 submitted 6 April, 2023;
originally announced April 2023.
-
On the impact of regularization in data-driven predictive control
Authors:
Valentina Breschi,
Alessandro Chiuso,
Marco Fabris,
Simone Formentin
Abstract:
Model predictive control (MPC) is a control strategy widely used in industrial applications. However, its implementation typically requires a mathematical model of the system being controlled, which can be a time-consuming and expensive task. Data-driven predictive control (DDPC) methods offer an alternative approach that does not require an explicit mathematical model, but instead optimize the control policy directly from data. In this paper, we study the impact of two different regularization penalties on the closed-loop performance of a recently introduced data-driven method called $γ$-DDPC. Moreover, we discuss the tuning of the related coefficients in different data and noise scenarios, to provide some guidelines for the end user.
Submitted 23 March, 2024; v1 submitted 1 April, 2023;
originally announced April 2023.
-
Uncertainty-aware data-driven predictive control in a stochastic setting
Authors:
Valentina Breschi,
Marco Fabris,
Simone Formentin,
Alessandro Chiuso
Abstract:
Data-Driven Predictive Control (DDPC) has been recently proposed as an effective alternative to traditional Model Predictive Control (MPC), in that the same constrained optimization problem can be addressed without the need to explicitly identify a full model of the plant. However, DDPC is built upon input/output trajectories. Therefore, the finite sample effect of stochastic data, due to, e.g., measurement noise, may have a detrimental impact on closed-loop performance. Exploiting a formal statistical analysis of the prediction error, in this paper we propose the first systematic approach to deal with uncertainty due to finite sample effects. To this end, we introduce two regularization strategies for which, differently from existing regularization-based DDPC techniques, we propose a tuning rationale allowing us to select the regularization hyper-parameters before closing the loop and without additional experiments. Simulation results confirm the potential of the proposed strategy when closing the loop.
Submitted 17 August, 2023; v1 submitted 18 November, 2022;
originally announced November 2022.
-
Data-driven predictive control in a stochastic setting: a unified framework
Authors:
Valentina Breschi,
Alessandro Chiuso,
Simone Formentin
Abstract:
Data-driven predictive control (DDPC) has been recently proposed as an effective alternative to traditional model-predictive control (MPC) for its unique features of being time-efficient and unbiased with respect to the oracle solution. Nonetheless, it has also been observed that noise may strongly jeopardize the final closed-loop performance since it affects both the data-based system representation and the control update computed from the online measurements. Recent studies have shown that regularization is potentially a successful tool to counteract the effect of noise. At the same time, regularization requires the tuning of a set of penalty terms, whose choice might be practically difficult without closed-loop experiments. In this paper, by means of subspace identification tools, we pursue a three-fold goal: $(i)$ we set up a unified framework for the existing regularized data-driven predictive control schemes for stochastic systems; $(ii)$ we introduce $γ$-DDPC, an efficient two-stage scheme that splits the optimization problem into two parts: fitting the initial conditions and optimizing the future performance, while guaranteeing constraint satisfaction; $(iii)$ we discuss the role of regularization for data-driven predictive control, providing new insight on $when$ and $how$ it should be applied. A benchmark numerical case study finally illustrates the performance of $γ$-DDPC, showing how controller design can be simplified in terms of tuning effort and computational complexity when benefiting from the insights coming from the subspace identification realm.
Submitted 21 November, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
Stacked Residuals of Dynamic Layers for Time Series Anomaly Detection
Authors:
L. Zancato,
A. Achille,
G. Paolini,
A. Chiuso,
S. Soatto
Abstract:
We present an end-to-end differentiable neural network architecture to perform anomaly detection in multivariate time series by incorporating a Sequential Probability Ratio Test on the prediction residual. The architecture is a cascade of dynamical systems designed to separate linearly predictable components of the signal, such as trends and seasonality, from the non-linear ones. The former are modeled by local Linear Dynamic Layers, and their residual is fed to a generic Temporal Convolutional Network that also aggregates global statistics from different time series as context for the local predictions of each one. The last layer implements the anomaly detector, which exploits the temporal structure of the prediction residuals to detect both isolated point anomalies and set-point changes. It is based on a novel application of the classic CUSUM algorithm, adapted through the use of a variational approximation of f-divergences. The model automatically adapts to the time scales of the observed signals. It approximates a SARIMA model at the get-go, and auto-tunes to the statistics of the signal and its covariates, without the need for supervision, as more data is observed. The resulting system, which we call STRIC, outperforms both state-of-the-art robust statistical methods and deep neural network architectures on multiple anomaly detection benchmarks.
Submitted 24 February, 2022;
originally announced February 2022.
-
A novel Deep Neural Network architecture for non-linear system identification
Authors:
Luca Zancato,
Alessandro Chiuso
Abstract:
We present a novel Deep Neural Network (DNN) architecture for non-linear system identification. We foster generalization by constraining DNN representational power. To do so, inspired by fading memory systems, we introduce inductive bias (on the architecture) and regularization (on the loss function). This architecture allows for automatic complexity selection based solely on available data, thereby reducing the number of hyper-parameters that must be chosen by the user. Exploiting the highly parallelizable DNN framework (based on stochastic optimization methods), we successfully apply our method to large-scale datasets.
Submitted 6 June, 2021;
originally announced June 2021.
-
Estimating Koopman operators for nonlinear dynamical systems: a nonparametric approach
Authors:
Francesco Zanini,
Alessandro Chiuso
Abstract:
The Koopman operator is a mathematical tool that allows for a linear description of non-linear systems, though it works in infinite-dimensional spaces. Dynamic Mode Decomposition and Extended Dynamic Mode Decomposition are among the most popular finite-dimensional approximations. In this paper we capture their core essence as a dual version of the same framework, incorporating them into the kernel framework. To do so, we leverage the RKHS as a suitable space for learning the Koopman dynamics, thanks to its intrinsic finite-dimensional nature, shaped by data. We finally establish a strong link between kernel methods and Koopman operators, leading to the estimation of the latter through kernel functions. We also provide simulations for comparison with standard procedures.
Submitted 25 March, 2021;
originally announced March 2021.
-
Stable spline identification of linear systems under missing data
Authors:
Gianluigi Pillonetto,
Alessandro Chiuso,
Giuseppe De Nicolao
Abstract:
A different route to identification of time-invariant linear systems has been recently proposed which does not require committing to a specific parametric model structure. Impulse responses are described in a nonparametric Bayesian framework as zero-mean Gaussian processes. Their covariances are given by the so-called stable spline kernels encoding information on regularity and BIBO stability. In this paper, we demonstrate that these kernels also lead to a new family of radial basis function kernels suitable to model system components subject to disturbances given by filtered white noise. This novel class, in cooperation with the stable spline kernels, paves the way to a new approach to solve missing data problems in both discrete and continuous-time settings. Numerical experiments show that the new technique may return models more predictive than those obtained by standard parametric Prediction Error Methods, even when the latter exploit the full data set.
Submitted 14 May, 2020; v1 submitted 11 August, 2019;
originally announced August 2019.
-
Derivative-free online learning of inverse dynamics models
Authors:
Diego Romeres,
Mattia Zorzi,
Raffaello Camoriano,
Silvio Traversaro,
Alessandro Chiuso
Abstract:
This paper discusses online algorithms for inverse dynamics modelling in robotics. Several model classes including rigid body dynamics (RBD) models, data-driven models and semiparametric models (which are a combination of the previous two classes) are placed in a common framework. While model classes used in the literature typically exploit joint velocities and accelerations, which need to be approximated resorting to numerical differentiation schemes, in this paper a new 'derivative-free' framework is proposed that does not require this preprocessing step. An extensive experimental study with real data from the right arm of the iCub robot is presented, comparing different model classes and estimation procedures, showing that the proposed 'derivative-free' methods outperform existing methodologies.
Submitted 13 September, 2018;
originally announced September 2018.
-
The role of noise modeling in the estimation of resting-state brain effective connectivity
Authors:
Giulia Prando,
Mattia Zorzi,
Alessandra Bertoldo,
Alessandro Chiuso
Abstract:
Causal relations among neuronal populations of the brain are studied through the so-called effective connectivity (EC) network. The latter is estimated from EEG or fMRI measurements, by inverting a generative model of the corresponding data. It is clear that the goodness of the estimated network heavily depends on the underlying modeling assumptions. In the present paper we consider the EC estimation problem using fMRI data in resting-state condition. Specifically, we investigate how to model endogenous fluctuations driving the neuronal activity.
Submitted 13 February, 2018;
originally announced February 2018.
-
Estimating effective connectivity in linear brain network models
Authors:
Giulia Prando,
Mattia Zorzi,
Alessandra Bertoldo,
Alessandro Chiuso
Abstract:
Contemporary neuroscience has embraced network science to study the complex and self-organized structure of the human brain; one of the main outstanding issues is that of inferring from measured data, chiefly functional Magnetic Resonance Imaging (fMRI), the so-called effective connectivity in brain networks, that is, the existing interactions among neuronal populations. This inverse problem is complicated by the fact that the BOLD (Blood Oxygenation Level Dependent) signal measured by fMRI represents a dynamic and nonlinear transformation (the hemodynamic response) of neuronal activity. In this paper, we consider resting state (rs) fMRI data; building upon a linear population model of the BOLD signal and a stochastic linear DCM model, the model parameters are estimated through an EM-type iterative procedure, which alternately estimates the neuronal activity by means of the Rauch-Tung-Striebel (RTS) smoother, updates the connections among neuronal states and refines the parameters of the hemodynamic model; sparsity in the interconnection structure is favoured using an iterative reweighting scheme. Experimental results using rs-fMRI data are shown, demonstrating the effectiveness of our approach, and a comparison with state-of-the-art routines (SPM12 toolbox) is provided.
Submitted 30 March, 2017;
originally announced March 2017.
-
The Harmonic Analysis of Kernel Functions
Authors:
Mattia Zorzi,
Alessandro Chiuso
Abstract:
Kernel-based methods have been recently introduced for linear system identification as an alternative to parametric prediction error methods. Adopting the Bayesian perspective, the impulse response is modeled as a non-stationary Gaussian process with zero mean and with a certain kernel (i.e. covariance) function. Choosing the kernel is one of the most challenging and important issues. In the present paper we introduce the harmonic analysis of this non-stationary process, and argue that this is an important tool which helps in designing such a kernel. Furthermore, this analysis also suggests an effective way to approximate the kernel, which makes it possible to reduce the computational burden of the identification procedure.
Submitted 15 March, 2017;
originally announced March 2017.
-
Online Identification of Time-Varying Systems: a Bayesian approach
Authors:
Giulia Prando,
Diego Romeres,
Alessandro Chiuso
Abstract:
We extend the recently introduced regularization/Bayesian System Identification procedures to the estimation of time-varying systems. Specifically, we consider an online setting, in which new data become available at given time steps. The real-time estimation requirements imposed by this setting are met by estimating the hyper-parameters through just one gradient step in the marginal likelihood maximization and by exploiting the closed-form availability of the impulse response estimate (when Gaussian prior and Gaussian measurement noise are postulated). By relying on the use of a forgetting factor, we propose two methods to tackle the tracking of time-varying systems. In one of them, the forgetting factor is estimated by treating it as a hyper-parameter of the Bayesian inference procedure.
Submitted 23 September, 2016;
originally announced September 2016.
-
Online semi-parametric learning for inverse dynamics modeling
Authors:
Diego Romeres,
Mattia Zorzi,
Raffaello Camoriano,
Alessandro Chiuso
Abstract:
This paper presents a semi-parametric algorithm for online learning of a robot inverse dynamics model. It combines the strengths of parametric and non-parametric modeling. The former exploits the rigid body dynamics equation, while the latter exploits a suitable kernel function. We provide an extensive comparison with other methods from the literature using real data from the iCub humanoid robot. In doing so we also compare two different techniques, namely cross validation and marginal likelihood optimization, for estimating the hyperparameters of the kernel function.
Submitted 9 October, 2016; v1 submitted 17 March, 2016;
originally announced March 2016.
-
On-line Bayesian System Identification
Authors:
Diego Romeres,
Giulia Prando,
Gianluigi Pillonetto,
Alessandro Chiuso
Abstract:
We consider an on-line system identification setting, in which new data become available at given time steps. In order to meet real-time estimation requirements, we propose a tailored Bayesian system identification procedure, in which the hyper-parameters are still updated through Marginal Likelihood maximization, but after only one iteration of a suitable iterative optimization algorithm. Both gradient methods and the EM algorithm are considered for the Marginal Likelihood optimization. We compare this "1-step" procedure with the standard one, in which the optimization method is run until convergence to a local minimum. The experiments we perform confirm the effectiveness of the approach we propose.
Submitted 17 January, 2016;
originally announced January 2016.
-
Regularization and Bayesian Learning in Dynamical Systems: Past, Present and Future
Authors:
A. Chiuso
Abstract:
Regularization and Bayesian methods for system identification have been repopularized in recent years, and have proved to be competitive w.r.t. classical parametric approaches. In this paper we shall make an attempt to illustrate how the use of regularization in system identification has evolved over the years, starting from the early contributions in the Automatic Control as well as the Econometrics and Statistics literature. In particular we shall discuss some fundamental issues such as compound estimation problems and exchangeability which play an important role in regularization and Bayesian approaches, as also illustrated in early publications in Statistics. The historical and foundational issues will be given more emphasis (and space), at the expense of the more recent developments which are only briefly discussed. The main reason for such a choice is that, while the recent literature is readily available, and surveys have already been published on the subject, in the author's opinion a clear link with past work had not been completely clarified.
Submitted 4 November, 2015;
originally announced November 2015.
-
Sparse plus Low rank Network Identification: A Nonparametric Approach
Authors:
Mattia Zorzi,
Alessandro Chiuso
Abstract:
Modeling and identification of high-dimensional stochastic processes is ubiquitous in many fields. In particular, there is a growing interest in modeling stochastic processes with simple and interpretable structures. In many applications, such as econometrics and biomedical sciences, it seems natural to describe each component of that stochastic process in terms of a few factor variables, which are not accessible for observation, and possibly of a few other components of the stochastic process. These relations can be encoded in a graphical way via a structured dynamic network, referred to as "sparse plus low-rank (S+L) network" hereafter. The problem of finding the S+L network as well as the dynamic model can be posed as a system identification problem. In this paper, we introduce two new nonparametric methods to identify dynamic models for stochastic processes described by an S+L network. These methods take inspiration from regularized estimators based on recently introduced kernels (e.g. "stable spline", "tuned-correlated" etc.). Numerical examples show the benefit of introducing the S+L structure in the identification procedure.
Submitted 9 October, 2016; v1 submitted 10 October, 2015;
originally announced October 2015.
-
Maximum Entropy Vector Kernels for MIMO system identification
Authors:
Giulia Prando,
Gianluigi Pillonetto,
Alessandro Chiuso
Abstract:
Recent contributions have framed linear system identification as a nonparametric regularized inverse problem. Relying on $\ell_2$-type regularization which accounts for the stability and smoothness of the impulse response to be estimated, these approaches have been shown to be competitive w.r.t. classical parametric methods. In this paper, adopting Maximum Entropy arguments, we derive a new $\ell_2$ penalty deriving from a vector-valued kernel; to do so we exploit the structure of the Hankel matrix, thus controlling at the same time complexity, measured by the McMillan degree, stability and smoothness of the identified models. As a special case we recover the nuclear norm penalty on the squared block Hankel matrix. In contrast with previous literature on reweighted nuclear norm penalties, our kernel is described by a small number of hyper-parameters, which are iteratively updated through marginal likelihood maximization; constraining the structure of the kernel acts as a (hyper)regularizer which helps control the effective degrees of freedom of our estimator. To optimize the marginal likelihood we adapt a Scaled Gradient Projection (SGP) algorithm which is proved to be significantly computationally cheaper than other first and second order off-the-shelf optimization methods. The paper also contains an extensive comparison with many state-of-the-art methods on several Monte-Carlo studies, which confirms the effectiveness of our procedure.
Submitted 29 September, 2016; v1 submitted 12 August, 2015;
originally announced August 2015.
-
Regularized linear system identification using atomic, nuclear and kernel-based norms: the role of the stability constraint
Authors:
Gianluigi Pillonetto,
Tianshi Chen,
Alessandro Chiuso,
Giuseppe De Nicolao,
Lennart Ljung
Abstract:
Inspired by ideas taken from the machine learning literature, new regularization techniques have recently been introduced in linear system identification. In particular, all the adopted estimators solve a regularized least squares problem, differing in the nature of the penalty term assigned to the impulse response. Popular choices include atomic and nuclear norms (applied to Hankel matrices) as well as norms induced by the so-called stable spline kernels. In this paper, a comparative study of estimators based on these different types of regularizers is reported. Our findings reveal that stable spline kernels outperform approaches based on atomic and nuclear norms since they suitably embed information on impulse response stability and smoothness. This point is illustrated using the Bayesian interpretation of regularization. We also design a new class of regularizers defined by "integral" versions of stable spline/TC kernels. Under quite realistic experimental conditions, the new estimators outperform classical prediction error methods even when the latter are equipped with an oracle for model order selection.
Submitted 2 July, 2015;
originally announced July 2015.
-
Classical vs. Bayesian methods for linear system identification: point estimators and confidence sets
Authors:
D. Romeres,
G. Prando,
G. Pillonetto,
A. Chiuso
Abstract:
This paper compares classical parametric methods with recently developed Bayesian methods for system identification. A Full Bayes solution is considered together with one of the standard approximations based on the Empirical Bayes paradigm. Results regarding point estimators for the impulse response as well as for confidence regions are reported.
Submitted 2 July, 2015;
originally announced July 2015.
-
Identification of stable models via nonparametric prediction error methods
Authors:
Diego Romeres,
Gianluigi Pillonetto,
Alessandro Chiuso
Abstract:
A new Bayesian approach to linear system identification has been proposed in a series of recent papers. The main idea is to frame linear system identification as predictor estimation in an infinite dimensional space, with the aid of regularization/Bayesian techniques. This approach guarantees the identification of stable predictors based on prediction error minimization. Unfortunately, the stability of the predictors does not guarantee the stability of the impulse response of the system. In this paper we propose and compare various techniques to address this issue. Simulation results comparing these techniques are provided.
Submitted 2 July, 2015;
originally announced July 2015.
-
Maximum entropy properties of discrete-time first-order stable spline kernel
Authors:
Tianshi Chen,
Tohid Ardeshiri,
Francesca P. Carli,
Alessandro Chiuso,
Lennart Ljung,
Gianluigi Pillonetto
Abstract:
The first order stable spline (SS-1) kernel is used extensively in regularized system identification. In particular, the stable spline estimator models the impulse response as a zero-mean Gaussian process whose covariance is given by the SS-1 kernel. In this paper, we discuss the maximum entropy properties of this prior. In particular, we formulate the exact maximum entropy problem solved by the SS-1 kernel without Gaussian and uniform sampling assumptions. Under general sampling schemes, we also explicitly derive the special structure underlying the SS-1 kernel (e.g. characterizing the tridiagonal nature of its inverse), which also gives it a maximum entropy covariance completion interpretation. Along the way, similar maximum entropy properties of the Wiener kernel are also given.
Submitted 13 April, 2015;
originally announced April 2015.
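The tridiagonal structure mentioned in the abstract above can be checked numerically on a small case. The sketch below is only illustrative: it assumes uniform sampling and the commonly used discrete-time form of the kernel, with entries `alpha**max(i, j)` for `0 < alpha < 1`; the function names and the choices `n = 5`, `alpha = 1/2` are hypothetical and not taken from the paper. Exact rational arithmetic is used so that the zero pattern of the inverse is not blurred by rounding.

```python
from fractions import Fraction


def first_order_kernel(n, alpha):
    """n x n matrix with entries alpha**max(i, j), i, j = 1..n (assumed kernel form)."""
    return [[alpha ** max(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in a row with a nonzero pivot for this column.
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        # Eliminate the column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def is_tridiagonal(matrix):
    """True if every entry more than one step off the main diagonal is exactly zero."""
    n = len(matrix)
    return all(matrix[i][j] == 0
               for i in range(n) for j in range(n) if abs(i - j) > 1)


K = first_order_kernel(5, Fraction(1, 2))
K_inv = invert(K)
print(is_tridiagonal(K), is_tridiagonal(K_inv))
```

The kernel matrix itself is dense, while its inverse has nonzero entries only on the main diagonal and the two adjacent diagonals.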
-
A Bayesian Approach to Sparse plus Low rank Network Identification
Authors:
Mattia Zorzi,
Alessandro Chiuso
Abstract:
We consider the problem of modeling multivariate time series with parsimonious dynamical models which can be represented as sparse dynamic Bayesian networks with few latent nodes. This structure translates into a sparse plus low rank model. In this paper, we propose a Gaussian regression approach to identify such a model.
Submitted 26 September, 2015; v1 submitted 25 March, 2015;
originally announced March 2015.
-
Robust Inference for Visual-Inertial Sensor Fusion
Authors:
Konstantine Tsotsos,
Alessandro Chiuso,
Stefano Soatto
Abstract:
Inference of three-dimensional motion from the fusion of inertial and visual sensory data has to contend with the preponderance of outliers in the latter. Robust filtering deals with the joint inference and classification task of selecting which data fits the model, and estimating its state. We derive the optimal discriminant and propose several approximations, some used in the literature, others new. We compare them analytically, by pointing to the assumptions underlying their approximations, and empirically. We show that the best performing method improves the performance of state-of-the-art visual-inertial sensor fusion systems, while retaining the same computational complexity.
Submitted 15 December, 2014;
originally announced December 2014.
-
Visual Representations: Defining Properties and Deep Approximations
Authors:
Stefano Soatto,
Alessandro Chiuso
Abstract:
Visual representations are defined in terms of minimal sufficient statistics of visual data, for a class of tasks, that are also invariant to nuisance variability. Minimal sufficiency guarantees that we can store a representation in lieu of raw data with smallest complexity and no performance loss on the task at hand. Invariance guarantees that the statistic is constant with respect to uninformative transformations of the data. We derive analytical expressions for such representations and show they are related to feature descriptors commonly used in computer vision, as well as to convolutional neural networks. This link highlights the assumptions and approximations tacitly assumed by these methods and explains empirical practices such as clamping, pooling and joint normalization.
Submitted 29 February, 2016; v1 submitted 27 November, 2014;
originally announced November 2014.
-
Bayesian and regularization approaches to multivariable linear system identification: the role of rank penalties
Authors:
Giulia Prando,
Alessandro Chiuso,
Gianluigi Pillonetto
Abstract:
Recent developments in linear system identification have proposed the use of non-parametric methods, relying on regularization strategies, to handle the so-called bias/variance trade-off. This paper introduces an impulse response estimator which relies on an $\ell_2$-type regularization including a rank penalty derived using the log-det heuristic as a smooth approximation to the rank function. This makes it possible to account for different properties of the estimated impulse response (e.g. smoothness and stability) while also penalizing high-complexity models. It also makes it possible to account for and enforce coupling between different input-output channels in MIMO systems. According to the Bayesian paradigm, the parameters defining the relative weight of the two regularization terms as well as the structure of the rank penalty are estimated by optimizing the marginal likelihood. Once these hyperparameters have been estimated, the impulse response estimate is available in closed form. Experiments show that the proposed method is superior to the estimator relying on the "classic" $\ell_2$-regularization alone as well as to those based on atomic and nuclear norms.
Submitted 29 September, 2014;
originally announced September 2014.
-
A scaled gradient projection method for Bayesian learning in dynamical systems
Authors:
Silvia Bonettini,
Alessandro Chiuso,
Marco Prato
Abstract:
A crucial task in system identification problems is the selection of the most appropriate model class, which is classically addressed by resorting to cross-validation or using asymptotic arguments. As recently suggested in the literature, this can be addressed in a Bayesian framework, where model complexity is regulated by a few hyperparameters, which can be estimated via marginal likelihood maximization. It is thus of primary importance to design effective optimization methods to solve the corresponding optimization problem. If the unknown impulse response is modeled as a Gaussian process with a suitable kernel, the maximization of the marginal likelihood leads to a challenging nonconvex optimization problem, which requires a stable and effective solution strategy. In this paper we address this problem by means of a scaled gradient projection algorithm, in which the scaling matrix and the steplength parameter play a crucial role in providing a meaningful solution in a computational time comparable with second order methods. In particular, we propose both a generalization of the split gradient approach to design the scaling matrix in the presence of box constraints, and an effective implementation of the gradient and objective function. The extensive numerical experiments carried out on several test problems show that our method is very effective in providing, in a few tenths of a second, solutions of the problems with accuracy comparable with state-of-the-art approaches. Moreover, the flexibility of the proposed strategy makes it easily adaptable to a wider range of problems arising in different areas of machine learning, signal processing and system identification.
Submitted 2 February, 2015; v1 submitted 25 June, 2014;
originally announced June 2014.
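The basic step behind a scaled gradient projection method under box constraints can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of the paper: it uses a fixed steplength, a fixed diagonal scaling and no line search, and it is applied to a toy separable quadratic rather than to a marginal likelihood. All names (`project_onto_box`, `scaled_gradient_projection`, `toy_gradient`) and all numerical values are hypothetical.

```python
def project_onto_box(x, lower, upper):
    """Clip each coordinate of x to the interval [lower_i, upper_i]."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def scaled_gradient_projection(grad, scaling, x0, lower, upper,
                               step=0.5, iterations=200):
    """Iterate x <- P_box(x - step * D * grad(x)) with a diagonal scaling D.

    `scaling` holds the diagonal of D; `step` is a fixed steplength
    (an assumption of this sketch; practical methods adapt both).
    """
    x = project_onto_box(x0, lower, upper)
    for _ in range(iterations):
        g = grad(x)
        trial = [xi - step * di * gi for xi, di, gi in zip(x, scaling, g)]
        x = project_onto_box(trial, lower, upper)
    return x


# Toy problem: minimize 0.5 * sum(c_i * (x_i - t_i)**2) over the box [0, 1]^3.
curvature = [1.0, 10.0, 100.0]
target = [0.5, 2.0, -1.0]


def toy_gradient(x):
    return [c * (xi - t) for c, xi, t in zip(curvature, x, target)]


# Scaling by the inverse curvature equalizes progress across coordinates.
scaling = [1.0 / c for c in curvature]
solution = scaled_gradient_projection(
    toy_gradient, scaling, [0.0, 0.0, 0.0], [0.0] * 3, [1.0] * 3
)
print(solution)
```

For this separable problem the constrained minimizer is the target clipped to the box, so the iterates approach `[0.5, 1.0, 0.0]`. Without the scaling, a single fixed steplength would have to be small enough for the stiffest coordinate and would therefore slow down the others.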
-
Convex vs nonconvex approaches for sparse estimation: GLasso, Multiple Kernel Learning and Hyperparameter GLasso
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Alessandro Chiuso,
Gianluigi Pillonetto
Abstract:
The popular Lasso approach for sparse estimation can be derived via marginalization of a joint density associated with a particular stochastic model. A different marginalization of the same probabilistic model leads to a different non-convex estimator where hyperparameters are optimized. Extending these arguments to problems where groups of variables have to be estimated, we study a computational scheme for sparse estimation that differs from the Group Lasso. Although the underlying optimization problem defining this estimator is non-convex, an initialization strategy based on a univariate Bayesian forward selection scheme is presented. This also allows us to define an effective non-convex estimator where only one scalar variable is involved in the optimization process. Theoretical arguments, independent of the correctness of the priors entering the sparse model, are included to clarify the advantages of this non-convex technique in comparison with other convex estimators. Numerical experiments are also used to compare the performance of these approaches.
Submitted 26 February, 2013; v1 submitted 26 February, 2013;
originally announced February 2013.