Joint State and Parameter Estimation Using the Partial Errors-in-Variables Principle

Peng Liu, Kailai Li, Gustaf Hendeby, and Fredrik Gustafsson

The work is funded in part by the Swedish Research Council under the grant Scalable Kalman Filters and in part by ZENITH of Linköping University under the grant Computational Agile Sensing and Inference for Intelligent Systems. Peng Liu, Gustaf Hendeby, and Fredrik Gustafsson are with the Division of Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden. Email: {peng.liu, gustaf.hendeby, fredrik.gustafsson}@liu.se. Kailai Li is with the Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Groningen, the Netherlands. Email: [email protected]
Abstract

This letter proposes a new method for joint state and parameter estimation in uncertain dynamical systems. We exploit the partial errors-in-variables (PEIV) principle and formulate a regression problem in the sense of weighted total least squares, where the uncertainty in the parameter prior is explicitly considered. Based thereon, the PEIV regression can be solved iteratively through Kalman smoothing and regularized least squares for estimating the state and the parameter, respectively. Simulations demonstrate improved accuracy of the proposed method compared with existing approaches, including the joint maximum a posteriori-maximum likelihood, the expectation maximisation, and the augmented state extended Kalman smoother.

Index Terms:
Joint state and parameter estimation, partial errors-in-variables model, iterative estimation

I Introduction

Estimating the states of uncertain dynamical systems plays a fundamental role in statistical signal processing, with applications in localization, tracking, energy, and robotics [1, 2, 3, 4]. Conventionally, state estimation problems are solved recursively, either online using the Kalman filter and its variants, such as the extended Kalman filter (EKF) [5], or offline using smoothing techniques, such as the extended Kalman smoother (EKS), for enhanced estimation accuracy [6, 7].

However, standard filtering and smoothing algorithms assume complete knowledge of the model, which is difficult to obtain in practice. A more realistic but more challenging scenario involves state-space models with unknown or uncertain parameters. One strategy to mitigate this issue is to augment the state with the parameter for joint estimation within the EKF or EKS framework [8]. While the resulting augmented state EKF and EKS have become popular owing to their simplicity, they may suffer from poor estimation accuracy due to degraded observability [9]. Alternatively, iterative methods have been investigated for joint state and parameter estimation, such as the maximum likelihood (ML) method [10], which has favorable asymptotic properties and has been applied to state-space models in combination with the expectation maximisation (EM) algorithm [11]. However, the ML method may deliver biased parameter estimates and fail to attain the Cramér–Rao bound for small datasets [12]. This issue can be mitigated by the joint maximum a posteriori-maximum likelihood (JMAP-ML) method, which involves numerical optimisation such as the coordinate descent algorithm [13]. This method has been widely used in tasks including sensor calibration, epidemic modeling, and robust localization [14, 15, 16]. However, JMAP-ML disregards the uncertainty in the parameter prior, which may lead to insufficient accuracy [12].

In this paper, we investigate how to explicitly incorporate the uncertainty in the parameter prior into joint state and parameter estimation for linear models, exploiting the partial errors-in-variables (PEIV) model for regression. The standard errors-in-variables (EIV) model contains a regressor matrix that is corrupted by noise [17, 18]; it can be handled by total least squares (TLS) for i.i.d. regressor and measurement noises [19], or by weighted total least squares (WTLS) for more general noise patterns [20, 21]. For the PEIV model, where the regressor matrix is only partially uncertain, the model can be reformulated w.r.t. the uncertain part and solved by WTLS [22]. To the best of the authors' knowledge, no existing literature investigates the joint state and parameter estimation problem from an errors-in-variables perspective.

Contribution

We propose a novel iterative framework for joint state and parameter estimation based on the partial errors-in-variables (PEIV) principle, which explicitly addresses the uncertainty in the parameter prior. The joint estimation problem is formulated in the sense of WTLS and solved iteratively through Kalman smoothing and regularized least squares for estimating the state and the parameter, respectively. The proposed method is evaluated through Monte Carlo simulations, and the numerical results show improved parameter estimation accuracy in comparison with the JMAP-ML, the EM, and the augmented state extended Kalman smoother (ASEKS).

The remainder of the paper is organized as follows. Sec. II provides the signal model, followed by an overview of existing methods in Sec. III and IV. Sec. V introduces the proposed PEIV-based framework, and Sec. VI presents the numerical simulation. Finally, conclusions are drawn in Sec. VII.

II Signal Model

To make the derivations explicit, we will assume a state-space model that is linear in both the state and parameters

$$
\begin{aligned}
x_{k+1} &= F(\theta^{\text{o}})\,x_k + v_k\,,\\
y_k &= H(\theta^{\text{o}})\,x_k + e_k\,,
\end{aligned} \tag{1}
$$

where the state-space matrices $F(\theta^{\text{o}})$ and $H(\theta^{\text{o}})$ are affine functions of the true parameter value $\theta^{\text{o}} \in \mathbb{R}^d$, given as

$$
\begin{aligned}
F(\theta^{\text{o}}) &= F_0 + \sum_{i=1}^{d} \theta^{\text{o}}_i F_i \quad \text{and}\\
H(\theta^{\text{o}}) &= H_0 + \sum_{i=1}^{d} \theta^{\text{o}}_i H_i\,,
\end{aligned} \tag{2}
$$

respectively. The matrices $F_i$ and $H_i$ ($i = 0, \dots, d$) are assumed to be known. $x_k \in \mathbb{R}^n$ denotes the state vector, and $y_k \in \mathbb{R}^m$ is the measurement. $v_k$ and $e_k$ are white Gaussian process and measurement noises with covariance matrices $Q$ and $R$, respectively. $\theta^{\text{o}}_i$ denotes the $i$-th element of the parameter vector. Further, the initial state $x_0$ and the noise sequences $\{v_k\}$ and $\{e_k\}$ are assumed to be mutually independent. The initial state and the parameter prior are assumed to be Gaussian-distributed with

$$
\begin{aligned}
x^{\text{o}}_0 &\sim \mathcal{N}(m_0, P_0) \quad \text{and}\\
\hat{\theta} &\sim \mathcal{N}(\theta^{\text{o}}, \Sigma_\theta)\,,
\end{aligned} \tag{3}
$$

respectively. Let $X^{\text{o}} = [\,(x^{\text{o}}_0)^\top, \dots, (x^{\text{o}}_N)^\top\,]^\top$ contain all the state vectors, where $x^{\text{o}}_k$ denotes the true state, and let $Y = [\,y_1^\top, \dots, y_N^\top\,]^\top$ contain all the measurements. Having priors on both the state and the parameter enables a MAP approach that maximises $P(X,\theta|Y)$; here, we compare with the EM approach, which maximises $P(Y|\theta)$, and the JMAP-ML method, which maximises $P(Y,X|\theta)$ (i.e., MAP for estimating $X$ and ML for estimating $\theta$).
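To make the signal model concrete, the sketch below draws one realisation of (1) under the affine parameterisation (2) and the priors (3), and stacks the result into $X^{\text{o}}$ and $Y$. It is a minimal NumPy sketch under our own assumptions: the two-state toy model with $F(\theta) = F_0 + \theta_1 F_1$ and $H(\theta) = H_0$, all numerical values, and the helper names `affine` and `simulate` are hypothetical illustrations, not taken from the letter.

```python
import numpy as np


def affine(theta, M0, Ms):
    """Evaluate M(theta) = M0 + sum_i theta_i * M_i, the affine structure of (2)."""
    return M0 + sum(t * Mi for t, Mi in zip(theta, Ms))


def simulate(theta, F0, Fs, H0, Hs, Q, R, m0, P0, N, rng):
    """Draw x_0 ~ N(m0, P0), then x_{k+1} = F x_k + v_k and y_k = H x_k + e_k for k = 1..N."""
    F, H = affine(theta, F0, Fs), affine(theta, H0, Hs)
    n, m = F.shape[0], H.shape[0]
    X = np.zeros((N + 1, n))  # row k holds x_k, k = 0..N
    Y = np.zeros((N, m))      # row k-1 holds y_k, k = 1..N (no measurement at k = 0)
    X[0] = rng.multivariate_normal(m0, P0)
    for k in range(1, N + 1):
        X[k] = F @ X[k - 1] + rng.multivariate_normal(np.zeros(n), Q)
        Y[k - 1] = H @ X[k] + rng.multivariate_normal(np.zeros(m), R)
    return X, Y


# Hypothetical toy model: position/velocity, theta_1 scales the velocity persistence.
rng = np.random.default_rng(0)
F0 = np.array([[1.0, 0.1], [0.0, 0.0]])
Fs = [np.array([[0.0, 0.0], [0.0, 1.0]])]
H0 = np.array([[1.0, 0.0]])
Hs = [np.zeros((1, 2))]
theta_true = np.array([0.9])
Q, R = 0.01 * np.eye(2), np.array([[0.1]])
m0, P0 = np.zeros(2), np.eye(2)
Sigma_theta = np.array([[0.05]])

theta_prior = rng.multivariate_normal(theta_true, Sigma_theta)  # prior estimate drawn as in (3)
X, Y = simulate(theta_true, F0, Fs, H0, Hs, Q, R, m0, P0, N=50, rng=rng)
X_stacked = X.ravel()  # X^o = [x_0^T, ..., x_N^T]^T
Y_stacked = Y.ravel()  # Y = [y_1^T, ..., y_N^T]^T
```

Row-major `ravel()` on the $(N+1) \times n$ array produces exactly the stacking order used for $X^{\text{o}}$ and $Y$.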

III Separate state and parameter estimation

The proposed PEIV method, as well as the EM and JMAP-ML methods, leads to algorithms that iteratively estimate the state and the parameter. For that purpose, we derive the fundamental estimation modules in this section. These are rather straightforward to derive; the main step is to reformulate the model given by (1) and (2) as the following linear regression models

$$
\begin{aligned}
\bar{Y} &= \Psi(\theta^{\text{o}})X^{\text{o}} + \eta \quad \text{or}\\
\bar{Y} &= \Phi(X^{\text{o}})\theta^{\text{o}} + c(X^{\text{o}}) + \eta\,.
\end{aligned} \tag{4}
$$

Here, $c(X^{\text{o}})$ represents the component that is independent of $\theta^{\text{o}}$; its explicit form is given in Sec. III-B. The first formulation leads to the Kalman smoother for a given parameter, and the second one yields the least squares estimate of the parameter $\theta^{\text{o}}$ given the state sequence $X^{\text{o}}$.

III-A Kalman Smoother

The Kalman smoother (KS) can be formulated as a MAP problem given by

$$
\begin{aligned}
\hat{X} &= \arg\max_{X} \log P(X|Y)\\
&= \arg\max_{X} \sum_{i=1}^{N} \log P(y_i|x_i) + \sum_{j=1}^{N} \log P(x_j|x_{j-1}) + \log P(x_0)\,.
\end{aligned} \tag{5}
$$

Using the model (1), problem (5) can be expressed as

$$
\hat{X} = \arg\min_{X} \Big\{ \|Y - C(\theta^{\text{o}})X\|^2_{\mathbf{R}^{-1}} + \|A(\theta^{\text{o}})X\|^2_{\mathbf{Q}^{-1}} + \|x_0 - m_0\|^2_{P_0^{-1}} \Big\}\,, \tag{6}
$$

where $m_0$ and $P_0$ denote the mean and covariance of the initial state prior, respectively. For a more concise formulation, we introduce $\mathbf{R} = \mathbf{diag}(R, \dots, R)$, $\mathbf{Q} = \mathbf{diag}(Q, \dots, Q)$, and $\|(\cdot)\|^2_W = (\cdot)^\top W (\cdot)$, where $\mathbf{diag}$ denotes a block-diagonal matrix. The matrices $A(\theta^{\text{o}})$ and $C(\theta^{\text{o}})$ are defined by

$$
\begin{aligned}
A(\theta^{\text{o}}) &= \begin{bmatrix} F(\theta^{\text{o}}) & -\mathbf{I} & 0 & \cdots & 0\\ 0 & F(\theta^{\text{o}}) & -\mathbf{I} & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & F(\theta^{\text{o}}) & -\mathbf{I} \end{bmatrix},\\
C(\theta^{\text{o}}) &= \begin{bmatrix} 0 & H(\theta^{\text{o}}) & 0 & \cdots & 0\\ 0 & 0 & H(\theta^{\text{o}}) & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & \cdots & 0 & H(\theta^{\text{o}}) \end{bmatrix},
\end{aligned} \tag{7}
$$

respectively. With these definitions, (6) is the weighted least squares problem associated with the following linear regression models

$$
\begin{aligned}
Y &= C(\theta^{\text{o}})X^{\text{o}} + E\,,\\
0 &= A(\theta^{\text{o}})X^{\text{o}} + V\,,\\
m_0 &= x^{\text{o}}_0 + \epsilon\,,
\end{aligned}
$$

where $\mathbf{cov}(E) = \mathbf{R}$, $\mathbf{cov}(V) = \mathbf{Q}$, and $\mathbf{cov}(\epsilon) = P_0$, and $0$ denotes the zero vector. Stacking these equations gives

$$
\begin{aligned}
\bar{Y} = \begin{bmatrix} Y\\ 0\\ m_0 \end{bmatrix} &= \begin{bmatrix} C(\theta^{\text{o}})\\ A(\theta^{\text{o}})\\ [\,\mathbf{I}, \mathbf{0}\,] \end{bmatrix} X^{\text{o}} + \begin{bmatrix} E\\ V\\ \epsilon \end{bmatrix}\\
&= \Psi(\theta^{\text{o}})X^{\text{o}} + \eta\,.
\end{aligned} \tag{8}
$$
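The stacked model (8) can be assembled with Kronecker products. The NumPy sketch below builds $A$, $C$, $\Psi$ and $\bar{Y}$ for a fixed, known parameter value and checks on one simulated trajectory that the residual $\bar{Y} - \Psi X^{\text{o}}$ reproduces the stacked noise $[E; V; \epsilon]$: $V$ stacks $v_0, \dots, v_{N-1}$ and $\epsilon = m_0 - x^{\text{o}}_0$. The helper name `build_stacked_model` and the toy matrices are hypothetical.

```python
import numpy as np


def build_stacked_model(F, H, m0, Y):
    """Return Ybar and Psi of (8) for the measurements y_1..y_N stored row-wise in Y."""
    N, n = Y.shape[0], F.shape[0]
    # A in (7): block row k is [0 ... F -I ... 0], i.e. it maps X to F x_k - x_{k+1}
    A = np.kron(np.eye(N, N + 1), F) - np.kron(np.eye(N, N + 1, k=1), np.eye(n))
    # C in (7): block row k picks H x_{k+1}; the first block column is zero
    C = np.kron(np.eye(N, N + 1, k=1), H)
    S = np.eye(n, (N + 1) * n)  # [I, 0] selects x_0
    Psi = np.vstack([C, A, S])
    Ybar = np.concatenate([Y.ravel(), np.zeros(N * n), m0])
    return Ybar, Psi


# Hypothetical toy model with a fixed parameter value.
rng = np.random.default_rng(1)
F, H = np.array([[1.0, 0.1], [0.0, 0.9]]), np.array([[1.0, 0.0]])
Q, R, m0, P0, N = 0.01 * np.eye(2), np.array([[0.1]]), np.zeros(2), np.eye(2), 5

x = [rng.multivariate_normal(m0, P0)]
v = rng.multivariate_normal(np.zeros(2), Q, size=N)  # v_0, ..., v_{N-1}
e = rng.multivariate_normal(np.zeros(1), R, size=N)  # e_1, ..., e_N
for k in range(N):
    x.append(F @ x[-1] + v[k])
X = np.array(x)        # rows x_0, ..., x_N
Y = X[1:] @ H.T + e    # rows y_1, ..., y_N

Ybar, Psi = build_stacked_model(F, H, m0, Y)
eta = Ybar - Psi @ X.ravel()  # should equal [E; V; epsilon]
E, V, eps = eta[:N], eta[N:3 * N], eta[3 * N:]  # sizes N*m, N*n, n with m = 1, n = 2
```

The sign convention matters: since the middle block of $\bar{Y}$ is zero, $V = -A X^{\text{o}}$ stacks $x_{k+1} - F x_k = v_k$, so its covariance is $\mathbf{Q}$ as stated above.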

Given the mutual independence of the initial state and the process and measurement noises, we have $\mathbf{cov}(\eta) = \mathbf{cov}([\,E^\top, V^\top, \epsilon^\top]^\top) = \mathbf{diag}(\mathbf{R}, \mathbf{Q}, P_0) \eqqcolon \Sigma_\eta$. Here, $\bar{Y}$ serves as an augmented measurement vector that incorporates the prior on $x^{\text{o}}_0$. Assuming that the parameter $\theta^{\text{o}}$ is known, the state estimate can be determined by least squares (LS), namely,

$$
\hat{X} = \big(\Psi(\theta^{\text{o}})^\top \Sigma_\eta^{-1} \Psi(\theta^{\text{o}})\big)^{-1} \Psi(\theta^{\text{o}})^\top \Sigma_\eta^{-1} \bar{Y}\,. \tag{9}
$$

The covariance matrix of the estimation error is given by

$$
\Sigma_X = \big(\Psi(\theta^{\text{o}})^\top \Sigma_\eta^{-1} \Psi(\theta^{\text{o}})\big)^{-1}\,. \tag{10}
$$
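As a sketch of (9) and (10), the snippet below solves the batch weighted LS problem directly for a known parameter. The helper names, the toy model, and the use of `np.linalg.solve` instead of forming the inverse in (9) are our own choices; the dense $(N+1)n$-dimensional system makes this suitable only for small $N$.

```python
import numpy as np


def blkdiag(*blocks):
    """Block-diagonal matrix diag(B_1, ..., B_p)."""
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out


def build_regression(F, H, Q, R, m0, P0, Y):
    """Ybar and Psi of (8), and Sigma_eta = diag(bold R, bold Q, P0)."""
    N, n = Y.shape[0], F.shape[0]
    A = np.kron(np.eye(N, N + 1), F) - np.kron(np.eye(N, N + 1, k=1), np.eye(n))
    C = np.kron(np.eye(N, N + 1, k=1), H)
    Psi = np.vstack([C, A, np.eye(n, (N + 1) * n)])
    Ybar = np.concatenate([Y.ravel(), np.zeros(N * n), m0])
    Sigma_eta = blkdiag(np.kron(np.eye(N), R), np.kron(np.eye(N), Q), P0)
    return Ybar, Psi, Sigma_eta


def batch_smoother(Ybar, Psi, Sigma_eta):
    """State estimate (9) and error covariance (10) for a known parameter."""
    W = np.linalg.inv(Sigma_eta)                     # Sigma_eta^{-1}
    info = Psi.T @ W @ Psi                           # Psi^T Sigma_eta^{-1} Psi
    X_hat = np.linalg.solve(info, Psi.T @ W @ Ybar)  # (9)
    Sigma_X = np.linalg.inv(info)                    # (10)
    return X_hat, Sigma_X


# Hypothetical toy data: simulate (1) with a fixed parameter value.
rng = np.random.default_rng(3)
F, H = np.array([[1.0, 0.1], [0.0, 0.9]]), np.array([[1.0, 0.0]])
Q, R, m0, P0, N = 0.01 * np.eye(2), np.array([[0.1]]), np.zeros(2), np.eye(2), 30
X = [rng.multivariate_normal(m0, P0)]
for _ in range(N):
    X.append(F @ X[-1] + rng.multivariate_normal(np.zeros(2), Q))
X = np.array(X)
Y = X[1:] @ H.T + rng.multivariate_normal(np.zeros(1), R, size=N)

Ybar, Psi, Sigma_eta = build_regression(F, H, Q, R, m0, P0, Y)
X_hat, Sigma_X = batch_smoother(Ybar, Psi, Sigma_eta)
x_hat = X_hat.reshape(N + 1, 2)  # row k: smoothed estimate of x_k
```

The system matrix is invertible here because the $[\,\mathbf{I}, \mathbf{0}\,]$ and $A(\theta^{\text{o}})$ blocks alone already give $\Psi$ full column rank.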

For practical implementation of the Kalman smoother, the recursive forward-backward version is preferred, as it runs much faster than the batch solution [6]. We give the batch formulation here for the sake of clarity; it also facilitates the introduction of the JMAP-ML method in Sec. IV-A.
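The equivalence between the recursive and batch forms can be checked numerically. The sketch below uses the standard forward Kalman filter with a backward Rauch-Tung-Striebel (RTS) pass, which is one common forward-backward smoother and not necessarily the exact variant in [6], and compares it with the batch solution (9), computed here in information form by exploiting the block-diagonal structure of $\Sigma_\eta$. Function names and toy values are hypothetical.

```python
import numpy as np


def rts_smoother(F, H, Q, R, m0, P0, Y):
    """Kalman filter forward pass + RTS backward pass; returns rows x_{0|N}, ..., x_{N|N}."""
    N = Y.shape[0]
    xf, Pf = [m0], [P0]      # filtered x_{k|k}, P_{k|k}; no measurement at k = 0
    xp, Pp = [None], [None]  # predicted x_{k|k-1}, P_{k|k-1}
    for k in range(1, N + 1):
        x_pred, P_pred = F @ xf[-1], F @ Pf[-1] @ F.T + Q
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        xf.append(x_pred + K @ (Y[k - 1] - H @ x_pred))
        Pf.append(P_pred - K @ S @ K.T)
        xp.append(x_pred)
        Pp.append(P_pred)
    xs = [None] * (N + 1)
    xs[N] = xf[N]
    for k in range(N - 1, -1, -1):
        G = Pf[k] @ F.T @ np.linalg.inv(Pp[k + 1])  # smoother gain
        xs[k] = xf[k] + G @ (xs[k + 1] - xp[k + 1])
    return np.array(xs)


def batch_smoother(F, H, Q, R, m0, P0, Y):
    """Solution of (9), using Psi^T Sigma_eta^{-1} Psi = C^T R^-1 C + A^T Q^-1 A + S^T P0^-1 S."""
    N, n = Y.shape[0], F.shape[0]
    A = np.kron(np.eye(N, N + 1), F) - np.kron(np.eye(N, N + 1, k=1), np.eye(n))
    C = np.kron(np.eye(N, N + 1, k=1), H)
    S = np.eye(n, (N + 1) * n)
    Ri, Qi = np.kron(np.eye(N), np.linalg.inv(R)), np.kron(np.eye(N), np.linalg.inv(Q))
    P0i = np.linalg.inv(P0)
    info = C.T @ Ri @ C + A.T @ Qi @ A + S.T @ P0i @ S  # Psi^T Sigma_eta^{-1} Psi
    rhs = C.T @ Ri @ Y.ravel() + S.T @ P0i @ m0         # Psi^T Sigma_eta^{-1} Ybar (middle block is 0)
    return np.linalg.solve(info, rhs).reshape(N + 1, n)


# Hypothetical toy data.
rng = np.random.default_rng(4)
F, H = np.array([[1.0, 0.1], [0.0, 0.9]]), np.array([[1.0, 0.0]])
Q, R, m0, P0, N = 0.01 * np.eye(2), np.array([[0.1]]), np.zeros(2), np.eye(2), 20
X = [rng.multivariate_normal(m0, P0)]
for _ in range(N):
    X.append(F @ X[-1] + rng.multivariate_normal(np.zeros(2), Q))
Y = np.array(X)[1:] @ H.T + rng.multivariate_normal(np.zeros(1), R, size=N)

x_rts = rts_smoother(F, H, Q, R, m0, P0, Y)
x_batch = batch_smoother(F, H, Q, R, m0, P0, Y)
```

In this sketch the recursion costs $O(Nn^3)$ operations, whereas the dense solve in `batch_smoother` costs $O((Nn)^3)$, which illustrates why the forward-backward form is preferred for long sequences.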

III-B Parameter Estimation

To derive the parameter estimator, we first rewrite (8) as a linear regression in $\theta^{\text{o}}$ rather than in $X^{\text{o}}$. It is straightforward to show that $\Psi(\theta^{\text{o}})X^{\text{o}}$ can be written as

$$
\begin{aligned}
\Psi(\theta^{\text{o}})X^{\text{o}} &= D(X^{\text{o}})\,\mathbf{vec}(\Psi(\theta^{\text{o}}))\,, \quad \text{with}\\
D(X^{\text{o}}) &= (X^{\text{o}})^\top \otimes \mathbf{I}\,.
\end{aligned} \tag{11}
$$

Here, $\otimes$ is the Kronecker product, and $\mathbf{vec}(\cdot)$ denotes column-wise matrix vectorisation. (7) and (8) show that only a portion of the elements of $\Psi(\theta^{\text{o}})$ depend on $\theta^{\text{o}}$, whereas the others are independent of it. Moreover, $\Psi(\theta)$ is affine in $\theta$ by (2). Hence, with $h = \mathbf{vec}(\Psi(0))$ and $B = [\,\mathbf{vec}(\Psi(u_1)) - h, \dots, \mathbf{vec}(\Psi(u_d)) - h\,]$, where $u_i$ denotes the $i$-th standard basis vector of $\mathbb{R}^d$, $\mathbf{vec}(\Psi(\theta^{\text{o}}))$ can be reformulated into

$$
\mathbf{vec}(\Psi(\theta^{\text{o}})) = h + B\theta^{\text{o}}\,. \tag{12}
$$
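The identities (11) and (12) are easy to verify numerically. In the sketch below, `vec` stacks columns (NumPy's `order="F"`), which is the convention under which $D(X^{\text{o}}) = (X^{\text{o}})^\top \otimes \mathbf{I}$ holds; $h$ and $B$ are obtained by evaluating $\Psi$ at $\theta = 0$ and at the unit vectors, which relies on $\Psi(\theta)$ being affine in $\theta$. The two-parameter toy model and all helper names are hypothetical.

```python
import numpy as np


def psi_of(theta, F0, Fs, H0, Hs, N):
    """Psi(theta) of (8), with F(theta) and H(theta) affine in theta as in (2)."""
    F = F0 + sum(t * M for t, M in zip(theta, Fs))
    H = H0 + sum(t * M for t, M in zip(theta, Hs))
    n = F.shape[0]
    A = np.kron(np.eye(N, N + 1), F) - np.kron(np.eye(N, N + 1, k=1), np.eye(n))
    C = np.kron(np.eye(N, N + 1, k=1), H)
    return np.vstack([C, A, np.eye(n, (N + 1) * n)])


def vec(M):
    """Column-stacking vectorisation."""
    return M.flatten(order="F")


def affine_decomposition(F0, Fs, H0, Hs, N):
    """h = vec(Psi(0)); column i of B is vec(Psi(u_i)) - h, with u_i the i-th unit vector."""
    d = len(Fs)
    h = vec(psi_of(np.zeros(d), F0, Fs, H0, Hs, N))
    B = np.column_stack([vec(psi_of(u, F0, Fs, H0, Hs, N)) - h for u in np.eye(d)])
    return h, B


# Hypothetical model: theta_1 enters F, theta_2 enters H.
N, theta = 4, np.array([0.9, 0.3])
F0, Fs = np.array([[1.0, 0.1], [0.0, 0.0]]), [np.array([[0.0, 0.0], [0.0, 1.0]]), np.zeros((2, 2))]
H0, Hs = np.array([[1.0, 0.0]]), [np.zeros((1, 2)), np.array([[0.0, 1.0]])]

Psi = psi_of(theta, F0, Fs, H0, Hs, N)
X = np.random.default_rng(5).standard_normal(Psi.shape[1])  # any stacked state vector
D = np.kron(X.reshape(1, -1), np.eye(Psi.shape[0]))         # D(X) in (11)
h, B = affine_decomposition(F0, Fs, H0, Hs, N)
```

With row-major `flatten()` instead, the identity would require $\mathbf{I} \otimes (X^{\text{o}})^\top$, so the vectorisation order must match the definition of $D$.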

Combining (11) and (12) leads to

$$
\begin{aligned}
\Psi(\theta^{\text{o}})X^{\text{o}} &= D(X^{\text{o}})\,\mathbf{vec}(\Psi(\theta^{\text{o}}))\\
&= D(X^{\text{o}})h + D(X^{\text{o}})B\theta^{\text{o}}\\
&= \Phi(X^{\text{o}})\theta^{\text{o}} + c(X^{\text{o}})\,.
\end{aligned}
$$

Here, $\Phi(X^{\text{o}}) = D(X^{\text{o}})B$ and $c(X^{\text{o}}) = D(X^{\text{o}})h$. The solution to the parameter estimation problem in the weighted LS sense

$$
\hat{\theta} = \arg\min_{\theta} \|\bar{Y} - \Psi(\theta)X^{\text{o}}\|^2_{\Sigma_\eta^{-1}} \tag{13}
$$

can then be derived as

$$
\begin{aligned}
\hat{\theta} = {}& \big(B^\top D(X^{\text{o}})^\top \Sigma_\eta^{-1} D(X^{\text{o}}) B\big)^{-1}\\
& \big(B^\top D(X^{\text{o}})^\top \Sigma_\eta^{-1} (\bar{Y} - D(X^{\text{o}})h)\big)\,,
\end{aligned}
$$

with covariance estimate

$$
\Sigma_\theta = \big(B^\top D(X^{\text{o}})^\top \Sigma_\eta^{-1} D(X^{\text{o}}) B\big)^{-1}\,.
$$
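A minimal sketch of this parameter step, assuming the stacked state $X^{\text{o}}$ is given: it forms $\Phi(X^{\text{o}}) = D(X^{\text{o}})B$ and $c(X^{\text{o}}) = D(X^{\text{o}})h$ and solves the weighted LS problem (13). The toy model, the helper names (`psi_of`, `vec`, `blkdiag`, `param_ls`) and the noise-free sanity check are our own; the explicit Kronecker product is only practical for small $N$.

```python
import numpy as np


def psi_of(theta, F0, Fs, H0, Hs, N):
    """Psi(theta) of (8) with affine F(theta), H(theta) as in (2)."""
    F = F0 + sum(t * M for t, M in zip(theta, Fs))
    H = H0 + sum(t * M for t, M in zip(theta, Hs))
    n = F.shape[0]
    A = np.kron(np.eye(N, N + 1), F) - np.kron(np.eye(N, N + 1, k=1), np.eye(n))
    C = np.kron(np.eye(N, N + 1, k=1), H)
    return np.vstack([C, A, np.eye(n, (N + 1) * n)])


def vec(M):
    return M.flatten(order="F")  # column stacking


def blkdiag(*blocks):
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out


def param_ls(Ybar, X, Sigma_eta, h, B):
    """Weighted LS parameter estimate for a given stacked state X, and its covariance."""
    D = np.kron(X.reshape(1, -1), np.eye(Ybar.size))  # D(X) = X^T kron I
    Phi, c = D @ B, D @ h                              # Phi(X), c(X)
    W = np.linalg.inv(Sigma_eta)
    info = Phi.T @ W @ Phi
    theta_hat = np.linalg.solve(info, Phi.T @ W @ (Ybar - c))
    return theta_hat, np.linalg.inv(info)


# Hypothetical two-parameter model (theta_1 in F, theta_2 in H).
N, theta_true = 20, np.array([0.9, 0.3])
F0, Fs = np.array([[1.0, 0.1], [0.0, 0.0]]), [np.array([[0.0, 0.0], [0.0, 1.0]]), np.zeros((2, 2))]
H0, Hs = np.array([[1.0, 0.0]]), [np.zeros((1, 2)), np.array([[0.0, 1.0]])]
Q, R, P0 = 0.01 * np.eye(2), np.array([[0.1]]), np.eye(2)
Sigma_eta = blkdiag(np.kron(np.eye(N), R), np.kron(np.eye(N), Q), P0)

d = len(Fs)
h = vec(psi_of(np.zeros(d), F0, Fs, H0, Hs, N))
B = np.column_stack([vec(psi_of(u, F0, Fs, H0, Hs, N)) - h for u in np.eye(d)])

# Sanity check with eta = 0: Ybar = Psi(theta^o) X^o is fitted exactly.
X = np.random.default_rng(6).standard_normal(2 * (N + 1))
Ybar = psi_of(theta_true, F0, Fs, H0, Hs, N) @ X
theta_hat, cov_theta = param_ls(Ybar, X, Sigma_eta, h, B)
```

With measured data, `Ybar` would instead be $[\,Y; 0; m_0\,]$, and in an iterative scheme `X` would be replaced by a state estimate. For larger $N$, the $i$-th column of $\Phi$ can be formed directly as $(\Psi(u_i) - \Psi(0))X^{\text{o}}$, avoiding the large Kronecker product.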

IV Joint State and Parameter Estimation

State and parameter estimation can be iterated in different ways. This section reviews well-known methods before we introduce the PEIV method in the next section.

IV-A Joint Maximum A Posterior-Maximum Likelihood

In this subsection, we explore the JMAP-ML method for estimating both the state and the model parameter iteratively. It aims to solve the optimisation problem given by

$$
\begin{aligned}
\{\hat{X}, \hat{\theta}\} &= \arg\max_{X,\theta}\, \log P(Y, X \mid \theta) \\
&= \arg\max_{X,\theta}\, \bigl\{\log P(X \mid Y, \theta) + \log P(Y \mid \theta)\bigr\},
\end{aligned}
\tag{14}
$$

where $\theta$ is treated as a deterministic parameter and the state $X$ as a random vector. Starting from an initial estimate $\hat{\theta}^{1}$, the problem can be solved by alternating the following two updates [12]

$$
\begin{aligned}
\hat{X}^{i+1} &= \bigl(\Psi(\hat{\theta}^{i})^\top \Sigma_\eta^{-1} \Psi(\hat{\theta}^{i})\bigr)^{-1} \Psi(\hat{\theta}^{i})^\top \Sigma_\eta^{-1} \bar{Y}, \\
\hat{\theta}^{i+1} &= \bigl(B^\top D(\hat{X}^{i+1})^\top \Sigma_\eta^{-1} D(\hat{X}^{i+1}) B\bigr)^{-1} B^\top D(\hat{X}^{i+1})^\top \Sigma_\eta^{-1}\bigl(\bar{Y} - D(\hat{X}^{i+1}) h\bigr),
\end{aligned}
\tag{15}
$$

where the state update is computed by the KS and the parameter update by LS.
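A minimal sketch of the JMAP-ML alternation (15) for the scalar model used in Sec. VI is given below. Our assumptions, not stated in the text: the KS is realized as a scalar Kalman filter followed by a Rauch-Tung-Striebel backward pass [6], with a Gaussian prior $\mathcal{N}(m_0, p_0)$ on the first state; the noise variances $q$ and $r$ are known; and only the dynamics rows depend on $\theta$, so the LS line of (15) reduces to a ratio of sums of smoothed means. The names `rts_smoother` and `jmap_ml` are hypothetical.

```python
def rts_smoother(y, theta, q, r, m0, p0):
    """Scalar Kalman filter + RTS smoother for x_{k+1} = theta*x_k + v_k,
    y_k = x_k + e_k, v_k ~ N(0, q), e_k ~ N(0, r), prior x_1 ~ N(m0, p0).
    Returns smoothed means and variances."""
    n = len(y)
    mp, pp, mf, pf = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for k in range(n):
        if k == 0:
            mp[k], pp[k] = m0, p0                       # prior on the first state
        else:
            mp[k] = theta * mf[k - 1]                   # time update
            pp[k] = theta * theta * pf[k - 1] + q
        gain = pp[k] / (pp[k] + r)                      # measurement update
        mf[k] = mp[k] + gain * (y[k] - mp[k])
        pf[k] = (1.0 - gain) * pp[k]
    ms, ps = mf[:], pf[:]
    for k in range(n - 2, -1, -1):                      # backward RTS pass
        g = pf[k] * theta / pp[k + 1]
        ms[k] = mf[k] + g * (ms[k + 1] - mp[k + 1])
        ps[k] = pf[k] + g * g * (ps[k + 1] - pp[k + 1])
    return ms, ps


def jmap_ml(y, theta_init, q, r, m0, p0, max_iter=100, tol=1e-10):
    """Alternate (15): smooth with the current theta, then LS on smoothed means."""
    theta = theta_init
    for _ in range(max_iter):
        ms, _ = rts_smoother(y, theta, q, r, m0, p0)    # state update
        sxx = sum(a * a for a in ms[:-1])
        sxy = sum(a * b for a, b in zip(ms[:-1], ms[1:]))
        theta_new = sxy / sxx                           # parameter update (means only)
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```

Only the smoothed means enter the parameter update; the smoothed variances returned by the smoother are discarded, which is exactly the limitation that motivates EM in Sec. IV-B.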

IV-B Expectation Maximisation

The JMAP-ML method in Sec. IV-A uses only the mean of the state estimate and ignores its covariance. To exploit the full information from state estimation, the EM method can be used instead. It iteratively solves the ML problem

$$
\hat{\theta} = \arg\max_{\theta} \log P(Y \mid \theta). \tag{16}
$$

Since the state $X$ is not observed, evaluating (16) requires marginalizing $X$ out, which makes the problem difficult to solve directly. The EM algorithm tackles this by alternating two steps, namely the $E$ step and the $M$ step. The $E$ step computes the expected complete-data log-likelihood

$$
\mathcal{Q}(\theta, \hat{\theta}^{i}) = \mathbf{E}_{P(X \mid Y, \hat{\theta}^{i})}\bigl(\log P(Y, X \mid \theta)\bigr), \tag{17}
$$

where $\hat{\theta}^{i}$ denotes the parameter estimate in the $i$-th iteration. The posterior distribution $P(X \mid Y, \hat{\theta}^{i})$ can be computed using the KS introduced in Sec. III-A, with $\theta^{\text{o}}$ replaced by its estimate $\hat{\theta}^{i}$. After the $E$ step, the $M$ step updates the parameter estimate as

θ^i+1=argmaxθ𝒬(θ,θ^i).superscript^𝜃𝑖1subscript𝜃𝒬𝜃superscript^𝜃𝑖\hat{\theta}^{i+1}=\arg\max_{\theta}\mathcal{Q}(\theta,\hat{\theta}^{i})\,.over^ start_ARG italic_θ end_ARG start_POSTSUPERSCRIPT italic_i + 1 end_POSTSUPERSCRIPT = roman_arg roman_max start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT caligraphic_Q ( italic_θ , over^ start_ARG italic_θ end_ARG start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) . (18)

After updating the parameter in (18), we return to the $E$ step in (17) and repeat until convergence. In summary, the method resembles (15); the only difference lies in the parameter update, which accounts for the uncertainty in the state estimate as follows

$$
\hat{\theta}^{i+1} = \Bigl(\mathbf{E}\bigl(B^\top D(X)^\top \Sigma_\eta^{-1} D(X) B\bigr)\Bigr)^{-1} \mathbf{E}\bigl(B^\top D(X)^\top \Sigma_\eta^{-1}(\bar{Y} - D(X) h)\bigr). \tag{19}
$$

Here, the expectation is taken over $X \sim P(X \mid Y, \hat{\theta}^{i})$, whose mean is the smoothed estimate $\hat{X}^{i+1}$.
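For the scalar model of Sec. VI, and assuming (our assumption) that $q$ and $r$ are known, that only the dynamics rows depend on $\theta$, and that the prior on the first state does not depend on $\theta$, maximizing (18) gives $\hat{\theta}^{i+1} = \sum_k \mathbf{E}(x_{k+1}x_k) / \sum_k \mathbf{E}(x_k^2)$. These expectations add the smoothed variances and the lag-one smoothed cross-covariances to the products of smoothed means; the sketch below obtains the cross-covariance with the standard RTS identity $\mathrm{Cov}(x_{k+1}, x_k \mid Y) = G_k P_{k+1|N}$, where $G_k$ is the smoother gain. The names `rts_smoother_cross` and `em` are hypothetical.

```python
def rts_smoother_cross(y, theta, q, r, m0, p0):
    """Scalar Kalman filter + RTS smoother for x_{k+1} = theta*x_k + v_k,
    y_k = x_k + e_k, prior x_1 ~ N(m0, p0). Also returns the lag-one smoothed
    cross-covariances cross[k] = Cov(x_{k+1}, x_k | y)."""
    n = len(y)
    mp, pp, mf, pf = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for k in range(n):
        if k == 0:
            mp[k], pp[k] = m0, p0
        else:
            mp[k], pp[k] = theta * mf[k - 1], theta * theta * pf[k - 1] + q
        gain = pp[k] / (pp[k] + r)
        mf[k] = mp[k] + gain * (y[k] - mp[k])
        pf[k] = (1.0 - gain) * pp[k]
    ms, ps, cross = mf[:], pf[:], [0.0] * (n - 1)
    for k in range(n - 2, -1, -1):
        g = pf[k] * theta / pp[k + 1]                   # smoother gain G_k
        ms[k] = mf[k] + g * (ms[k + 1] - mp[k + 1])
        ps[k] = pf[k] + g * g * (ps[k + 1] - pp[k + 1])
        cross[k] = g * ps[k + 1]                        # Cov(x_{k+1}, x_k | y)
    return ms, ps, cross


def em(y, theta_init, q, r, m0, p0, max_iter=200, tol=1e-10):
    """EM for theta only: E step via the smoother, M step in closed form."""
    theta = theta_init
    n = len(y)
    for _ in range(max_iter):
        ms, ps, cross = rts_smoother_cross(y, theta, q, r, m0, p0)  # E step
        exx = sum(ms[k] ** 2 + ps[k] for k in range(n - 1))         # sum E[x_k^2]
        exy = sum(ms[k + 1] * ms[k] + cross[k] for k in range(n - 1))  # sum E[x_{k+1} x_k]
        theta_new = exy / exx                                        # M step
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```

Compared with the JMAP-ML update, the only change is the extra variance and cross-covariance terms inside the sums, which is the scalar counterpart of taking expectations in (19).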

V PEIV-based State and Parameter Estimation

We now show how partial errors-in-variables (PEIV) modeling can be exploited for joint state and parameter estimation. The regression on the states in (8) contains a partially unknown regressor matrix $\Psi(\theta^{\text{o}})$, since the parameter $\theta^{\text{o}}$ in (7) is itself estimated with uncertainty. Based on (3) and (8), we formulate the following WTLS problem to jointly estimate the state $X^{\text{o}}$ and the parameter $\theta^{\text{o}}$

$$
\begin{aligned}
\{\hat{\theta}, \hat{X}\} &= \arg\min_{\theta, X} \left\| \begin{bmatrix} \theta - \hat{\theta}^{1} \\ \eta \end{bmatrix} \right\|^2_{\Sigma^{-1}}, \\
\text{s.t.}\quad \eta &= \bar{Y} - \Psi(\theta) X,
\end{aligned}
$$

where $\Sigma = \mathbf{diag}(\Sigma_\theta, \Sigma_\eta)$. Substituting the constraint for $\eta$ in the objective yields

$$
J(\theta, X) = \|\theta - \hat{\theta}^{1}\|^2_{\Sigma_\theta^{-1}} + \|\bar{Y} - \Psi(\theta) X\|^2_{\Sigma_\eta^{-1}}, \tag{20}
$$

where the first term can be seen as a generalized Tikhonov regularizer, with $\hat{\theta}^{1}$ being the initial parameter estimate [23]. At each iteration, the state is updated following the first equation in (15). Afterward, given the updated state estimate $\hat{X}$, the parameter estimate is updated via regularized least squares: we take the closed-form derivative of $J(\theta, \hat{X})$ w.r.t. $\theta$ and set it to $0$, yielding

$$
\hat{\theta} = N^{-1}\bigl(\Sigma_\theta^{-1}\hat{\theta}^{1} + B^\top D(\hat{X})^\top \Sigma_\eta^{-1}(\bar{Y} - D(\hat{X}) h)\bigr). \tag{21}
$$

The matrix $N$ in (21) is given by

$$
N = \Sigma_\theta^{-1} + B^\top D(\hat{X})^\top \Sigma_\eta^{-1} D(\hat{X}) B. \tag{22}
$$
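The step from (20) to (21) can be spelled out as follows. The derivation assumes the linear parametrization $\Psi(\theta)X = D(X)(B\theta + h)$, which is what the LS parameter updates in (15) and (21) rely on, so that the second term of (20) becomes $\|\bar{Y} - D(\hat{X})h - D(\hat{X})B\theta\|^2_{\Sigma_\eta^{-1}}$. Differentiating w.r.t. $\theta$ gives

$$
\begin{aligned}
\tfrac{1}{2}\,\nabla_{\theta} J(\theta, \hat{X})
  &= \Sigma_\theta^{-1}(\theta - \hat{\theta}^{1})
     - B^\top D(\hat{X})^\top \Sigma_\eta^{-1}\bigl(\bar{Y} - D(\hat{X})h - D(\hat{X})B\theta\bigr) \\
  &= N\theta - \Sigma_\theta^{-1}\hat{\theta}^{1}
     - B^\top D(\hat{X})^\top \Sigma_\eta^{-1}\bigl(\bar{Y} - D(\hat{X})h\bigr),
\end{aligned}
$$

with $N$ as in (22). Setting this gradient to zero and solving for $\theta$ gives (21). With $\Sigma_\theta$ and $\Sigma_\eta$ positive definite, $N$ is positive definite (a positive definite term plus a positive semidefinite one), so the inverse in (21) always exists, even when $D(\hat{X})B$ is rank deficient.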

The state update in the first equation of (15) and the parameter update in (21) are applied alternately until convergence. Once converged, the estimation covariance of the state $X$ can be computed according to (10) with the parameter estimate $\hat{\theta}$. The estimation covariance of $\hat{\theta}$ can be obtained by rewriting (20), given the state estimate $\hat{X}$, as the linear regression

$$
\begin{bmatrix} \hat{\theta}^{1} \\ \bar{Y} - D(\hat{X}) h \end{bmatrix} = \begin{bmatrix} \mathbf{I} \\ D(\hat{X}) B \end{bmatrix} \theta^{\text{o}} + \begin{bmatrix} \tilde{\theta}^{1} \\ \eta \end{bmatrix}. \tag{23}
$$

Here, $\tilde{\theta}^{1}$ is the initialisation error, with covariance $\Sigma_\theta$. Since the stacked noise in (23) has covariance $\Sigma = \mathbf{diag}(\Sigma_\theta, \Sigma_\eta)$, the standard weighted LS covariance, conditioned on $\hat{X}$, is $\mathbf{cov}(\hat{\theta}) = \bigl(\Sigma_\theta^{-1} + B^\top D(\hat{X})^\top \Sigma_\eta^{-1} D(\hat{X}) B\bigr)^{-1} = N^{-1}$.
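A minimal sketch of the PEIV iteration for the scalar model of Sec. VI follows: the state is re-smoothed with the current parameter (first line of (15)), and the parameter is then updated with the regularized LS step (21) and (22), always shrinking toward the fixed prior $\hat{\theta}^{1}$ rather than the previous iterate. Our assumptions, not stated in the text: the KS is a scalar Kalman filter plus RTS backward pass with a Gaussian prior $\mathcal{N}(m_0, p_0)$ on the first state, $q$ and $r$ are known, and only the dynamics rows (weight $1/q$) depend on $\theta$. The names `rts_smoother`, `peiv`, `theta1`, and `var_theta1` are hypothetical.

```python
def rts_smoother(y, theta, q, r, m0, p0):
    """Scalar Kalman filter + RTS smoother for x_{k+1} = theta*x_k + v_k,
    y_k = x_k + e_k, prior x_1 ~ N(m0, p0). Returns smoothed means, variances."""
    n = len(y)
    mp, pp, mf, pf = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for k in range(n):
        if k == 0:
            mp[k], pp[k] = m0, p0
        else:
            mp[k], pp[k] = theta * mf[k - 1], theta * theta * pf[k - 1] + q
        gain = pp[k] / (pp[k] + r)
        mf[k] = mp[k] + gain * (y[k] - mp[k])
        pf[k] = (1.0 - gain) * pp[k]
    ms, ps = mf[:], pf[:]
    for k in range(n - 2, -1, -1):
        g = pf[k] * theta / pp[k + 1]
        ms[k] = mf[k] + g * (ms[k + 1] - mp[k + 1])
        ps[k] = pf[k] + g * g * (ps[k + 1] - pp[k + 1])
    return ms, ps


def peiv(y, theta1, var_theta1, q, r, m0, p0, max_iter=100, tol=1e-10):
    """Alternate the KS state update (first line of (15)) with the regularized
    LS update (21)-(22). theta1: prior estimate; var_theta1: its variance.
    Returns (theta_hat, cov(theta_hat) = N^{-1}); requires max_iter >= 1."""
    theta = theta1
    for _ in range(max_iter):
        ms, _ = rts_smoother(y, theta, q, r, m0, p0)         # state update
        sxx = sum(a * a for a in ms[:-1])
        sxy = sum(a * b for a, b in zip(ms[:-1], ms[1:]))
        n_info = 1.0 / var_theta1 + sxx / q                   # N in (22), scalar
        theta_new = (theta1 / var_theta1 + sxy / q) / n_info  # (21), scalar
        done = abs(theta_new - theta) < tol
        theta = theta_new
        if done:
            break
    return theta, 1.0 / n_info
```

As `var_theta1` grows, the update approaches the JMAP-ML LS step; as it shrinks, the estimate stays at `theta1`. This is the trade-off the Tikhonov term in (20) controls.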


Figure 1: RMSE of parameter estimation w.r.t. batch size. The dashed lines of each color mark the 5% and 95% quantiles of the corresponding method.

VI Numerical Simulation

To demonstrate the merit of the PEIV principle for joint state and parameter estimation, we conduct a Monte Carlo study on a synthetic example. We consider the following state-space model with scalar state and parameter

$$
\begin{aligned}
x_{k+1} &= \theta^{\text{o}} x_k + v_k, \\
y_k &= x_k + e_k.
\end{aligned}
\tag{24}
$$

The process noise $v_k$, the measurement noise $e_k$, and the initial state $x_0$ are assumed to be mutually independent, with $v_k \sim \mathcal{N}(0, 0.2)$ and $e_k \sim \mathcal{N}(0, 0.09)$. We assume a stationary process with $x^{\text{o}}_k \sim \mathcal{N}(0, P)$, where

$$
P = 0.2 / \bigl(1 - (\theta^{\text{o}})^2\bigr).
$$
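The stationary variance follows from taking variances on both sides of (24), $P = (\theta^{\text{o}})^2 P + 0.2$, which has a positive solution only when $|\theta^{\text{o}}| < 1$. The sketch below draws one data batch from (24) starting in stationarity, using the noise levels above; the function names `stationary_variance` and `simulate` and the state indexing from $k = 1$ are our own illustrative choices, not the authors' simulation code.

```python
import math
import random


def stationary_variance(theta, q):
    """Stationary variance P of x_{k+1} = theta*x_k + v_k, v_k ~ N(0, q).
    Solves P = theta^2 * P + q, which requires |theta| < 1."""
    if abs(theta) >= 1.0:
        raise ValueError("a stationary solution requires |theta| < 1")
    return q / (1.0 - theta * theta)


def simulate(n, theta=0.9, q=0.2, r=0.09, seed=None):
    """Draw n states and measurements from (24), starting in stationarity."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, math.sqrt(stationary_variance(theta, q)))  # x ~ N(0, P)
    xs, ys = [], []
    for _ in range(n):
        xs.append(x)
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))   # y_k = x_k + e_k
        x = theta * x + rng.gauss(0.0, math.sqrt(q))  # x_{k+1} = theta*x_k + v_k
    return xs, ys


print(round(stationary_variance(0.9, 0.2), 4))  # 1.0526
```

Note that `random.gauss` takes a standard deviation, hence the square roots of the variances $0.2$, $0.09$, and $P$.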

The state estimate is initialised as $\hat{x}_0 \sim \mathcal{N}(y_1, 2P)$, so exact knowledge of the prior is not required. The true parameter value is $\theta^{\text{o}} = 0.9$. To quantify the estimation accuracy, we use the root mean square error (RMSE), given by

$$
\texttt{RMSE}_\theta = \sqrt{\frac{1}{M}\sum_{i=1}^{M} \bigl(\hat{\theta}_i - \theta^{\text{o}}\bigr)^2},
$$

where $\hat{\theta}_i$ is the parameter estimate in the $i$-th Monte Carlo run; the RMSE of the state estimate is defined analogously. The number of runs is $M = 1000$.
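The RMSE above translates directly into code. The sketch below takes the list of per-run estimates $\hat{\theta}_1, \dots, \hat{\theta}_M$ and the true value; the function name `rmse` is hypothetical.

```python
import math


def rmse(estimates, truth):
    """RMSE over M Monte Carlo runs: sqrt(mean((estimate_i - truth)^2))."""
    if len(estimates) == 0:
        raise ValueError("need at least one Monte Carlo estimate")
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


print(round(rmse([0.8, 1.0, 0.9], truth=0.9), 4))  # 0.0816
```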

We evaluate the PEIV-based method with a focus on joint estimation from small data batches, with batch sizes in $\{10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 150, 200\}$ time steps. Three state-of-the-art methods are included for comparison: expectation maximisation (EM), joint maximum a posterior-maximum likelihood (JMAP-ML), and the augmented state extended Kalman smoother (ASEKS).

As shown in Fig. 1, the proposed PEIV-based method outperforms all other methods for small batch sizes ($\leq 100$) in terms of both RMSE and the 95% quantile. Additionally, Fig. 2 depicts the error ellipses obtained from simulations with a batch size of 30 data points, where $\tilde{x}_0$ and $\tilde{\theta}$ denote the estimation errors of the initial state and the parameter, respectively. The proposed PEIV-based method yields the smallest error ellipse among the compared methods.


Figure 2: Error ellipses denoting the 95% confidence regions for joint state and parameter estimation. Markers denote the biases of the estimates.

VII Conclusion

In this letter, a novel principle for joint state and parameter estimation is proposed based on partial errors-in-variables modeling, in which the uncertainty in the parameter prior is explicitly considered. Building on it, we formulate the regression problem as a WTLS problem, which is solved iteratively by Kalman smoothing and regularized least squares for updating the state and the parameter, respectively. Simulation results demonstrate that, for small batch sizes, the proposed method outperforms state-of-the-art methods, including EM, JMAP-ML, and ASEKS, in terms of estimation accuracy.

Future work includes incorporating the uncertainty of the state estimates into the parameter estimation. Another possible extension of the PEIV-based framework is to handle non-Gaussian noise in state-space models.

References

  • [1] Fredrik Gustafsson, Fredrik Gunnarsson, Niclas Bergman, Urban Forssell, Jonas Jansson, Rickard Karlsson, and P-J Nordlund. Particle filters for positioning, navigation, and tracking. IEEE Transactions on Signal Processing, 50(2):425–437, 2002.
  • [2] Michael Roth, Gustaf Hendeby, and Fredrik Gustafsson. EKF/UKF maneuvering target tracking using coordinated turn models with polar/Cartesian velocity. In 17th International Conference on Information Fusion (FUSION), pages 1–8. IEEE, 2014.
  • [3] Esmaeil Ghahremani and Innocent Kamwa. Dynamic state estimation in power system by applying the extended Kalman filter with unknown inputs to phasor measurements. IEEE Transactions on Power Systems, 26(4):2556–2566, 2011.
  • [4] Jakub Simanek, Michal Reinstein, and Vladimir Kubelka. Evaluation of the EKF-based estimation architectures for data fusion in mobile robots. IEEE/ASME Transactions on Mechatronics, 20(2):985–990, 2014.
  • [5] Brian DO Anderson and John B Moore. Optimal filtering. Courier Corporation, 2012.
  • [6] Herbert E Rauch, F Tung, and Charlotte T Striebel. Maximum likelihood estimates of linear dynamic systems. AIAA Journal, 3(8):1445–1450, 1965.
  • [7] Simo Särkkä and Lennart Svensson. Bayesian filtering and smoothing, volume 17. Cambridge University Press, 2023.
  • [8] Lennart Ljung. Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems. IEEE Transactions on Automatic Control, 24(1):36–50, 1979.
  • [9] Anxi Yu, Ye Liu, Jubo Zhu, and Zhen Dong. An improved dual unscented Kalman filter for state and parameter estimation. Asian Journal of Control, 18(4):1427–1440, 2016.
  • [10] Steven M Kay. Fundamentals of statistical signal processing: estimation theory. Prentice-Hall, Inc., 1993.
  • [11] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–22, 1977.
  • [12] Arie Yeredor. The joint MAP-ML criterion and its relation to ML and to extended least-squares. IEEE Transactions on Signal Processing, 48(12):3484–3492, 2000.
  • [13] Stephen J Wright. Coordinate descent algorithms. Mathematical Programming, 151(1):3–34, 2015.
  • [14] Manon Kok and Thomas B Schön. Maximum likelihood calibration of a magnetometer using inertial sensors. IFAC Proceedings Volumes, 47(3):92–97, 2014.
  • [15] Peng Liu, Gustaf Hendeby, and Fredrik Gustafsson. Joint estimation of states and parameters in stochastic SIR model. In 2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), pages 1–6. IEEE, 2022.
  • [16] Feng Yin, Carsten Fritsche, Fredrik Gustafsson, and Abdelhak M Zoubir. EM- and JMAP-ML based joint estimation algorithms for robust wireless geolocation in mixed LOS/NLOS environments. IEEE Transactions on Signal Processing, 62(1):168–182, 2013.
  • [17] Wayne A Fuller. Measurement error models. John Wiley & Sons, 2009.
  • [18] Peng Liu, Kailai Li, Gustaf Hendeby, and Fredrik Gustafsson. Weighted total least squares for quadratic errors-in-variables regression. In 2023 31st European Signal Processing Conference (EUSIPCO), pages 1893–1897. IEEE, 2023.
  • [19] Gene H Golub and Charles F Van Loan. An analysis of the total least squares problem. SIAM Journal on Numerical Analysis, 17(6):883–893, 1980.
  • [20] A Amiri-Simkooei and S Jazaeri. Weighted total least squares formulated by standard least squares theory. Journal of Geodetic Science, 2(2):113–124, 2012.
  • [21] Burkhard Schaffrin and Andreas Wieser. On weighted total least-squares adjustment for linear regression. Journal of Geodesy, 82:415–421, 2008.
  • [22] Peiliang Xu, Jingnan Liu, and Chuang Shi. Total least squares adjustment in partial errors-in-variables models: algorithm and statistical analysis. Journal of Geodesy, 86:661–675, 2012.
  • [23] Gene H Golub, Per Christian Hansen, and Dianne P O’Leary. Tikhonov regularization and total least squares. SIAM Journal on Matrix Analysis and Applications, 21(1):185–194, 1999.