Convolutional Bayesian Filtering
Abstract
Bayesian filtering serves as the mainstream framework of state estimation in dynamic systems. Its standard version applies the total probability rule and Bayes’ law alternately, where how to define and compute conditional probability is critical to state distribution inference. Previously, the conditional probability, which measures the occurrence probability of one event given a second event, has been assumed to be exactly known. In this paper, we find that by adding an additional event that stipulates an inequality condition, we can transform the conditional probability into a special integration that is analogous to convolution. Based on this transformation, we show that both transition probability and output probability can be generalized to convolutional forms, resulting in a more general filtering framework that we call convolutional Bayesian filtering. This new framework encompasses standard Bayesian filtering as a special case when the distance metric of the inequality condition is selected as the Dirac delta function. It also allows for a more nuanced consideration of model mismatch by choosing different types of inequality conditions. For instance, when the distance metric is defined in a distributional sense, the transition probability and output probability can be approximated by simply rescaling them into fractional powers. Under this framework, a robust version of the Kalman filter can be constructed by only altering the noise covariance matrix, while maintaining the conjugate nature of Gaussian distributions. Finally, we exemplify the effectiveness of our approach by reshaping classic filtering algorithms into convolutional versions, including the Kalman filter, extended Kalman filter, unscented Kalman filter and particle filter.
keywords:
Bayesian filtering; conditional probability; convolution; model mismatch
1 Introduction
Accurately estimating the state value of dynamic systems is a crucial task in science and engineering, such as robotics, power systems, aerospace engineering, and manufacturing. Since the 1960s, Bayesian filtering has become a principled framework for optimal state estimation. The essence of this framework is to find a balance between uncertain system model and noisy state measurement. Its associated algorithm iteratively updates the probability density function of system state using the prior from the last step and the likelihood of the new observation. Afterward, typical estimation criteria like minimum mean-square error and maximum a posteriori are utilized to acquire the optimal point estimate.
In mathematics, Bayesian filtering relies on two conditional probabilities: transition probability and output probability. The transition probability describes how the system state evolves over time, and the output probability depicts the relationship between noisy measurement and ground truth state. To incorporate the information of those two probabilities, each iteration of Bayes filter is composed of two steps [11, 25]: prediction and update. The prediction step employs total probability rule to integrate the product of transition probability and the state distribution of previous time to obtain the prior distribution. The update step employs Bayes’ law to calculate the posterior distribution by adding information of the current measurement, where the output probability is used as a likelihood term. Since being proposed, Bayesian filtering has become the foundation of optimal filtering algorithms, including the well-known Kalman filter family, particle filter, and variational Bayesian filter.
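The two steps can be made concrete on a finite state space, where the integral in the total probability rule becomes a sum. The following is a minimal sketch, not taken from the paper; the three-state transition matrix and the likelihood values are hypothetical numbers chosen only for illustration.

```python
import numpy as np

def predict(posterior, transition):
    """Total probability rule: prior[j] = sum_i posterior[i] * transition[i, j]."""
    return posterior @ transition

def update(prior, likelihood):
    """Bayes' law: the posterior is proportional to likelihood * prior."""
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()

# Hypothetical 3-state system; row i holds p(next state | current state = i).
transition = np.array([[0.8, 0.2, 0.0],
                       [0.1, 0.8, 0.1],
                       [0.0, 0.2, 0.8]])
# Hypothetical output probability of the observed measurement under each state.
likelihood = np.array([0.1, 0.7, 0.2])

belief = np.array([1.0, 0.0, 0.0])        # state distribution at the previous time
prior = predict(belief, transition)       # prediction step
posterior = update(prior, likelihood)     # update step
```

Here the output probability enters only through `likelihood`, which is why a mismatch in either conditional probability propagates directly into the posterior.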
The origin of optimal filtering theory can be traced back to the early 1940s, marked by the groundbreaking contributions of Norbert Wiener [37] and Kolmogorov [21]. This field reached a significant milestone in 1960 with the invention of the discrete-time Kalman filter [18], followed by its continuous-time version published one year later [19]. Unlike Wiener’s work, which deals with stationary processes in the frequency domain, the Kalman filter addresses dynamic processes in the time domain. The Kalman filter is a direct consequence of applying Bayesian filtering to linear Gaussian systems. In its prediction step, the conditional probability, i.e., the transition probability of the system model, is Gaussian. Due to the closure property of Gaussian distributions under linear transformation, the resulting expectation after applying the total probability rule is also Gaussian, which is referred to as the prior distribution. In the update step, the conditional probability, i.e., the output probability of the measurement model, is naturally Gaussian. Owing to the conjugacy property of Gaussian distributions, the resulting posterior of Bayes’ law remains Gaussian. Since both the prior and posterior are proven to be Gaussian, the Kalman filter can be solved analytically by computing only the mean and covariance of the Gaussian distribution using closed-form formulas.
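The closed-form mean and covariance recursions can be sketched as follows. This is the textbook Kalman filter written as an illustration; the symbol names (`A`, `H`, `Q`, `R`) and the scalar example values are assumptions, not notation recovered from the paper.

```python
import numpy as np

def kf_predict(m, P, A, Q):
    """Prediction: push the Gaussian posterior through the linear dynamics."""
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    return m_pred, P_pred

def kf_update(m_pred, P_pred, y, H, R):
    """Update: condition the Gaussian prior on the measurement y."""
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    m = m_pred + K @ (y - H @ m_pred)
    P = P_pred - K @ S @ K.T
    return m, P

# Hypothetical scalar system used only to show one iteration.
A = np.array([[1.0]])
H = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[2.0]])
m0, P0 = np.array([0.0]), np.array([[1.0]])

m_pred, P_pred = kf_predict(m0, P0, A, Q)
m1, P1 = kf_update(m_pred, P_pred, np.array([4.0]), H, R)
```

In this example the predicted variance is 2, the gain is 0.5, and the posterior mean lands halfway between the prediction and the measurement.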
When facing nonlinear systems, one big issue of Bayesian filtering is that the closure and conjugacy properties no longer hold. As a result, the calculation of total probability rule and Bayes’ law in almost all nonlinear systems has no analytical solution. So far, several approximation methods have been proposed to replace the accurate calculation of the two rules. A notable early advancement in this area was extended Kalman filter (EKF), pioneered by NASA for spacecraft navigation [33]. This extended filter employs the Taylor series expansion to linearize nonlinear dynamics around the current state. Its state estimation can achieve the so-called first-order accuracy, i.e., EKF is perfect if the dynamics is linear with respect to the state. In contrast, unscented Kalman filter (UKF) employs the unscented transform, a deterministic sampling technique, to achieve a third-order accuracy in approximating nonlinear dynamics for symmetric noise distributions [17]. This technique acquires the transformed distribution by generating a set of sigma points around the mean state estimate and then propagating them through the known nonlinear function, offering the advantage of preserving second-moment information compared to EKF. Obviously, both EKF and UKF implicitly calculate the solution of total probability rule and Bayes’ law using Gaussian distributions. Due to the adoption of approximation techniques, neither of them can offer formal guarantees on the estimation accuracy in highly nonlinear systems.
Instead of approximating the prior and posterior as Gaussian distributions, the particle filter (PF) represents them with a set of particles generated by the Monte Carlo method [26]. In the prediction step, particles are propagated according to the transition probability of the system model to predict the distribution of the next state. The resulting particles constitute a discrete approximation of the prior. In the update step, each particle receives a weight related to the output probability of the observed data. All the particles are resampled according to their weights, which builds a discrete approximation of the posterior. It has been proven that these discrete distributions converge to the true distributions as the number of particles becomes sufficiently large. Nevertheless, PF often requires substantial computational resources due to the use of Monte Carlo sampling, which limits its application in many real-time scenarios.
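One iteration of this propagate, weight, and resample cycle can be sketched as below. The random-walk transition, the Gaussian output probability, and the noise levels `q_std` and `r_std` are hypothetical choices made for the example, and multinomial resampling is only one of several common schemes.

```python
import numpy as np

def pf_step(particles, y, rng, q_std=1.0, r_std=1.0):
    """One bootstrap particle filter iteration for a scalar random-walk model."""
    # Prediction: sample each particle from the (assumed) transition probability.
    predicted = particles + rng.normal(0.0, q_std, size=particles.shape)
    # Update: weight each particle by the (assumed) Gaussian output probability.
    log_w = -0.5 * ((y - predicted) / r_std) ** 2
    w = np.exp(log_w - log_w.max())           # subtract the max for numerical stability
    w /= w.sum()
    # Resampling: draw particles in proportion to their weights.
    idx = rng.choice(len(predicted), size=len(predicted), p=w)
    return predicted[idx]

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=5000)   # samples from the previous posterior
particles = pf_step(particles, y=2.0, rng=rng)
estimate = particles.mean()
```

For this linear Gaussian toy problem the exact posterior mean is 4/3, so the particle estimate can be compared against a known answer.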
The variational Bayesian filter addresses the intensive computation associated with particles by adopting variational inference as an alternative approximation technique [31, 22]. The prior of its prediction step is assumed to be in a Gaussian form. This assumption is achieved by computing the expectation of a conditional probability using the closure of Gaussian distribution, which is identical to Kalman filter. The update step avoids calculating the computationally intensive integral in the Bayes’ law. Instead, it seeks to numerically minimize the Kullback-Leibler divergence between the proposal distribution and the real posterior. The proposal distribution is often chosen as a parameterized function to obtain an approximate solution of the minimization problem. The real posterior is represented as the product of the prior distribution and the output probability. In general, variational Bayesian filter is computationally beneficial in high-dimensional estimation problems and it has been widely adopted in adaptive filtering applications.
All the state estimation algorithms discussed above adhere strictly to the Bayesian filtering framework. In this framework, the mathematical form of conditional probabilities, including transition probability and output probability, plays an important role in computing the total probability rule and Bayes’ law. The standard definition of conditional probability is a measure of the occurrence probability of one event, given the second event. Moreover, its distribution in the whole space is often assumed to be exactly known in Bayes filter design. In this paper, we find that by conditioning on an additional event, which stipulates a distance metric between two observed variables within a specified threshold, one can transform the conditional probability into a special integral form that is similar to the convolution operation. This definition relaxes the necessity of information completeness, which allows us to design a more general filter. We define this new probability as convolutional conditional probability.
Based on this definition, the transition probability can be extended to a convolutional form by conditioning on the event that the real state and its virtual state satisfy an inequality condition. The same extension can be applied to the output probability. Collectively, these two extensions forge a generalized filtering framework, which we refer to as convolutional Bayesian filtering. This new framework encompasses standard Bayesian filtering as a special case when the distance metric is set as the Dirac delta function. One of its natural benefits is the capability to explicitly handle mismatches between the mathematical model and the real system by tailoring the distance metric properly.
Under this new framework, we can reformulate nearly all Bayesian filtering algorithms into a more generalized type. In particular, the convolutional Bayes filter possesses analytical forms of the convolution operation in systems with Gaussian noises, which allows us to design a robust Kalman filter family. For non-Gaussian systems, the convolution operation usually has no analytical form but can be efficiently approximated by a newly proposed exponential density rescaling technique. This technique rescales the transition probability and output probability into their fractional powers when the distance metric is defined in a distributional sense. We further establish the theoretical connection between this approximation technique and information bottleneck theory. It is proven that the fractional power from density rescaling is related to the Lagrange multiplier of an optimization problem whose objective is to modulate the balance between preserving information about the measurement model and squeezing the representation of measurement data.
The remainder of this paper is organized as follows: Section 2 introduces the definition of convolutional conditional probability. Section 3 discusses the framework of convolutional Bayesian filtering. Section 4 introduces an approximation technique for non-Gaussian systems. Section 5 shows the simulation results. Section 6 concludes this paper.
2 Convolutional Conditional Probability
As discussed before, Bayesian filtering is built upon two pillars: total probability rule (1a) and Bayes’ law (1b). Both of them rely on how to handle the conditional probability , which is a measure of the occurrence probability of the event , given the event . This can also be interpreted as the ratio of the probability of both events happening to the probability of the original event. According to this interpretation, the total probability rule is
(1a) | |||
and the Bayes’ law is | |||
(1b) |
Note that we use boldface to denote a random variable, such as , and normal font to denote the realization of this variable, such as . Previously, the explicit form of the conditional probability is assumed known directly in Bayes filter design. One may be interested in what will happen if the conditional probability is unknown. This question helps us to conceive a new definition of conditional probability, i.e., convolutional conditional probability.
Given three random variables , , and , where the information of is known and certain constraints exist between and , our objective is to compute by leveraging these constraints. In the case that and are equal, i.e., , we have , as shown in Fig. 1(a). Conversely, if is not equal to and their difference is bounded by an inequality function, we can define a convolutional version of conditional probability, as shown in Fig. 1(b).
Definition 1 (Convolutional Conditional Probability).
Given , if and are conditionally independent given , then convolutional conditional probability is defined as
Here, is the inequality condition, where is a threshold random variable with known cumulative distribution function , and is the distance metric for two random variables. The calculation of convolutional conditional probability is summarized in the subsequent proposition:
Proposition 1.
The convolutional conditional probability satisfies
(2) | ||||
Proof.
According to Bayes’ law, we have
(3) |
In (3), is the prior of . It is chosen as an uninformative probability since we have no knowledge of [13]:
(4) |
The likelihood term in (3) is simplified as
(5) | ||||
Substituting (4) and (5) into (3), we have (2). Note that the final equation of (5) holds due to the conditional independence of and , given . ∎
Remark 1.
The calculation of convolutional conditional probability resembles convolution operation. In the convolution operation, a kernel is applied over an input space to generate a modified output. Here, serves as the kernel function, which is a weighting coefficient of based on the distance between and .
We want to emphasize that , as used in (1a) and (1b), can become any form of conditional probabilities. Actually, represents a specific form of conditional probability, which measures the probability of the event , conditioned on two events and . Compared to the standard definition, convolutional conditional probability has a third event which describes the upper bound on the distance between two random variables. Under this new definition, one can substitute with in (1a) and (1b) to construct two new rules:
(6a) | ||||
(6b) |
Here, (6a) can be regarded as a generalized total probability rule and (6b) can be regarded as a generalized Bayes’ law. Note that and in (6) are distinct from those notations in (1) because they implicitly condition on the third event , while this event in (1) is reduced to . That is to say, an equivalence event is implicitly conditioned in standard definition (1). It can be proven that total probability rule (1a) and Bayes’ law (1b) are the special cases of (6a) and (6b), as described in Proposition 2.
Proposition 2 (Limiting Property).
Proof.
Proposition 2 elucidates that the kernel function converges to a Dirac delta function at as the scale parameter tends to zero. As a result, the convolutional conditional probability reduces to its standard version. For finite values of scale parameter, this new definition considers a controllable amount of uncertainty governed by the scale parameter, offering an extension to the previous one.
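The limiting behaviour can be checked numerically on a grid. The sketch below rests on two assumptions that are our reading of the text rather than formulas recovered from it: the kernel is taken to be the survival function of the threshold evaluated at the distance, and the threshold is exponential with mean `scale` (a hypothetical name), combined with a squared-difference distance.

```python
import numpy as np

def convolutional_conditional(nominal, grid, scale):
    """Weight the nominal density p(y|x) by a kernel in the distance d(y, z)."""
    # Assumed kernel: P(threshold >= d) for an exponential threshold with mean `scale`.
    dist = (grid[:, None] - grid[None, :]) ** 2
    kernel = np.exp(-dist / scale)
    unnormalized = nominal @ kernel           # sum over y of p(y|x) * kernel(y, z)
    return unnormalized / unnormalized.sum()

grid = np.linspace(-5.0, 5.0, 201)
nominal = np.exp(-0.5 * grid ** 2)
nominal /= nominal.sum()

wide = convolutional_conditional(nominal, grid, scale=1.0)
narrow = convolutional_conditional(nominal, grid, scale=1e-6)
gap_wide = np.abs(wide - nominal).max()       # visible smoothing for a finite scale
gap_narrow = np.abs(narrow - nominal).max()   # vanishes as the scale shrinks
```

With a very small scale the kernel matrix is numerically the identity, which is the discrete counterpart of the Dirac delta limit.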
3 Convolutional Bayesian Filtering
In this section, we demonstrate how model mismatches in the filtering problem can be explicitly addressed using convolutional conditional probability. This is achieved by conditioning on an additional event representing the error bound between the system model and the real system. Further, by substituting the total probability rule and Bayes’ law with (6a) and (6b) in the Bayes filter, we can establish a more generalized filtering framework called convolutional Bayesian filtering.
3.1 Uncertain Hidden Markov Model with Model Mismatch
The essence of Bayesian filtering is to find a balance between the stochasticities of state transition and state observation. The stochasticity of the former comes from inherent randomness in the environment dynamics, while that of the latter comes from sensor noises. These two stochastic processes are typically represented by a hidden Markov model (HMM):
(8) | ||||
Here, is the system state and is the corresponding measurement. Besides, denotes the probability distribution of the initial state , represents the transition probability, and is the output probability.
The standard HMM implicitly assumes that the real system is perfectly modelled, i.e., (8) is an exact description of system dynamics and measurement sensors. However, perfect information about transition or output probabilities is often unattainable due to parametric variation, unmodeled dynamics or external disturbances. In other words, there must be some model errors in engineering practice. This error can lead to significant accuracy degradation in state estimation if not properly considered. To build an HMM with model mismatch, we have to distinguish two kinds of states: the real state and the virtual state. The former is an accurate yet unattainable description of the system. The latter is an artificial construct generated by nominal models and does not exist in the physical world. We use to represent the virtual state and to represent the real state. Likewise, the real measurement is denoted as , and its virtual counterpart, which is generated by the nominal output probability, is denoted as . The HMM with model mismatch is depicted in Fig. 2 and outlined in (9) as follows:
(9a) | |||
(9b) |
Here, and denote the nominal transition probability and nominal output probability respectively; and are the threshold random variables depicting the upper bound of model mismatch, with their distributions characterized by the cumulative distribution functions and , respectively. Importantly, is assumed to be independent of both and , and a similar independence assumption is made for relative to and . This new form in (9) is called uncertain hidden Markov model.
It is crucial to differentiate between the concepts of system stochasticity (see (9a)) and model mismatch (see (9b)). The distinction hinges on the presence of known mathematical forms. System stochasticity can be accurately modeled using explicit distributions with associated parameters. In contrast, model mismatch refers to the inherent limitations and uncertainties in a model’s ability to represent the real system. Typically, this can only be quantified by an upper bound that reflects the extent to which the model deviates from reality. If we can only acquire the bound of system stochasticity, it inherently becomes a special case of model mismatch. Conversely, if the distribution of model mismatch is determined, it then becomes part of the system’s stochasticity. This distinction is pivotal in understanding and building uncertain HMM.
Remark 2.
Remark 3.
The two nominal probabilities and can be written to state space model (SSM):
(10) | |||
Here, is nominal transition model, is nominal measurement model, is virtual process noise, and is virtual measurement noise.
In essence, HMM and SSM are just different representations of the same system model, and they can be converted into each other [24, 8]. For example, consider the HMM’s nominal transition probability expressed as . This can be equivalently represented in the SSM format as , where with denoting the covariance matrix of the virtual process noise. Similarly, an HMM with a nominal transition probability defined by a Laplace distribution, , can be represented in SSM format as , with following the Laplace distribution .
3.2 Filtering Algorithm
When the system is perfectly modelled as in (8), Bayesian filtering serves as an ideal framework to calculate the posterior of system state by iteratively performing (11a) and (11b):
(11a) | ||||
(11b) |
Here, is the posterior, is the prior. As a tradition, (11a) is called prediction and (11b) is called update. These two equations originate from total probability rule and Bayes’ law respectively. In fact, almost all existing Bayesian filtering algorithms adhere to this framework, with their differences lying in how these two equations are calculated.
When considering the HMM with model mismatch (9), we need to shift the mathematical foundation to convolutional rules, as demonstrated in (6). First, let us redefine some core probability distributions in Bayesian filtering, including posterior distribution, prior distribution, transition probability, and output probability. The redefinition relies on the uncertain HMM:
(12) | ||||
Here, , , and are convolutional distributions, each corresponding to their respective physical meanings. Then, we will illustrate how to use these definitions in (12) to derive a new Bayesian filtering framework to handle model mismatch. We begin with the assumption of conditional independence:
Assumption 1 (Conditional Independence).
and are conditionally independent given , i.e.,
Besides, and are conditionally independent given .
This assumption suggests that the virtual state can be inferred directly from the past state, without additional information from the current state. Also, the virtual measurement can be inferred directly from the state, without additional information from the real measurement. This assumption originates from the philosophical belief that the physical world and the modeling of a system are mutually exclusive at any given moment; that is, the act of modeling does not affect the system in the physical world, nor does the physical system influence its nominal model. This principle is crucial for estimating the transition and output probabilities using their nominal models. Under the assumption of conditional independence, we can obtain the main result of the paper:
Theorem 1 (Convolutional Bayesian filtering).
Proof.
The transition and output probabilities in (8) are standard conditional probabilities. By leveraging the definition of convolutional conditional probability, we can derive the convolutional forms of (8) as in (14). Consequently, their convolutional counterparts are referred to as the convolutional transition probability (14a) and the convolutional output probability (14b), respectively. Then by utilizing (6), we can deduce convolutional Bayesian filtering (13). ∎
Remark 4.
In the iterative process, (13a) and (13b) resemble the prediction and update steps of Bayesian filtering, respectively. The only difference is that all the probability distributions are transformed into their convolutional counterparts. Therefore, we refer to this iterative process as convolutional Bayesian filtering.
3.3 Analytical Form in Gaussian Cases
A major challenge in convolutional Bayesian filtering is the difficulty in computing the integrals in (14a) and (14b), as their analytical solutions generally do not exist. However, an exceptional case arises when distance metrics are represented as quadratic forms, threshold distributions are chosen as exponential distributions, and virtual noises are characterized as additive Gaussian. In this specific case, it is possible to derive an analytical version of convolutional Bayesian filtering.
Corollary 1.
Consider the following nominal system model
(15) | ||||
If , , and with being exponential coefficients, we have
(16a) | ||||
(16b) |
Proof.
The proof is provided only for the first part, namely, proving the analytical form of in (16a). The second part (16b) can be proved in a similar manner. According to (14a), we have
By completing the square, we have
(17) | ||||
where indicates terms that do not depend on . The integral of (17) over is proportional to
where is the covariance matrix of the convolutional transition probability and is its mean. This results in an analytical form of the convolutional transition probability:
∎
This corollary shows that by using quadratic distance metrics and choosing exponential threshold variables, the covariance matrix of the convolutional transition probability for system (15) essentially becomes the nominal covariance matrix plus a constant matrix related to the exponential coefficient. As the exponential coefficient increases, the exponential distribution becomes more concentrated, with its mean and variance tending towards zero. This implies that the uncertain HMM becomes increasingly deterministic. When the exponential coefficient becomes infinity, the effect of model mismatch diminishes, and the convolutional transition probability reduces to the nominal transition probability . This analysis is equally applicable to convolutional output probability and nominal output probability.
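The covariance inflation can be seen in a scalar worked example. The derivation below is a sketch under stated assumptions: the distance metric is taken as the unweighted squared difference $(z-y)^2$, the threshold is exponential with coefficient $\alpha$ (so that the kernel is $e^{-\alpha (z-y)^2}$), and the nominal density is $\mathcal{N}(y;\mu,\sigma^2)$. These symbols are chosen for the illustration and need not match the notation of (16a).

```latex
\begin{align}
\int \mathcal{N}(y;\mu,\sigma^2)\, e^{-\alpha (z-y)^2}\,\mathrm{d}y
  &= \sqrt{\frac{\pi}{\alpha}} \int \mathcal{N}(y;\mu,\sigma^2)\,
     \mathcal{N}\!\left(z;\, y,\, \tfrac{1}{2\alpha}\right)\mathrm{d}y \\
  &= \sqrt{\frac{\pi}{\alpha}}\;
     \mathcal{N}\!\left(z;\, \mu,\, \sigma^2 + \tfrac{1}{2\alpha}\right).
\end{align}
```

After normalization over $z$, the result is a Gaussian with the nominal mean and variance $\sigma^2 + 1/(2\alpha)$. The added term vanishes as $\alpha \to \infty$, which matches the limiting behaviour described above.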
For linear Gaussian case, where the system in (15) satisfies and , standard Bayesian filtering simplifies to the canonical Kalman filter. By using Corollary 1, the canonical Kalman filter can be transformed into its convolutional version by only replacing the covariance matrix of process noise with , and the covariance matrix of measurement noise with .
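A single iteration of such a filter can be sketched by wrapping a standard Kalman step. The inflation terms used here, identity matrices scaled by 1/(2·alpha) and 1/(2·beta), are an assumption that follows from an unweighted squared Euclidean distance with exponential thresholds; the exact replacement covariances should be taken from (16a) and (16b). All names and numerical values are hypothetical.

```python
import numpy as np

def kf_step(m, P, y, A, H, Q, R):
    """One standard Kalman filter iteration (prediction followed by update)."""
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    S = H @ P_pred @ H.T + R
    K = np.linalg.solve(S, H @ P_pred).T      # equals P_pred H^T S^{-1}; S is symmetric
    return m_pred + K @ (y - H @ m_pred), P_pred - K @ S @ K.T

def conv_kf_step(m, P, y, A, H, Q, R, alpha, beta):
    """Same recursion with inflated noise covariances (assumed inflation form)."""
    Q_c = Q + np.eye(Q.shape[0]) / (2.0 * alpha)
    R_c = R + np.eye(R.shape[0]) / (2.0 * beta)
    return kf_step(m, P, y, A, H, Q_c, R_c)

# Hypothetical position-velocity system with a position measurement.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q, R = 0.1 * np.eye(2), np.array([[1.0]])
m0, P0, y = np.zeros(2), np.eye(2), np.array([1.0])

m_std, P_std = kf_step(m0, P0, y, A, H, Q, R)
m_conv, P_conv = conv_kf_step(m0, P0, y, A, H, Q, R, alpha=1.0, beta=1.0)
m_lim, P_lim = conv_kf_step(m0, P0, y, A, H, Q, R, alpha=np.inf, beta=np.inf)
```

The structure of the recursion is untouched, so the cost per step is the same as for the canonical filter; infinite coefficients recover it exactly, while finite ones yield a larger, more conservative posterior covariance.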
Remark 5.
The resulting method is an outlier-robust Kalman filter (KF), which we name the convolutional KF (ConvKF). Unlike the robust regression KF that employs Huber loss [14] or correntropy loss [9, 35, 34], and the student-t KF [30, 2] designed for handling non-Gaussian heavy-tailed distributions, ConvKF offers several benefits: First, it quantitatively considers the impact of model mismatch with a clear probabilistic meaning; second, it preserves the original structure of KF, maintaining the conjugate nature of Gaussian distributions without increasing the computational burden; third, our method is in alignment with the well-established results for engineering practice of the KF, as discussed in Chapter 6.1 of [4] and Theorem 7.6 of [15]: if the modeling of noise covariance is imprecise, it is common practice to opt for a larger covariance in the KF. This treatment is proven to preserve stability, albeit resulting in a more conservative filter.
4 Approximation via Exponential Density Rescaling
Except for the Gaussian case addressed in Corollary 1, the convolutional conditional probabilities in Bayesian filtering typically lack analytical forms. In this section, we introduce an approximation technique for computing the convolutional conditional probability, namely the exponential density rescaling technique. Moreover, we offer a theoretical explanation for this technique using the theory of information bottleneck.
4.1 Exponential Density Rescaling
When the distance metric is defined in terms of relative entropy, the transition probability and output probability can be approximated by simply reformulating them into exponential forms with fractional powers. Specifically, we have the following theorem:
Theorem 2 (Exponential Approximation).
When the distance metrics and are chosen as
with and representing the empirical distribution, and , with , the convolutional transition probability and convolutional output probability can be approximated as
Proof.
We prove the theorem only for the convolutional output probability, because the proof logic for the convolutional transition probability is analogous and thus omitted. Besides, the proof is confined to cases where the sample space is finite. Supposing the sample space has elements, we define the set of all the probability distributions as
By Proposition 1, we have
According to (5), we have
For any , we define and satisfying . Under the assumption of finite sample space, we denote and . Thus, we have
The chi-squared distance is a second-order Taylor approximation of the relative entropy [12]. That is to say, for any , we have
(18) |
where and . Besides, for all , the chi-squared distance equals
(19) |
where [12]. According to the central limit theorem [23], the empirical distribution can be approximated by a Gaussian distribution [28] with being the nominal distribution. Defining , by (18) and (19), we have
Thus, we finally achieve . ∎
Based on the second order approximation of the relative entropy, this theorem provides an effective way of performing convolutional Bayesian filtering by simply transforming transition probability and output probability using their fractional orders.
Remark 6.
In linear Gaussian systems, the proposed approximation alters the covariances for both transition and measurement noises. Consider, for example, the nominal transition probability ; this becomes , thereby changing the covariance of the transition noise from to . Such a modification aligns with the guidelines of Corollary 1. However, it is crucial to emphasize that while Corollary 1 is confined to Gaussian distributions, Theorem 2 broadens the scope to include any type of distribution. The extensive applicability of this approach is also reflected in the convolutional particle filter algorithm (see Algorithm 1), which is formulated without being limited to any specific distribution type.
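The Gaussian case can be verified numerically: raising a Gaussian density to a fractional power and renormalizing gives a Gaussian whose variance is divided by that power. The sketch below uses a hypothetical power of 0.25 and a unit-variance nominal density; the function name and grid are illustrative choices.

```python
import numpy as np

def rescale_density(density, dx, power):
    """Raise a gridded density to a fractional power and renormalize it."""
    tempered = density ** power
    return tempered / (tempered.sum() * dx)

grid = np.linspace(-20.0, 20.0, 4001)
dx = grid[1] - grid[0]
sigma2 = 1.0                                   # hypothetical nominal variance
power = 0.25                                   # hypothetical fractional power

nominal = np.exp(-0.5 * grid ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
rescaled = rescale_density(nominal, dx, power)
rescaled_var = (grid ** 2 * rescaled).sum() * dx   # mean is zero by symmetry
```

Because `rescale_density` only needs pointwise density values, the same function applies unchanged to non-Gaussian nominal densities evaluated on a grid or at particle locations.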
4.2 Connection with Information Bottleneck Theory
Previously, we have proved that convolutional Bayesian filtering can be approximated by exponential density rescaling technique. This section will provide a theoretical view of this technique using the information bottleneck theory.
Given the measurement data , the state can be regarded as its compressed representation. Leveraging the information bottleneck theory [36, 6], we can express the information bottleneck objective as
(20) | ||||
s.t. |
where is defined as the mutual information between random variables and . Here, the mutual information and are defined by two joint probability distributions and , which can be decomposed as
The goal of (20) is to maximize the mutual information between the virtual measurement and its compression, the system state, while ensuring that the mutual information between the state and the actual measurement does not exceed . The concept of the “information bottleneck” emerges from the limitation that must not exceed , which requires compressing the information in through a bottleneck, as depicted in Fig. 3(a).
The constrained optimization problem in (20) can be transformed into an unconstrained optimization problem by using the Lagrange multiplier :
(21) |
By leveraging the Markov property (see Fig. 3(b)) [3, 20], (21) can further be rewritten as
(22) |
We can find an approximate upper bound of (22). For the first term, we have:
(23) | ||||
Note that we have due to the Markov property [3, 20]. Because the KL divergence is always non-negative, (23) can be upper bounded by
The approximate equality in the penultimate line is due to the substitution of expected values with sample values. Besides, the second term is approximately lower bounded by
(24) | ||||
Here, is not real output probability. Instead, it is the value of the nominal output probability at the real measurement data . Combining (23) and (24), we can transform the unconstrained optimization problem in (22) into minimizing its variational lower bound.
(25) | ||||
Here, the entropy of the measurement is omitted because it is a constant term regarding the optimization objective (22). From (25), it can be seen that by setting , the solution to the information bottleneck problem coincides with the update step of convolutional Bayesian filtering approximated by the exponential density rescaling technique. This relationship offers a new perspective for understanding our framework. In more detail, the variable serves as a Lagrange multiplier that balances the trade-off between reconstructing information about the measurement model and compressing the representation of measurement data. As increases, the compression bottleneck becomes less restrictive. When , convolutional Bayesian filtering simplifies to standard Bayesian filtering, and , indicating that information is constructed without bottleneck.
5 Simulations
In this section, we evaluate our proposed framework across three benchmark systems to demonstrate its applicability to classic filtering algorithms in addressing model mismatches. We conduct Monte Carlo experiments with time steps for each simulation. In each experiment, the chosen evaluation metric is the root mean square error (RMSE), which is defined as
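$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T} \lVert x_t - \hat{x}_t \rVert_2^2}$
Here, $x_t$ and $\hat{x}_t$ stand for the real and estimated state at the $t$-th step, and $T$ is the number of time steps. This metric is averaged over 100 experiments for fair performance evaluation in our simulations.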
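The evaluation metric can be computed as in the following sketch. It assumes the RMSE is the square root of the time-averaged squared Euclidean error between real and estimated states, and that each trajectory is stored as an array of shape (time steps, state dimension). The function names `rmse` and `averaged_rmse` are hypothetical.

```python
import numpy as np

def _as_trajectory(states):
    """Convert input to a float array of shape (T, n); 1-D input is treated as scalar states."""
    arr = np.asarray(states, dtype=float)
    return arr.reshape(-1, 1) if arr.ndim == 1 else arr

def rmse(true_states, est_states):
    """Root mean square error of one trajectory."""
    err = _as_trajectory(true_states) - _as_trajectory(est_states)
    squared_error_per_step = np.sum(err ** 2, axis=1)
    return float(np.sqrt(np.mean(squared_error_per_step)))

def averaged_rmse(runs):
    """Average the per-trajectory RMSE over Monte Carlo runs.

    runs is an iterable of (true_states, est_states) pairs.
    """
    return float(np.mean([rmse(x, x_hat) for x, x_hat in runs]))
```

For example, `averaged_rmse(runs)` with 100 `(true, estimate)` pairs corresponds to the averaging over 100 experiments described above.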
5.1 Linear Wiener Velocity Model
The Wiener velocity model is a well-known standard environment in the field of target tracking, where the velocity is modeled as the Wiener process [42]. The state consists of system positions and system velocities . The Wiener velocity model is described by
Here, the virtual process noise is modelled by with covariance , and virtual measurement noise satisfies with . Additionally, the initial state satisfies .
To evaluate the effectiveness of our designed filter, we consider two different cases for model mismatch, which is a common setting in existing works [14, 30]:
- Case A: Transition Model Mismatch: In the real system, the process noise is a mixture of Gaussian noises, while the measurement noise is Gaussian:
- Case B: Measurement Model Mismatch: The process noise is Gaussian, while the measurement noise is a mixture of Gaussian noises:
We apply the convolutional Bayesian filtering framework to the KF via Corollary 1 and call the resulting filter ConvKF. We compare ConvKF, under various values of the parameters defined in Corollary 1, with the standard KF and the Huber KF. Note that the Huber KF is a widely used robust method that replaces the quadratic loss in the optimization formulation of the KF with the Huber loss [14]. In Fig. 4, a box plot of the RMSE demonstrates that ConvKF outperforms the standard KF across a broad range of parameters in both Case A and Case B. Specifically, in Case A, changing the exponential coefficient from 0.005 to 0.05 leaves the RMSE of ConvKF almost unchanged. In contrast, in Case B, the same change leads to a slight increase in the RMSE of ConvKF.
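The idea that a robust Kalman filter can be obtained by only altering the noise covariance matrices can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Corollary 1: it uses the fact that a Gaussian density raised to a power `beta` is proportional to a Gaussian with covariance divided by `beta`, so a fractional-power rescaling of the transition and output probabilities amounts to running a standard KF with inflated covariances. The names `conv_kf_step`, `alpha`, and `beta`, the sampling period, and the matrices `Q` and `R` are all hypothetical choices for a Wiener velocity model.

```python
import numpy as np

def conv_kf_step(m, P, y, F, H, Q, R, alpha=1.0, beta=1.0):
    """One predict/update cycle of a KF with rescaled noise covariances.

    alpha, beta in (0, 1] are assumed fractional powers for the transition
    and output probabilities; alpha = beta = 1 gives the standard KF.
    """
    Q_eff = Q / alpha  # N(.; ., Q)**alpha is proportional to N(.; ., Q / alpha)
    R_eff = R / beta   # N(.; ., R)**beta  is proportional to N(.; ., R / beta)
    # Prediction step
    m_pred = F @ m
    P_pred = F @ P @ F.T + Q_eff
    # Update step
    S = H @ P_pred @ H.T + R_eff
    K = np.linalg.solve(S, H @ P_pred).T  # K = P_pred H^T S^{-1}
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = (np.eye(m.size) - K @ H) @ P_pred
    return m_new, P_new

# Assumed Wiener velocity model: state = [positions (2), velocities (2)].
dt = 0.1
I2, Z2 = np.eye(2), np.zeros((2, 2))
F = np.block([[I2, dt * I2], [Z2, I2]])
H = np.block([I2, Z2])  # only positions are measured
Q = np.block([[dt**3 / 3 * I2, dt**2 / 2 * I2],
              [dt**2 / 2 * I2, dt * I2]])
R = np.eye(2)

m0, P0 = np.zeros(4), np.eye(4)
y = np.array([5.0, -5.0])  # a measurement far from the prior mean
m_kf, P_kf = conv_kf_step(m0, P0, y, F, H, Q, R)
m_conv, P_conv = conv_kf_step(m0, P0, y, F, H, Q, R, alpha=0.5, beta=0.1)
print(m_kf[:2], m_conv[:2])
```

With a small `beta`, the effective measurement covariance grows, the Kalman gain shrinks, and a single outlying measurement pulls the estimate less than in the standard KF, while the posterior remains Gaussian.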
5.2 Sequence Forecasting System
In this subsection, we consider a popular nonlinear system used for sequence forecasting [35]. The state space model is given by
where . Both the constants and are set to . We assume for virtual process noise and virtual measurement noise , respectively. We construct convolutional versions of the EKF and UKF by considering quadratic-form distance metrics and an exponentially distributed threshold variable, similar to Corollary 1. These approaches are named the Convolutional EKF (ConvEKF) and Convolutional UKF (ConvUKF), respectively. As in Subsection 5.1, we compare our methods with the standard UKF [16], standard EKF [33], Huber UKF [7], and Iterated EKF (IEKF) [5]. The IEKF is a variant of the EKF that improves the linear approximation of nonlinear systems through iterative updates, thereby improving filter performance. The Huber UKF is a robust version of the UKF that replaces the quadratic loss in the optimization of the update step with the Huber loss. Our comparisons consider the following two cases:
- Case A: Transition Model Mismatch: In the real system, the process noise is a mixture of Gaussian noises, while the measurement noise is Gaussian. Specifically, we have
- Case B: Measurement Model Mismatch: The process noise is Gaussian, while the measurement noise is a mixture of Gaussian noises. This is represented as
As demonstrated in Fig. 5, ConvEKF outperforms the other methods in both case A and case B over a wide range of parameters. Additionally, ConvUKF also shows improvements over the standard UKF, particularly in situations with measurement outliers. Notably, the Huber UKF fails in scenarios with transition model mismatch, possibly because it is designed to enhance robustness by considering the post-prediction prior in the update step, rather than directly incorporating robustness into the prediction step.
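For reference, the Huber loss that the Huber KF and Huber UKF baselines substitute for the quadratic loss can be sketched as below. The function name `huber_loss` is hypothetical, and the default threshold 1.345 is a conventional tuning constant from robust statistics, not a value taken from this paper.

```python
import numpy as np

def huber_loss(residual, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond.

    delta is the threshold between the two regimes (assumed default).
    """
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)

# Large residuals are penalized linearly rather than quadratically.
print(huber_loss(np.array([0.5, 1.0, 10.0])))
```

Because the loss enters only the measurement update, this kind of robustification acts on measurement residuals, which is consistent with the observation above that the Huber UKF does not help under transition model mismatch.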
5.3 Isothermal Gas-phase Reactor Model
We perform simulations on a commonly used isothermal gas-phase reactor model for state estimation [32]. This model describes the reversible chemical reaction . Initially, the reactor is charged with certain amounts of and , but the exact composition of the original mixture remains uncertain. The state includes the partial pressures, i.e., . The discrete-time version of the gas-phase reactor model, obtained with the Euler method, is
where , , , and . The virtual process noise satisfies with , and the virtual measurement noise satisfies with . For our subsequent verification, we also set up two different simulations, similar to Section 5.1:
- Case A: Transition model mismatch: The measurement noise obeys a Laplace distribution, while the process noise is a mixture of Laplace noises, i.e.,
- Case B: Measurement model mismatch: The process noise obeys a Laplace distribution, while the measurement noise is a mixture of Laplace noises, i.e.,
We apply our proposed convolutional Bayesian filtering framework, approximated using exponential density rescaling, to the PF and refer to the result as ConvPF, as shown in Algorithm 1. Our method is compared with the standard PF, the auxiliary PF (APF) [29], and the Student-t PF (STPF) [38]. Note that APF and STPF are two widely used robust algorithms. APF introduces an auxiliary variable to select particles based on both their weights and the likelihood of the current observation prior to the actual resampling step. This focuses computational resources on more promising particles, enhancing the filter’s performance, particularly in scenarios with heavy-tailed observation densities. STPF, on the other hand, employs the Student’s t distribution, whose heavier tails make it more capable of handling extreme values or deviations from Gaussian assumptions.
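A particle filter with exponential density rescaling can be sketched as follows. This is a minimal illustration and not a transcription of Algorithm 1: it assumes a bootstrap-style filter in which the log-likelihood is multiplied by a hypothetical exponent `beta`, and in which the rescaled Gaussian transition is sampled by dividing the nominal process noise variance by a hypothetical exponent `alpha`. The scalar toy model and all numerical values are assumptions.

```python
import numpy as np

def conv_pf_step(particles, weights, y, transition_sample, log_likelihood, beta, rng):
    """One step of a bootstrap particle filter with a tempered likelihood.

    The weight update uses likelihood**beta, i.e. beta * log-likelihood.
    """
    particles = transition_sample(particles, rng)
    log_w = np.log(weights + 1e-300) + beta * log_likelihood(y, particles)
    log_w -= log_w.max()  # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()
    # Multinomial resampling when the effective sample size is low
    ess = 1.0 / np.sum(w ** 2)
    if ess < 0.5 * w.size:
        idx = rng.choice(w.size, size=w.size, p=w)
        particles = particles[idx]
        w = np.full(w.size, 1.0 / w.size)
    return particles, w

# Assumed scalar toy model: x' = 0.9 x + process noise, y = x + measurement noise.
alpha, beta = 0.5, 0.3  # hypothetical tuning parameters
q, r = 0.1, 1.0         # nominal noise variances (assumed)

def transition_sample(x, rng):
    # Gaussian transition raised to the power alpha: variance q / alpha
    return 0.9 * x + rng.normal(0.0, np.sqrt(q / alpha), size=x.shape)

def log_likelihood(y, x):
    # Nominal Gaussian log-likelihood, up to an additive constant
    return -0.5 * (y - x) ** 2 / r

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=500)
weights = np.full(500, 1.0 / 500)
particles, weights = conv_pf_step(
    particles, weights, 2.0, transition_sample, log_likelihood, beta, rng
)
estimate = float(np.sum(weights * particles))
print(estimate)
```

Setting `beta = 1` and `alpha = 1` recovers a standard bootstrap particle filter, while smaller values flatten the weights so that a single outlying measurement degenerates the particle set less severely.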
As depicted in Fig. 6, our approach yields the minimum estimation error under both transition model mismatch (Case A) and measurement model mismatch (Case B). Although APF exhibits a marginal improvement in RMSE over the standard PF, the improvement is not significant. STPF shows varied performance: in Case A, its median RMSE is marginally better than that of PF, yet its overall variance and average RMSE are notably higher. In Case B, STPF does offer an improvement over PF. However, our ConvPF method, with tuning parameters and , consistently outperforms the other methods.
6 Conclusion and Discussion
This paper extends the definition of conditional probability and introduces a convolutional Bayesian filtering framework by transforming transition and output probabilities into convolutional forms, broadening the scope of Bayesian filtering. We show that, in systems with Gaussian noise, the convolution operations in convolutional Bayesian filtering admit analytical forms. For non-Gaussian cases, the transition and output probabilities can be effectively approximated by rescaling them into fractional powers when the relative entropy is used as the distance measure. This leads to an enhanced version of the Kalman filter, which achieves robustness through simple modifications to the noise covariance matrix while preserving the conjugate nature of Gaussian distributions. The practical efficacy of convolutional Bayesian filtering is demonstrated through its application to various common filtering algorithms, including the Kalman filter, extended Kalman filter, unscented Kalman filter, and particle filter.
In this paper, our primary focus is the generalization of Bayesian filtering theory to a convolutional form. Bayesian filtering forms the foundation of optimal filtering theory for discrete-time systems, which underlines the significance and applicability of our extension. Nevertheless, it is also important to acknowledge the distinctive aspects of filtering theory for continuous-time systems. In these systems, the conditional density function of the state is typically obtained from numerical solutions of Kushner’s or the Duncan-Mortensen-Zakai equations [41, 1], rather than from Bayes’ law. A notable advancement in this domain is the Yau-Yau method [39, 40], which is rigorously proven to converge to a global solution (a type of convergence otherwise only seen in particle filters in discrete-time systems) and can be pre-computed offline, facilitating real-time applications [10, 27]. While we do not explore how to apply our approach to continuous-time systems in this paper, such an extension is a compelling avenue for future research.
References
- [1] Richard Edgar Mortensen. Optimal control of continuous-time stochastic systems. University of California, Berkeley, 1966.
- [2] Gabriel Agamennoni, Juan I Nieto, and Eduardo M Nebot. Approximate inference in state-space models with heavy-tailed noise. IEEE Transactions on Signal Processing, 60(10):5024–5037, 2012.
- [3] Alexander A Alemi. Variational predictive information bottleneck. In Symposium on Advances in Approximate Bayesian Inference, pages 1–6. PMLR, 2020.
- [4] Brian DO Anderson and John B Moore. Optimal filtering. Courier Corporation, 2012.
- [5] Bradley M Bell and Frederick W Cathey. The iterated Kalman filter update as a Gauss-Newton method. IEEE Transactions on Automatic Control, 38(2):294–297, 1993.
- [6] William Bialek, Ilya Nemenman, and Naftali Tishby. Predictability, complexity, and learning. Neural computation, 13(11):2409–2463, 2001.
- [7] Zhu Bing, Chang Lubin, Jiangning Xu, et al. Huber-Based Adaptive Unscented Kalman Filter with Non-Gaussian Measurement Noise. Circuits, Systems, and Signal Processing, 37(12):1–21, 2018.
- [8] Wenhan Cao, Chang Liu, Zhiqian Lan, Yingxi Piao, and Shengbo Eben Li. Generalized moving horizon estimation for nonlinear systems with robustness to measurement outliers. In 2023 American Control Conference (ACC), pages 1614–1621. IEEE, 2023.
- [9] Badong Chen, Xi Liu, Haiquan Zhao, and Jose C Principe. Maximum correntropy Kalman filter. Automatica, 76:70–77, 2017.
- [10] Xiuqiong Chen, Ji Shi, and Stephen S-T Yau. Real-time solution of time-varying Yau filtering problems via direct method and Gaussian approximation. IEEE Transactions on Automatic Control, 64(4):1648–1654, 2018.
- [11] Zhe Chen et al. Bayesian filtering: From Kalman filters to particle filters, and beyond. Statistics, 182(1):1–69, 2003.
- [12] Thomas M Cover. Elements of information theory. John Wiley & Sons, 1999.
- [13] Karl J Friston, William Penny, Christophe Phillips, S Kiebel, G Hinton, and John Ashburner. Classical and Bayesian inference in neuroimaging: theory. NeuroImage, 16(2):465–483, 2002.
- [14] Peter J Huber. Robust statistics, volume 523. John Wiley & Sons, 2004.
- [15] Andrew H Jazwinski. Stochastic processes and filtering theory. Courier Corporation, 2007.
- [16] Simon J Julier and Jeffrey K Uhlmann. New extension of the Kalman filter to nonlinear systems. In Signal processing, sensor fusion, and target recognition VI, volume 3068, pages 182–193. SPIE, 1997.
- [17] Simon J Julier and Jeffrey K Uhlmann. Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3):401–422, 2004.
- [18] R.E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960.
- [19] Rudolph E Kalman and Richard S Bucy. New results in linear filtering and prediction theory. 1961.
- [20] Jeremias Knoblauch, Jack Jewson, and Theodoros Damoulas. An optimization-centric view on Bayes’ rule: Reviewing and generalizing variational inference. Journal of Machine Learning Research, 23(132):1–109, 2022.
- [21] A Kolmogorov. Interpolation and extrapolation of stationary random sequences. Izvestiya Rossiiskoi Akademii Nauk. Seriya Matematicheskaya, 5:3, 1941.
- [22] Rahul Krishnan, Uri Shalit, and David Sontag. Structured inference networks for nonlinear state space models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
- [23] Sang Gyu Kwak and Jong Hae Kim. Central limit theorem: the cornerstone of modern statistics. Korean journal of anesthesiology, 70(2):144–156, 2017.
- [24] Shengbo Eben Li. Reinforcement learning for sequential decision and optimal control. Springer, 2023.
- [25] Chang Liu, Shengbo Eben Li, Diange Yang, and J. Karl Hedrick. Distributed Bayesian Filter Using Measurement Dissemination for Multiple Unmanned Ground Vehicles With Dynamically Changing Interaction Topologies. Journal of Dynamic Systems, Measurement, and Control, 140(3):030903, 11 2017.
- [26] Jun S Liu and Rong Chen. Sequential Monte Carlo methods for dynamic systems. Journal of the American statistical association, pages 1032–1044, 1998.
- [27] Xue Luo and Stephen S-T Yau. Complete real time solution of the general nonlinear filtering problem without memory. IEEE Transactions on Automatic Control, 58(10):2563–2578, 2013.
- [28] Jeffrey W Miller and David B Dunson. Robust Bayesian inference via coarsening. Journal of the American Statistical Association, 2018.
- [29] Michael K Pitt and Neil Shephard. Filtering via simulation: Auxiliary particle filters. Journal of the American statistical association, 94(446):590–599, 1999.
- [30] Michael Roth, Emre Özkan, and Fredrik Gustafsson. A Student’s t filter for heavy tailed process and measurement noise. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 5770–5774. IEEE, 2013.
- [31] Simo Särkkä and Aapo Nummenmaa. Recursive noise adaptive Kalman filtering by variational Bayesian approximations. IEEE Transactions on Automatic Control, 54(3):596–600, 2009.
- [32] Julian D Schiller, Simon Muntwiler, Johannes Köhler, Melanie N Zeilinger, and Matthias A Müller. A Lyapunov function for robust stability of moving horizon estimation. IEEE Transactions on Automatic Control, 2023.
- [33] Gerald L Smith, Stanley F Schmidt, and Leonard A McGee. Application of statistical filter theory to the optimal estimation of position and velocity on board a circumlunar vehicle, volume 135. National Aeronautics and Space Administration, 1962.
- [34] Yangtianze Tao, Jiayi Kang, and Stephen Shing-Toung Yau. Maximum Correntropy Ensemble Kalman Filter. In 2023 62nd IEEE Conference on Decision and Control (CDC), pages 8659–8664. IEEE, 2023.
- [35] Yangtianze Tao and Stephen Shing-Toung Yau. Outlier-Robust Iterative Extended Kalman Filtering. IEEE Signal Processing Letters, 2023.
- [36] Naftali Tishby, Fernando C Pereira, and William Bialek. The information bottleneck method. arXiv preprint physics/0004057, 2000.
- [37] Norbert Wiener. Extrapolation, interpolation, and smoothing of stationary time series: with engineering applications. The MIT press, 1949.
- [38] Dingjie Xu, Chen Shen, and Feng Shen. A robust particle filtering algorithm with non-Gaussian measurement noise using student-t distribution. IEEE Signal Processing Letters, 21(1):30–34, 2013.
- [39] Shing-Tung Yau and Stephen S-T Yau. Real time solution of nonlinear filtering problem without memory I. Mathematical Research Letters, 7(6):671–693, 2000.
- [40] Shing-Tung Yau and Stephen S-T Yau. Real time solution of the nonlinear filtering problem without memory II. SIAM Journal on Control and Optimization, 47(1):163–195, 2008.
- [41] Moshe Zakai. On the optimal filtering of diffusion processes. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 11(3):230–243, 1969.
- [42] Zhengxin Zhang, Xiaosheng Si, Changhua Hu, and Yaguo Lei. Degradation data analysis and remaining useful life estimation: A review on Wiener-process-based methods. European Journal of Operational Research, 271(3):775–796, 2018.