Coded Kalman Filtering over MIMO Gaussian Channels with Feedback

Barron Han, Victoria Kostina, Babak Hassibi
Department of Electrical Engineering
Caltech
Pasadena, CA, USA
Email: {bshan, vkostina, hassibi}@caltech.edu

Oron Sabag
School of Computer Science and Engineering
The Hebrew University of Jerusalem
Jerusalem, Israel
Email: [email protected]

Abstract

We consider the problem of remotely stabilizing a linear dynamical system. In this setting, a sensor co-located with the system communicates the system’s state to a controller over a noisy communication channel with feedback. The objective of the controller (decoder) is to use the channel outputs to estimate the vector state with finite zero-delay mean squared error (MSE) at the infinite horizon. It has been shown in [1] that for a vector Gauss-Markov source and either a single-input multiple-output (SIMO) or a multiple-input single-output (MISO) channel, linear codes require the minimum capacity to achieve finite MSE. This paper considers the more general problem of linear zero-delay joint-source channel coding (JSCC) of a vector-valued source over a multiple-input multiple-output (MIMO) Gaussian channel with feedback. We study sufficient and necessary conditions for linear codes to achieve finite MSE. For sufficiency, we introduce a coding scheme where each unstable source mode is allocated to a single channel for estimation. Our proof for the necessity of this scheme relies on a matrix-algebraic conjecture that we prove to be true if either the source or channel is scalar. We show that linear codes achieve finite MSE for a scalar source over a MIMO channel if and only if the best scalar sub-channel can achieve finite MSE. Finally, we provide a new counter-example demonstrating that linear codes are generally sub-optimal for coding over MIMO channels.

I Introduction

Controlling an unstable plant over a noisy communication channel is a hurdle for emerging technologies such as autonomous vehicles, Internet of Things devices, and remote surgery systems. This problem setting deviates from Shannon’s communication problem in two ways that make it more challenging [2]. First, in the control setting, the data to be transmitted correspond to physical measurements and arrive in a streaming fashion instead of being made available in their entirety before transmission. Thus, we must design causal encoders and decoders for this task. Second, typical control systems are unstable, and their stabilization requires near-instantaneous and accurate estimates of the plant’s state to produce effective control actions. Consequently, codes must be low-delay yet highly reliable to perform control tasks over communication channels. We employ a class of low-delay joint-source channel codes to address the two objectives.

In a seminal paper, Sahai and Mitter [3] proved that Shannon’s channel capacity is an insufficient characterization of channel quality when the goal is to stabilize a system over a channel. They introduced the notion of anytime capacity, which is in general upper-bounded by channel capacity, as an alternative measure. While the anytime capacity of a channel provides a useful converse on the channel quality required to stabilize a specific system, achievability schemes are generally open. A class of tree codes such as those studied by Schulman [4] achieve error probabilities that decay exponentially with the delay since a source symbol was emitted. While tree codes exist for a large class of discrete channels, they are only known to be efficiently decodable in limited settings [5].

A noiseless feedback channel connecting the decoder back to the encoder does not improve the Shannon capacity of the channel [6], but feedback can significantly simplify code design and improve the reliability-delay trade-offs for communication [7, 8]. Noiseless feedback channels are reasonable assumptions when the receiver has access to more power than the transmitter, which is often the case in control systems since the controller must provide essentially noiseless control inputs. Coding for bit-streaming sources over discrete channels with feedback, which is relevant to control systems where the state has been quantized for digital transmission, has been studied in [9, 5, 10]. This paper considers a setting where measurement and coding are analog operations applied in discrete time.

Consider the problem of estimating a vector-valued plant, modeled as a Gauss-Markov source, over a multiple-input multiple-output (MIMO) additive white Gaussian noise (AWGN) channel with feedback. The causal rate-distortion function [11] provides a lower bound to the channel capacity necessary for causally estimating the source subject to a given distortion over this channel [12]. The causal rate-distortion functions for both scalar and vector Gauss-Markov sources have been studied in [11] and [13] respectively. The lower bound to channel capacity provided by the causal rate-distortion function is known to be tight only when the source is matched to the channel at hand [14]. For example, a scalar Gauss-Markov source is matched to the scalar AWGN channel [11, 3].

When the criterion is finite MSE, a known converse result is that the Shannon capacity should be greater than the sum of logs of unstable eigenvalues of the source [15, Thm. 4.1]. In the case of a scalar source and a scalar channel, this bound is tight and can be achieved by a linear innovations’ encoder [16, 3, 17, 18]. For a vector source and parallel Gaussian channels with independent power constraints, [19] also proposes a periodic linear scheme leading to sufficient conditions for achieving finite MSE. The case of a vector source with scalar channel was studied in [1], which showed that the Shannon capacity remains a necessary and sufficient measure even though the source and channel dimensions are not matched.

This paper considers the general case of vector source and MIMO channels. We first focus on the fundamental limits of achieving finite MSE using linear time-invariant codes. The innovations’ encoder that generates channel inputs as a function of the source estimation error (at the decoder) is optimal for this general problem [1]. The sequential encoder’s structure implies that the optimal decoder is a Kalman filter and its MSE can be analyzed with linear estimation theory.

Our first result is a sufficient condition to achieve finite MSE by partitioning the vector source among different sub-channels. The analysis is carried out by showing an equivalence between achieving finite MSE and the existence of a stabilizing solution to a discrete algebraic Riccati equation (DARE). The sufficient condition (achievability) is then shown to be necessary for two cases, including the scenario of a scalar source and a MIMO channel. In particular, it is shown that allocating the entire power to the best sub-channel is optimal, while typical water-filling solutions that distribute the power among the sub-channels are sub-optimal. Indeed, this example reveals that the Shannon channel capacity is not the figure of merit if the objective is finite MSE with linear codes. Motivated by this result, we define the linear stabilizing capacity (LSC) as an optimization problem, which is in general a lower bound to the channel capacity. Finite MSE is achievable using linear codes if and only if there exists a feasible solution. The optimization of the LSC is non-convex, but we are able to utilize it to show that linear codes are not optimal by comparing the LSC with rates that can be achieved using non-linear Shannon-Kotel’nikov mappings for a specific source-channel pair. The general case of our problem remains open but, based on numerical observation, we conjecture that the partitioning property is necessary. The equivalence to the DARE feasibility allows us to extract an algebraic condition under which partitioning schemes achieve the fundamental limits.

The paper is organized as follows. Section II specifies the source and channel models and defines zero-delay joint source-channel codes with an MSE performance criterion. Section III presents an optimal linear code structure and applies it to the MIMO channel setting. It also defines the linear stabilizing capacity. Section IV presents our main contributions on the sufficient and necessary conditions for finite estimation error of a vector source over a MIMO Gaussian channel using linear codes and demonstrates that, in general, linear coding is not optimal.

II Problem Setup

II-A Notation

We denote by $\{\mathbf{X}_{t}\}_{t=0}^{T}$ a discrete-time random process and by $\mathbf{X}^{t}\triangleq\{\mathbf{X}_{0},\mathbf{X}_{1},\ldots,\mathbf{X}_{t}\}$ its history up to time $t$. We write $\mathbf{X}\sim\mathcal{N}(\mu,\Sigma)$ to say that the random vector $\mathbf{X}$ has a Gaussian distribution with mean $\mathbb{E}[\mathbf{X}]=\mu$ and covariance matrix $\mathrm{Cov}[\mathbf{X}]=\Sigma$. Matrices and vectors are denoted with uppercase letters, while scalars are denoted with lowercase mathematical font. Sets are denoted using the calligraphic font.

II-B System Model

The setup is depicted in Figure 1. We define its main components: a MIMO AWGN channel, a Gauss-Markov streaming source, and a zero-delay code.

Definition 1 (MIMO AWGN Channel)

The channel accepts a vector input $\mathbf{X}_{t}\in\mathbb{R}^{n}$ and produces a vector output $\mathbf{Y}_{t}\in\mathbb{R}^{m}$ ,

\mathbf{Y}_{t}=H\mathbf{X}_{t}+\mathbf{Z}_{t},\ t\geq 1.

(1)

$H\in\mathbb{R}^{m\times n}$ is a deterministic, fixed channel gain matrix and the noise is $\mathbf{Z}_{t}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathcal{N}(0,R).$

We consider a channel with a diagonal channel matrix, $H=\mathrm{diag}\{h_{1},\ldots,h_{n}\}$ , and identity noise covariance, $\mathbf{Z}_{t}\sim\mathcal{N}(0,I)$ . This is without loss of generality as any channel can be diagonalized to this form [20, Thm. 9.1].
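The reduction to a diagonal channel with identity noise covariance can be illustrated numerically. The following is a minimal sketch, assuming a whitening step based on a Cholesky factor of $R$ followed by a singular value decomposition; the function name `diagonalize_channel` and the random test matrices are hypothetical and serve only as an illustration.

```python
import numpy as np

def diagonalize_channel(H, R):
    """Return (gains, U_w, V) such that U_w @ H @ V is diagonal
    and the transformed noise covariance U_w @ R @ U_w.T is the identity."""
    L = np.linalg.cholesky(R)          # R = L L^T
    W = np.linalg.inv(L)               # whitening matrix: W R W^T = I
    U, s, Vt = np.linalg.svd(W @ H)    # W H = U diag(s) V^T
    U_w = U.T @ W                      # transform applied to the channel output
    return s, U_w, Vt.T                # V is applied to the channel input

# Hypothetical example: a 3-output, 2-input channel with correlated noise.
rng = np.random.default_rng(0)
m, n = 3, 2
H = rng.standard_normal((m, n))
B = rng.standard_normal((m, m))
R = B @ B.T + np.eye(m)                # positive definite noise covariance

gains, U_w, V = diagonalize_channel(H, R)
H_eff = U_w @ H @ V                    # m x n, nonzero only on the main diagonal
R_eff = U_w @ R @ U_w.T                # identity matrix
```

Because `V` is orthogonal, precoding the input with `V` leaves the transmit power $\mathbb{E}[\mathbf{X}_{t}^{T}\mathbf{X}_{t}]$ unchanged, so the power constraint carries over to the diagonalized channel.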

The streaming source in Figure 1 is a Gauss-Markov source.

Definition 2 (Gauss-Markov source)

The Gauss-Markov source evolves according to the linear dynamical system:

\mathbf{S}_{t+1}=A\mathbf{S}_{t}+\mathbf{W}_{t},

(2)

where $A\in\mathbb{R}^{k\times k}$ , $\mathbf{W}_{t}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathcal{N}(0,Q)$ and the initial state is $\mathbf{S}_{0}\sim\mathcal{N}(0,Q)$ .

Without loss of generality, we assume that the matrix $A$ can be written as

A=\left[\begin{array}[]{cc}A_{s}&0\\ 0&A_{u}\end{array}\right],

(3)

where $A_{s}$ is stable (eigenvalues on and inside unit circle), $A_{u}$ is strictly unstable (eigenvalues outside unit circle) and both $A_{s},A_{u}$ are in Jordan form.

We make the following assumptions about our system:

Assumption 1

The pair $(A,Q)$ is controllable.

Assumption 2

The source is strictly unstable, so $A_{s}=0$ , and $A_{u}$ in (3) has distinct eigenvalues.

Assumption 1 guarantees that the error covariance is positive definite [21, Apx. C]. Assumption 2 yields a cleaner analysis since we can take $A$ to be diagonal without loss of generality. Our results hold in the general case where $A_{u}$ is in Jordan block form. Assuming that the source is unstable does not lose generality since the stable part of the source can have a finite estimation error even if no communication is allowed. We limit the source to being strictly unstable so that there are no eigenvalues on the unit circle and a Lyapunov equation of the form $X=AXA^{T}+W$ has a unique solution [21, Lem. D.1.1]. This assumption is common in classical linear estimation theory [21, Apx. C].
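As a small numerical illustration of the uniqueness claim, the equation $X=AXA^{T}+W$ can be solved by vectorization, which works whenever no product of two eigenvalues of $A$ equals one. The sketch below is an assumption-laden example: the helper name `solve_lyapunov_kron` and the test matrices are hypothetical.

```python
import numpy as np

def solve_lyapunov_kron(A, W):
    """Solve X = A X A^T + W through (I - kron(A, A)) vec(X) = vec(W)."""
    k = A.shape[0]
    M = np.eye(k * k) - np.kron(A, A)  # invertible iff no eigenvalue product equals 1
    x = np.linalg.solve(M, W.reshape(-1))
    return x.reshape(k, k)

# Hypothetical strictly unstable diagonal source matrix.
A = np.diag([1.5, 2.0])
W = np.array([[1.0, 0.3],
              [0.3, 2.0]])

X = solve_lyapunov_kron(A, W)
residual = np.linalg.norm(X - (A @ X @ A.T + W))
```

For diagonal $A$ the solution has the closed form $X_{ij}=W_{ij}/(1-a_{i}a_{j})$, which is well defined here because every product $a_{i}a_{j}$ exceeds one.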

Definition 3 (A zero-delay joint-source channel feedback code)

The feedback code for the source-channel pair in Definitions 1 and 2 consists of the following:

An encoder that at time $t$ has access to $\mathbf{S}^{t}$ and $\mathbf{Y}^{t-1}$ and generates

\mathbf{X}_{t}=f_{t}(\mathbf{S}^{t},\mathbf{Y}^{t-1}),\ t\geq 1

(4)

where $f_{t}\colon\mathcal{S}^{t}\times\mathcal{Y}^{t-1}\mapsto\mathbb{R}^{n}$ and $\mathcal{S}=\mathbb{R}^{k},\mathcal{Y}=\mathbb{R}^{m}$ . The channel inputs must satisfy an average power constraint,

\frac{1}{T}\sum_{t=1}^{T-1}\mathbb{E}[\mathbf{X}_{t}^{T}\mathbf{X}_{t}]\leq p,

(5)

over the time horizon $T$ .

A decoder that at time $t$ predicts the next source state,

\hat{\mathbf{S}}_{t}=g_{t}(\mathbf{Y}^{t-1}).

(6)

For a given code in Definition 3, we denote the predicted error covariance

P_{t}\triangleq\mathrm{Cov}(\mathbf{S}_{t}-\hat{\mathbf{S}}_{t}).

(7)

Note that for an encoder $\{f_{t}\}_{t=0}^{\infty}$ , the decoder

\hat{\mathbf{S}}_{t}=\mathbb{E}[\mathbf{S}_{t}|\mathbf{Y}^{t-1}]

(8)

minimizes the MSE

D_{t}\triangleq\mathrm{Tr}\left(\mathrm{Cov}(\mathbf{S}_{t}-\hat{\mathbf{S}}_{t})\right).

(9)

We denote the asymptotic MSE

D\triangleq\limsup_{t\to\infty}D_{t}.

(10)

In this paper, we study conditions for there to exist a linear encoder $\{f_{t}\}_{t=0}^{\infty}$ such that $D<\infty$ .

This setting is relevant to the control problem where the source in (2) is modified to include a control input $U_{t}$ ,

S_{t+1}=AS_{t}+BU_{t}+W_{t}

(11)

where $B$ is a constant matrix, and the decoder decides $U_{t}$ . Provided that $U^{t}$ are available at the encoder at time $t$ , the classical result [22] of certainty equivalence holds, implying the system in (11) is stabilizable if and only if the controller (decoder) can estimate the source with finite MSE, $D<\infty$ (10).

Figure 1: A MIMO AWGN channel, described in Definition 1, with a noiseless feedback link is shown. The Gauss-Markov source, described in Definition 2, produces information at every time, which is encoded and passed through the channel. The decoder seeks to estimate the source at the next time given all channel outputs. We display the optimal encoding structure of Lemma 1.

III A Linear Code

In this section, we present the optimal linear code and discuss the performance of linear codes in Definition 3.

III-A Optimal Linear Code Structure

A general linear encoder has the form

\mathbf{X}_{t}=\Xi_{t}(\mathbf{S}^{t})+\Psi_{t}(\mathbf{Y}^{t-1})+\mathbf{M}_{t},

(12)

where $\mathbf{M}_{t}$ is a Gaussian random variable that is independent of $(\mathbf{S}^{t},\mathbf{Y}^{t-1})$ . While this encoder involves all past states and channel outputs from feedback, the next result establishes a simplified optimal code structure involving only the recent state estimate error.

Lemma 1 (Innovations encoder)

The optimal linear encoder can be written as

\mathbf{X}_{t}=\tilde{\Gamma}_{t}P_{t}^{-1}(\mathbf{S}_{t}-\hat{\mathbf{S}}_{t})+\mathbf{M}_{t},

(13)

where $\mathbf{M}_{t}\sim\mathcal{N}(0,\Omega_{t})$ is independent of $\mathbf{S}_{t}-\hat{\mathbf{S}}_{t}$ , and $\hat{\mathbf{S}}_{t}$ is the optimal decoder’s estimate (8), given recursively by

\hat{\mathbf{S}}_{t}=A\hat{\mathbf{S}}_{t-1}+K_{t}(\mathbf{Y}_{t-1}-\tilde{\Gamma}_{t}H\hat{\mathbf{S}}_{t-1}),

(14)

where $K_{t}=A\tilde{\Gamma}_{t}^{T}H^{T}\left(H\tilde{\Gamma}_{t}P_{t}^{-1}\tilde{\Gamma}_{t}^{T}H^{T}+I\right)^{-1}$, $\hat{\mathbf{S}}_{0}=0$, and $P_{t}=\mathrm{Cov}(\mathbf{S}_{t}-\hat{\mathbf{S}}_{t})$ is the error covariance. Encoder parameters $\tilde{\Gamma}_{t},\Omega_{t}$ must satisfy the power constraint (5)

\frac{1}{T}\sum_{t=1}^{T-1}\mathrm{Tr}(\tilde{\Gamma}_{t}P_{t}^{-1}\tilde{\Gamma}_{t}^{T}+\Omega_{t})\leq p.

(15)

Intuitively, the encoder can communicate what is currently unknown to the decoder with minimal power by transmitting the innovation $\mathbf{S}_{t}-\hat{\mathbf{S}}_{t}$ . The decoder’s prediction, $\hat{\mathbf{S}}_{t}$ , is computed using channel feedback. Compared to [1, Lem. 1], Lemma 1 above introduces an independent additive term $\mathbf{M}_{t}$ in (13). Its covariance $\Omega_{t}$ can be chosen at will to allow more freedom in selecting the channel input distribution.
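To make the innovations encoder concrete, the sketch below simulates the scalar special case ($k=n=m=1$) of (13) under several assumptions that are not taken from the text: $\mathbf{M}_{t}=0$, the encoder spends exactly power $p$ at every step (so the gain is time-varying, $\tilde{\Gamma}_{t}=\sqrt{pP_{t}}$), and the parameter values are hypothetical. In this special case the error variance obeys $P_{t+1}=a^{2}P_{t}/(1+h^{2}p)+q$, which stays bounded exactly when $a^{2}<1+h^{2}p$.

```python
import numpy as np

def simulate_scalar_innovations(a, h, p, q, steps=60, paths=20000, seed=0):
    """Monte Carlo simulation of a scalar innovations encoder with M_t = 0."""
    rng = np.random.default_rng(seed)
    s = rng.normal(0.0, np.sqrt(q), paths)   # S_0 ~ N(0, q), one sample per path
    s_hat = np.zeros(paths)                  # decoder prediction, starts at 0
    P = q                                    # predicted error variance P_0
    for _ in range(steps):
        e = s - s_hat                        # estimation error, known to the encoder via feedback
        x = np.sqrt(p / P) * e               # channel input with E[x^2] = p
        y = h * x + rng.normal(0.0, 1.0, paths)
        e_hat = h * np.sqrt(p * P) / (1.0 + h**2 * p) * y  # MMSE estimate of e from y
        s_hat = a * (s_hat + e_hat)          # predict the next state
        s = a * s + rng.normal(0.0, np.sqrt(q), paths)
        P = a**2 * P / (1.0 + h**2 * p) + q  # scalar Riccati recursion
    return P, np.mean((s - s_hat) ** 2)

# Hypothetical parameters satisfying a^2 = 2.25 < 1 + h^2 p = 4.
P_theory, P_empirical = simulate_scalar_innovations(a=1.5, h=1.0, p=3.0, q=1.0)
```

With these values the recursion settles at $q/(1-a^{2}/(1+h^{2}p))\approx 2.29$, and the empirical squared error across paths should be close to the same number.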

We make the following assumption on our encoder, which simplifies our analysis of the infinite horizon estimation error.

Assumption 3

The encoder in Definition 3 is time-invariant, meaning that $\tilde{\Gamma}_{t}=\tilde{\Gamma},\Omega_{t}=\Omega,\forall t$ in (13).

The encoding structure in Lemma 1 reveals a state space model, defined by (13) and (1), that admits a Kalman filter solution (14). Consequently, the Riccati recursions in Lemma 2, stated next, give the estimation error at the infinite horizon. This lemma generalizes Lemma 2 in [1].

Lemma 2 (Riccati recursions and convergent behavior)

The prediction error covariance, $P_{t}$, defined in (7), of the optimal linear code introduced in Lemma 1 evolves according to a Riccati recursion that either diverges or converges to the stabilizing solution $P$ of the DARE [21, Sec. E.4]:

P=APA^{T}+Q-A\tilde{\Gamma}^{T}H^{T}\left(I+H(\tilde{\Gamma}P^{-1}\tilde{\Gamma}^{T}+\Omega)H^{T}\right)^{-1}H\tilde{\Gamma}A^{T},

where asymptotically, the power constraint (15) becomes

\mathrm{Tr}(\tilde{\Gamma}P^{-1}\tilde{\Gamma}^{T}+\Omega)\leq p.

(16)

The asymptotic MSE is computed as

D=\mathrm{Tr}(P),

(17)

where $P$ is the stabilizing solution of the DARE.

III-B The Linear Stabilizing Capacity

We express the channel input covariance (16) as

\Pi=\tilde{\Gamma}P^{-1}\tilde{\Gamma}^{T}+\Omega,

(18)

and define a new measure of channel capacity, the linear stabilizing capacity (LSC), below.

Definition 4 (Linear stabilizing capacity)

The linear stabilizing capacity of a MIMO AWGN Channel (Def. 1) for a Gauss-Markov source (Def. 2) is

\mathrm{LSC}(p)=\sup_{\begin{subarray}{c}\Pi\succeq 0,\ P\succeq 0,\ \Omega,\ \tilde{\Gamma}\colon\\ \mathrm{Tr}(\Pi)\leq p,\ \Omega\succeq 0,\ \text{the DARE of Lemma 2},\ (18)\end{subarray}}\ \frac{1}{2}\log\det(I+H\Pi H)

(19)

Finite MSE is achievable under power constraint, $p$ (16), with linear encoders if and only if the constraint set in (19) is feasible. In Section IV, we will characterize this feasibility condition in terms of the source and channel parameters.

In general,

\mathrm{LSC}(p)\leq C(p),

(20)

where $C(p)$ is the Shannon capacity, expressed as

\displaystyle C(p)=\sup_{\begin{subarray}{c}\Pi\succeq 0\colon\\ \mathrm{Tr}(\Pi)\leq p\end{subarray}}\ \frac{1}{2}\log\det(I+H\Pi H).

(21)

The $\Pi$ that solves (21) is known as the water-filling solution [20, Thm. 9.1]. The linear stabilizing capacity (19) adds the DARE of Lemma 2 and the constraints (16) and (18), so a solution $\Pi$ of (19) is generally sub-optimal in (21).
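For the diagonal channel considered here, the water-filling solution of (21) can be computed in a few lines. The following is a minimal sketch assuming nonzero gains $h_{i}$, $p>0$, and natural logarithms; the function name `water_filling` is hypothetical.

```python
import numpy as np

def water_filling(h, p):
    """Water-filling over parallel channels Y_i = h_i X_i + Z_i with unit-variance noise."""
    h = np.asarray(h, dtype=float)
    floors = 1.0 / h**2                     # effective noise level of each sub-channel
    order = np.sort(floors)
    for m in range(len(order), 0, -1):      # try activating the m best sub-channels
        level = (p + order[:m].sum()) / m   # water level if exactly m are active
        if level > order[m - 1]:            # all m sub-channels receive positive power
            break
    powers = np.maximum(level - floors, 0.0)
    capacity = 0.5 * np.sum(np.log(1.0 + h**2 * powers))  # in nats
    return powers, capacity

# Two identical sub-channels split the power evenly.
powers_equal, capacity_equal = water_filling([1.0, 1.0], 2.0)
# A weak second sub-channel receives no power at low total power.
powers_skewed, capacity_skewed = water_filling([1.0, 0.5], 1.0)
```

In the first example the allocation is $(1,1)$ and the capacity equals $\log 2$ nats; in the second, all power goes to the stronger sub-channel.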

IV Main Results

We present sufficient and necessary conditions for finite MSE achievable with linear encoders in the transmission of a vector-valued source over an arbitrary rank channel. We also investigate whether linear encoders are optimal for MIMO channels.

IV-A Linear Coding for MIMO Channels

First, we present a sufficient condition for linear codes to achieve finite MSE.

Theorem 1 (MIMO channels - sufficiency)

In zero-delay JSCC (Def. 3) of a $k$ -dimensional Gauss-Markov source (Def. 2) for transmission over an $n$ -input MIMO AWGN channel (Def. 1) with power constraint $p$ , finite asymptotic error, $D<\infty$ (10), is achievable if there exists an $n$ -set partition, $\{\mathcal{S}_{i}\}_{i=1}^{n}$ , of $\{1,\ldots,k\}$ , such that

\sum_{j\in\mathcal{S}_{i}}\log|\lambda_{j}|<C_{i}(\pi_{i}),\ \forall i=1,\ldots,n

(22)

where $\lambda_{j}$ are the eigenvalues of $A$ (Def. 2), $C_{i}(\pi_{i})=\frac{1}{2}\log(1+h_{i}^{2}\pi_{i})$ is the Shannon capacity of the $i$ th channel with power $\pi_{i}\geq 0$ and

\sum_{i=1}^{n}\pi_{i}=p.

(23)

Each set, $\mathcal{S}_{i}$ , of the partition $\{\mathcal{S}_{i}\}_{i=1}^{n}$ contains the indices of the unstable modes assigned to channel $i$ . Thus, Theorem 1 allocates each unstable mode of the source to a single channel output and defines a power allocation over the channels that allows each channel to stabilize its assigned modes independently. See Figure 2 for an example. We show in Theorem 2 that this partitioning property is necessary if either the source or the channel is scalar.
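The condition of Theorem 1 can be checked by brute force for small systems. Rearranging (22) gives $\pi_{i}>\big(\prod_{j\in\mathcal{S}_{i}}\lambda_{j}^{2}-1\big)/h_{i}^{2}$, so a partition is usable whenever the sum of these thresholds is strictly below $p$ (any surplus can then be spread over the channels). The sketch below enumerates all assignments; it assumes nonzero gains, and the function name and example values are hypothetical.

```python
import itertools
import math

def min_power_partition(eigs, h):
    """Smallest total threshold power over all assignments of modes to channels."""
    k, n = len(eigs), len(h)
    best_power, best_assignment = math.inf, None
    # assignment[j] = index of the channel that carries mode j
    for assignment in itertools.product(range(n), repeat=k):
        total = 0.0
        for i in range(n):
            # product of lambda_j^2 over the modes assigned to channel i (1 if none)
            growth = math.prod(eigs[j] ** 2 for j in range(k) if assignment[j] == i)
            total += (growth - 1.0) / h[i] ** 2  # power at which (22) holds with equality
        if total < best_power:
            best_power, best_assignment = total, assignment
    return best_power, best_assignment

# Hypothetical example: two unstable modes, two sub-channels.
power, assignment = min_power_partition([2.0, 1.5], [1.0, 0.8])
feasible = power < 5.0   # Theorem 1 applies for p = 5 if the threshold is strictly below p
```

Here the best assignment places the mode with eigenvalue 2 on the stronger sub-channel and the mode with eigenvalue 1.5 on the weaker one. The enumeration grows as $n^{k}$, so this is only practical for small dimensions.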

Theorem 2 (Scalar source or scalar channel)

If either $a\in\mathbb{R}$ or $h\in\mathbb{R}$ , finite asymptotic error, $D<\infty$ (10), is achievable by linear codes if and only if there exists a partition $\{\mathcal{S}_{i}\}_{i=1}^{n}$ and power allocation vector $\pi$ satisfying (22), (23) in Theorem 1.

Case I (Scalar channel): The vector-source and scalar-channel scenario was shown in [1]. In this setting $h\in\mathbb{R}$ , so the conditions of Theorem 1 reduce to $\mathcal{S}_{1}=\{1,\ldots,k\}$ , $\pi=p$ . We can explicitly compute

\mathrm{LSC}(p)=C(p)=\frac{1}{2}\log(1+h^{2}p)

(24)

Case II (Scalar source): In the scalar source and MIMO channel setting, $a\in\mathbb{R}$ . The sets $\{\mathcal{S}_{i}\}_{i=1}^{n}$ as defined in Theorem 1 satisfy $\mathcal{S}_{i}=\{1\}$ and $\mathcal{S}_{l}=\emptyset$ for all $l\neq i$ . From (22), finite MSE is achievable by linear encoders if and only if

\log|a|<\max_{i}C_{i}(p).

(25)

Generally, for scalar sources and MIMO channels, $\mathrm{LSC}(p)<C(p)$ , which we show in Theorem 4.

Finally, particularizing Theorem 2 to scalar Gauss-Markov sources that are transmitted over a scalar AWGN channel recovers the classical result from [11] that finite MSE is achievable if and only if $\log|a|<C(p)$, where $\mathrm{LSC}(p)=C(p)=\frac{1}{2}\log(1+h^{2}p)$ [16, 3, 17].

It is unknown whether the partitioning property of Theorem 1 is necessary for vector sources and MIMO channels. Towards this goal, we pose the following matrix-algebraic conjecture, which, if true, implies the necessity of the partitioning property in Theorem 1.

Conjecture 1 (Lyapunov positivity)

Let $J$ be the unique positive solution to the Lyapunov equation

J=BJB-\Gamma^{T}(I+H\Pi H)^{-1}\Gamma+B\Gamma^{T}\Gamma B,

where $J\in\mathbb{R}^{k\times k}$, $\Gamma\in\mathbb{R}^{n\times k}$, $B=\mathrm{diag}(b_{1},\ldots,b_{k}),b_{i}<1\ \forall i$, and $H=\mathrm{diag}(h_{1},\ldots,h_{n})$. There exists an optimal solution $\Gamma^{*}$ to

\inf_{\begin{subarray}{c}\Pi\succeq 0,\ J\succ 0,\ \Gamma\colon\\ \text{the Lyapunov equation above},\ \Pi\text{ is diagonal}\end{subarray}}\mathrm{Tr}(\Pi)

(26)

that has exactly one non-zero entry per column.

We observe that Conjecture 1 holds numerically for all systems we have simulated with $n,k\in\{2,3\}$ .

Theorem 3 (Vector source and MIMO channel - necessity)

Assume Conjecture 1 holds. In linear zero-delay JSCC (Def. 3) of a vector Gauss-Markov source (Def. 2) and MIMO AWGN channel (Def. 1), finite asymptotic error, $D<\infty$ (10), is achievable if and only if there exists a partition $\{\mathcal{S}_{i}\}_{i=1}^{n}$ and power allocation vector $\pi$ satisfying (22), (23) in Theorem 1.

If Theorem 3 holds, the necessary and sufficient conditions coincide and (22) reduces to

\sum_{i=1}^{k}\log|\lambda_{i}|\leq\mathrm{LSC}(p)=\sup_{\begin{subarray}{c}\pi\colon\ \pi_{i}\geq 0\\ (22),\ (23)\end{subarray}}\ \sum_{i=1}^{n}C_{i}(\pi_{i}).

(27)

Figure 2: Example assignment of source components into the channels and associated power allocations for each channel as described in Theorem 1. A source can only be allocated to a single channel, and every source must be allocated to a channel. Consequently, the sets $\{\mathcal{S}_{i}\}_{i=1}^{n}$ form a partition of $\{1,\ldots,k\}$. The power allocations must satisfy (23), where $p$ is the total power constraint.

IV-B Linear Codes are NOT Optimal for MIMO Channels

The optimality of linear encoding, sometimes referred to as uncoded transmission, has been studied in [14]. In this section, we establish that linear encoders are generally sub-optimal for achieving finite MSE over MIMO channels and present a sufficient optimality condition for linear encoders.

Theorem 4 (Linear coding is sub-optimal for MIMO channels)

Linear codes are not optimal for the zero-delay JSCC problem of a Gauss-Markov source over a MIMO AWGN channel (Defs. 1-3). Equivalently, there exist source-channel pairs $(A,H,p)$ (Def. 1, Def. 2, (5)) such that all linear codes result in $D=\infty$, while there exist non-linear codes that can achieve $D<\infty$.

Proof:

We show this via a counterexample. Consider the setting of a scalar Gauss-Markov source and an arbitrary rank AWGN channel studied in Theorem 2. Let $H=I_{2}$ and $p\gg 1$, which corresponds to the high-SNR regime with two identical channels. Shannon-Kotel’nikov mappings as studied in [17, Thm. 7.1] can achieve finite estimation error if

\log|a|<C(p)-o(1)

(28)

as $p\to\infty$, while by Theorem 2, a linear encoder achieves finite estimation error only if

\log|a|<\max_{i}C_{i}(p)=\frac{1}{2}\ C(p)+O(1).

(29)

This follows from $\max_{i}C_{i}(p)=\frac{1}{2}\log(1+p)$ in (25), while $C(p)=2\cdot\frac{1}{2}\log(1+\frac{p}{2})$ since the water-filling solution places power $\frac{p}{2}$ in each of the parallel channels. The difference $\max_{i}C_{i}(p)-\frac{1}{2}C(p)$ stays bounded, so taking the limit $p\to\infty$ gives $\max_{i}C_{i}(p)/C(p)\to\frac{1}{2}$. ∎
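The two rates compared in the proof of Theorem 4 are easy to tabulate for $H=I_{2}$. The sketch below uses natural logarithms and hypothetical power values; it only evaluates the closed-form expressions quoted in the proof and does not simulate any nonlinear code.

```python
import math

def best_subchannel_rate(p):
    """max_i C_i(p) for H = I_2: all power on one unit-gain sub-channel."""
    return 0.5 * math.log(1.0 + p)

def shannon_capacity(p):
    """C(p) for H = I_2: water-filling puts p/2 on each sub-channel."""
    return math.log(1.0 + p / 2.0)

# The ratio decreases slowly toward 1/2 as the power grows.
ratios = {p: best_subchannel_rate(p) / shannon_capacity(p)
          for p in (1e1, 1e3, 1e6, 1e12)}

# Hypothetical scalar source with a = 2 at power p = 2.5: condition (25) fails,
# while the converse bound log|a| < C(p) is still satisfied.
a, p = 2.0, 2.5
linear_ok = math.log(abs(a)) < best_subchannel_rate(p)
converse_ok = math.log(abs(a)) < shannon_capacity(p)
```

For the example pair, `linear_ok` is false and `converse_ok` is true, so this operating point lies in the gap where linear codes fail but the converse does not rule out other codes.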

Next, we will show a sufficient condition for linear codes to achieve finite MSE with minimum power. To do so, we introduce source channel matching in the stability sense.

Definition 5 (Source-channel matching in the stability sense)

The source (Def. 2) and the channel (Def. 1) are matched in the stability sense if

\sum_{i=1}^{k}\log\ |\lambda_{i}|\ =\mathrm{LSC}(p)=C(p),

(30)

where $\mathrm{LSC}(p)$ is the linear stabilizing capacity (Def. 4) and $C(p)$ is given by (21).

If a source (Def. 2) and channel (Def. 1) are matched in the stability sense, linear codes require the minimum power to stabilize the system. Source-channel matching in Definition 5 holds if and only if the optimal $\Pi$ in (19) coincides with the water-filling solution to $C(p)$ (21). In that case, condition (22) in Theorem 1 reduces to the converse result from [15, Thm. 4.1] stating that finite MSE is achievable only if

\sum_{i=1}^{k}\log\left(|\lambda_{i}|\right)<C(p).

(31)

Consequently, source-channel matching ensures linear codes perform as well as the best nonlinear codes.

V Conclusion

This paper contributes to a complete characterization of the performance of linear codes in zero-delay joint-source channel coding with feedback. Theorem 1 showed sufficiency conditions for finite MSE using linear codes for the general case. We conjecture that these conditions are necessary and prove in Theorem 2 that our conjecture holds if either the source or channel is scalar. Finally, we propose a notion of source-channel matching in the stability sense in Definition 5 and demonstrate that linear encoders are generally not optimal for MIMO channels in Theorem 4, motivating further research into non-linear codes. Linear codes remain viable in practice due to their low complexity. An exciting direction for future work is to explore the entire distortion-capacity tradeoff for linear coding of Gauss-Markov sources over AWGN Channels. While converse bounds have been analyzed in [12], achievability bounds are poorly understood for such problems.

Acknowledgment

This work was supported in part by the NSF under grants CCF-1751356 and CCF-1956386, and the Israel Science Foundation (ISF) under grant 1096/23.

References

[1] B. Han, O. Sabag, V. Kostina, and B. Hassibi, “Coded Kalman filtering over Gaussian channels with feedback,” in 2023 59th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1–8, Sep. 2023.
[2] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, pp. 379–423, Oct. 1948.
[3] A. Sahai and S. Mitter, “The necessity and sufficiency of anytime capacity for stabilization of a linear system over a noisy communication link—part i: Scalar systems,” IEEE Transactions on Information Theory, vol. 52, pp. 3369–3395, July 2006.
[4] L. Schulman, “Coding for interactive communication,” IEEE Transactions on Information Theory, vol. 42, no. 6, pp. 1745–1756, 1996.
[5] R. T. Sukhavasi and B. Hassibi, “Linear time-invariant anytime codes for control over noisy channels,” IEEE Transactions on Automatic Control, vol. 61, pp. 3826–3841, Feb. 2016.
[6] C. E. Shannon, “The zero error capacity of a noisy channel,” IRE Transactions on Information Theory, vol. 2, pp. 8–19, Sep. 1956.
[7] Y. Polyanskiy, H. V. Poor, and S. Verdu, “Feedback in the non-asymptotic regime,” IEEE Transactions on Information Theory, vol. 57, no. 8, pp. 4903–4925, Aug. 2011.
[8] M. V. Burnashev, “Data transmission over a discrete channel with feedback,” Problemy Peredachi Informatsii, vol. 12, no. 4, pp. 10–30, Oct. 1976.
[9] N. Guo and V. Kostina, “Reliability function for streaming over a DMC with feedback,” IEEE Transactions on Information Theory, vol. 69, pp. 2165–2192, Nov. 2023.
[10] O. Sabag, P. Tian, V. Kostina, and B. Hassibi, “Reducing the LQG cost with minimal communication,” IEEE Transactions on Automatic Control, vol. 68, no. 9, pp. 5258–5270, 2023.
[11] A. Gorbunov and M. S. Pinsker, “Prognostic epsilon entropy of a Gaussian message and a Gaussian source,” Problemy Peredachi Informatsii, vol. 10, pp. 5–25, Aug. 1974.
[12] V. Kostina and B. Hassibi, “Rate-cost tradeoffs in control,” IEEE Transactions on Automatic Control, vol. 64, pp. 4525–4540, Apr. 2019.
[13] T. Tanaka, K. K. Kim, P. A. Parrilo, and S. K. Mitter, “Semidefinite programming approach to Gaussian sequential rate-distortion trade-offs,” IEEE Transactions on Automatic Control, vol. 62, no. 4, pp. 1896–1910, 2017.
[14] M. Gastpar, B. Rimoldi, and M. Vetterli, “To code, or not to code: lossy source-channel communication revisited,” IEEE Transactions on Information Theory, vol. 49, no. 5, pp. 1147–1158, 2003.
[15] S. Yüksel, “Characterization of information channels for asymptotic mean stationarity and stochastic stability of nonstationary/unstable linear systems,” IEEE Transactions on Information Theory, vol. 58, no. 10, pp. 6332–6354, Oct. 2012.
[16] S. Tatikonda, A. Sahai, and S. Mitter, “Stochastic linear control over a communication channel,” IEEE Transactions on Automatic Control, vol. 49, pp. 1549–1561, Sep. 2004.
[17] A. Khina, E. R. Gårding, G. M. Pettersson, V. Kostina, and B. Hassibi, “Control over Gaussian channels with and without source–channel separation,” IEEE Transactions on Automatic Control, vol. 64, pp. 3690–3705, Apr. 2019.
[18] N. Elia, “When Bode meets Shannon: control-oriented feedback communication schemes,” IEEE Transactions on Automatic Control, vol. 49, pp. 1477–1488, Sep. 2004.
[19] A. A. Zaidi, S. Yüksel, T. J. Oechtering, and M. Skoglund, “On the tightness of linear policies for stabilization of linear systems over Gaussian networks,” Systems and Control Letters, vol. 88, pp. 32–38, Sep. 2016.
[20] A. E. Gamal and Y. Kim, Network Information Theory. Cambridge University Press, 2011.
[21] T. Kailath, A. Sayed, and B. Hassibi, Linear Estimation. Prentice-Hall information and system sciences series, Prentice Hall, 2000.
[22] K. Åström, Introduction to stochastic control theory, vol. 70 of Mathematics in science and engineering. United States: Academic Press, 1970.

Appendix A Proof of Theorem 1 and Theorem 3

We start with the DARE of Lemma 2 and the power constraint (16) and establish a sequence of equivalent statements.

First, we apply the transformation

\tilde{\Gamma}P^{-1}=\Gamma

(32)

to obtain a standard DARE.

Statement 1. A finite MSE is achievable if and only if there exist $P\succeq 0$, $\Gamma$, and $\Omega\succeq 0$ satisfying the DARE of Lemma 2 and the power constraint (16), which under (32) read

P=APA^{T}+Q-AP\Gamma^{T}H^{T}\left(I+H(\Gamma P\Gamma^{T}+\Omega)H^{T}\right)^{-1}H\Gamma PA^{T},

(33)

\mathrm{Tr}(\Gamma P\Gamma^{T}+\Omega)\leq p.

(34)
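A fixed point of the standard DARE (33) can be approximated by iterating the corresponding Riccati recursion and then checking the power constraint (34). The sketch below is an illustration under assumptions: the function name is hypothetical, the example uses $\Gamma=I$ (each mode sent on its own sub-channel) with $\Omega=0$, and convergence is simply assumed after a fixed number of iterations rather than tested with a stopping rule.

```python
import numpy as np

def riccati_fixed_point(A, Q, H, Gamma, Omega, iters=500):
    """Iterate the Riccati recursion whose fixed point is the DARE (33), starting from P = Q."""
    P = Q.copy()
    C = H @ Gamma                                        # effective observation matrix
    I = np.eye(H.shape[0])
    for _ in range(iters):
        S = I + H @ (Gamma @ P @ Gamma.T + Omega) @ H.T  # channel output covariance
        G = A @ P @ C.T @ np.linalg.inv(S)               # Kalman predictor gain
        P = A @ P @ A.T + Q - G @ C @ P @ A.T
    return P

# Hypothetical example: two unstable modes over two unit-gain sub-channels.
A = np.diag([1.5, 2.0])
Q = np.eye(2)
H = np.eye(2)
Gamma = np.eye(2)
Omega = np.zeros((2, 2))

P = riccati_fixed_point(A, Q, H, Gamma, Omega)
power = np.trace(Gamma @ P @ Gamma.T + Omega)            # left-hand side of (34)
```

In this decoupled example each diagonal entry solves $P_{i}^{2}-a_{i}^{2}P_{i}-1=0$, giving $P\approx\mathrm{diag}(2.63,\,4.24)$ and a transmit power of about $6.87$, so the pair satisfies (34) for any $p$ at or above that value.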

We will define $\Pi\triangleq\Gamma P\Gamma^{T}+\Omega$ as a redundant intermediate variable, resulting in Statement 2.

Statement 2. A finite MSE is achievable if and only if there exists a $P\succeq 0,\Gamma,\Omega\succeq 0,\Pi\succeq 0$ satisfying

P=APA^{T}+Q-AP\Gamma^{T}H^{T}\left(I+H\Pi H^{T}\right)^{-1}H\Gamma PA^{T},

(35)

\Pi=\Gamma P\Gamma^{T}+\Omega,

(36)

\mathrm{Tr}(\Pi)\leq p.

(37)

Since $\Omega\succeq 0$ is otherwise unconstrained, we equivalently write (36) as $\Pi\succeq\Gamma P\Gamma^{T}$. Note also that any $\Omega\neq 0$ hardens the condition of Statement 2 since $P^{+}\succeq P$, where $P^{+}$ is the unique positive solution of

P^{+}=AP^{+}A^{T}+Q-AP^{+}\Gamma^{T}H^{T}\left(I+H(\Gamma P^{+}\Gamma^{T}+\Omega)H^{T}\right)^{-1}H\Gamma P^{+}A^{T}

and $P$ is the unique positive solution of

P=APA^{T}+Q-AP\Gamma^{T}H^{T}\left(I+H\Gamma P\Gamma^{T}H^{T}\right)^{-1}H\Gamma PA^{T}.

Here, $P^{+}$ denotes the solution of the DARE when $\Omega\neq 0$ and $P$ denotes the solution when $\Omega=0$. Moving forward, we use $P$ to denote the solution of the latter equation.

Statement 3. A finite MSE is achievable if and only if there exists a $P\succeq 0,\Gamma,\Pi\succeq 0$ satisfying

	$\displaystyle P=APA^{T}+Q-AP\Gamma^{T}H^{T}\left(I+H\Pi H^{T}\right)^{-1}H\Gamma PA^{T}$		(38)
	$\displaystyle\Pi\succeq\Gamma P\Gamma^{T}$		(39)
	$\displaystyle\mathrm{Tr}(\Pi)\leq p.$		(40)

We reapply the transformation (32).

Statement 4. A finite MSE is achievable if and only if there exists a $P\succeq 0,\tilde{\Gamma},\Pi\succeq 0$ satisfying

	$\displaystyle P=APA^{T}+Q-A\tilde{\Gamma}^{T}H^{T}\left(I+H\Pi H^{T}\right)^{-1}H\tilde{\Gamma}A^{T}$		(41)
	$\displaystyle\Pi\succeq\tilde{\Gamma}P^{-1}\tilde{\Gamma}^{T}$		(42)
	$\displaystyle\mathrm{Tr}(\Pi)\leq p.$		(43)

Finally, we add two redundant constraints. First,

J=P-\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}\succeq 0

which holds iff (42) holds by the Schur complement lemma. Then, by substituting $P=J+\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}$ into (41), we obtain a Lyapunov equation for $J$,

AJA^{T}-J+A\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}A^{T}+Q-A\tilde{\Gamma}^{T}H^{T}(I+H\Pi H^{T})^{-1}H\tilde{\Gamma}A^{T}-\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}=0.

With these redundant conditions, we obtain the statement:

Statement 5. A finite MSE is achievable if and only if there exists a $J\succeq 0,P\succeq 0,\tilde{\Gamma},\Pi\succeq 0$ satisfying

$\displaystyle\begin{split}&AJA^{T}-J+A\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}A^{T}+Q\\ &\qquad\ \ -A\tilde{\Gamma}^{T}H^{T}(I+H\Pi H^{T})^{-1}H\tilde{\Gamma}A^{T}\\ &\qquad\ \ -\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}=0\\ \end{split}$		(44)
	$\displaystyle J=P-\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}$	(45)
	$\displaystyle P=APA^{T}+Q-A\tilde{\Gamma}^{T}H^{T}\left(I+H\Pi H^{T}\right)^{-1}H\tilde{\Gamma}A^{T}$	(46)
	$\displaystyle\Pi\succeq\tilde{\Gamma}P^{-1}\tilde{\Gamma}^{T}$	(47)
	$\displaystyle\mathrm{Tr}(\Pi)\leq p.$	(48)

As in Statement 3, (47) should be satisfied with equality. We will focus our analysis on Statement 5.

Left and right multiplying (44) by $A^{-1}$ and $A^{-T}$, respectively, to obtain a stable Lyapunov equation, we have

A^{-1}JA^{-T}-J-\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}-A^{-1}QA^{-T}+\tilde{\Gamma}^{T}H^{T}(I+H\Pi H^{T})^{-1}H\tilde{\Gamma}+A^{-1}\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}A^{-T}=0.

By linearity, we can separate $J=\hat{J}+\tilde{J}$ where

	$\displaystyle\hat{J}$	$\displaystyle=A^{-1}\hat{J}A^{-T}-A^{-1}QA^{-T}$		(49)
	$\displaystyle\begin{split}\tilde{J}&=A^{-1}\tilde{J}A^{-T}-\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}\\ &\qquad+\tilde{\Gamma}^{T}H^{T}(I+H\Pi H^{T})^{-1}H\tilde{\Gamma}\\ &\qquad+A^{-1}\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}A^{-T}\end{split}$			(50)

The purpose of this step is to separate the terms involving $\tilde{\Gamma}$ so that $\tilde{\Gamma}$ only affects $\tilde{J}$. Since (50) is quadratic in $\tilde{\Gamma}$, if there exists a $\tilde{\Gamma}$ that makes $\tilde{J}$ strictly positive, we can scale $\tilde{\Gamma}$ so that $\tilde{J}$ is arbitrarily positive. A sufficiently positive $\tilde{J}$ dominates the fixed term $\hat{J}$ and guarantees $J\succeq 0$, so we will limit our investigation to the positivity of $\tilde{J}$.

Let $\overline{\Gamma}_{i}$ denote the $i$th row of $\overline{\Gamma}$, where $\overline{\Gamma}$ is defined in (58). We now have the following chain of equalities

	$\displaystyle\begin{split}\tilde{J}&=A^{-1}\tilde{J}A^{-T}-\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}\\ \qquad&\ \ \ +\tilde{\Gamma}^{T}H^{T}(I+H\Pi H^{T})^{-1}H\tilde{\Gamma}\\ \qquad&\ \ \ +A^{-1}\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}A^{-T}\\ \end{split}$			(51)
	$\displaystyle\begin{split}&=A^{-1}\tilde{J}A^{-T}-\tilde{\Gamma}^{T}(\Pi+\Pi H^{T}H\Pi)^{-1}\tilde{\Gamma}\\ \qquad&\ \ \ +A^{-1}\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}A^{-T}\\ \end{split}$			(52)
	$\displaystyle\begin{split}&=A^{-1}\tilde{J}A^{-T}\\ \qquad&\ \ \ -\tilde{\Gamma}^{T}\Pi^{-\frac{1}{2}}(I+\Pi^{1/2}H^{T}H\Pi^{1/2})^{-1}\Pi^{-\frac{1}{2}}\tilde{\Gamma}\\ \qquad&\ \ \ +A^{-1}\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}A^{-T}\\ \end{split}$			(53)
	$\displaystyle\begin{split}&=A^{-1}\tilde{J}A^{-T}\\ \qquad&\ \ \ -\tilde{\Gamma}^{T}\Pi^{-\frac{1}{2}}U^{*}(I+\Lambda)^{-1}U\Pi^{-\frac{1}{2}}\tilde{\Gamma}\\ \qquad&\ \ \ +A^{-1}\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}A^{-T}\\ \end{split}$			(54)
	$\displaystyle\begin{split}&=A^{-1}\tilde{J}A^{-T}-\overline{\Gamma}^{T}(I+\Lambda)^{-1}\overline{\Gamma}\\ \qquad&\ \ \ +A^{-1}\overline{\Gamma}^{T}\overline{\Gamma}A^{-T}\\ \end{split}$			(55)
	$\displaystyle\begin{split}&=A^{-1}\tilde{J}A^{-T}+\\ \qquad&\ \ \ \sum_{i}\left(-\frac{1}{1+\Lambda_{i}}\overline{\Gamma_{i}}^{T}\overline{\Gamma_{i}}+A^{-1}\overline{\Gamma_{i}}^{T}\overline{\Gamma_{i}}A^{-T}\right),\end{split}$			(56)

where (52) follows from the Matrix Inversion lemma, (54) follows from the singular value decomposition of

\Pi^{1/2}H^{T}H\Pi^{1/2}=U^{*}\Lambda U

(57)

where $U$ is unitary and $\Lambda$ is diagonal, and (55) follows from the notation

\overline{\Gamma}\triangleq U\Pi^{-\frac{1}{2}}\tilde{\Gamma}.

(58)
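The step from (51) to (52) rests on the identity $\Pi^{-1}-H^{T}(I+H\Pi H^{T})^{-1}H=(\Pi+\Pi H^{T}H\Pi)^{-1}$. A small numerical check, assuming an arbitrary $2\times 2$ example with a non-diagonal $\Pi$, can be sketched as follows. The helper functions and the numerical values are illustrative and not taken from the paper.

```python
def matmul(X, Y):
    """Product of two small dense matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def add(X, Y, sign=1.0):
    """Entrywise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inv2(X):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

I2 = [[1.0, 0.0], [0.0, 1.0]]
H = [[2.0, 0.0], [0.0, 0.5]]      # diagonal channel gains (illustrative)
Pi = [[1.5, 0.4], [0.4, 0.8]]     # symmetric positive definite, not diagonal
Ht = transpose(H)

# Left side: Pi^{-1} - H^T (I + H Pi H^T)^{-1} H, as it appears in (51).
inner = inv2(add(I2, matmul(matmul(H, Pi), Ht)))
lhs = add(inv2(Pi), matmul(matmul(Ht, inner), H), sign=-1.0)

# Right side: (Pi + Pi H^T H Pi)^{-1}, as it appears in (52).
rhs = inv2(add(Pi, matmul(matmul(matmul(Pi, Ht), H), Pi)))

max_gap = max(abs(l - r) for rl, rr in zip(lhs, rhs) for l, r in zip(rl, rr))
```

The two sides agree up to floating-point rounding, which is what the Matrix Inversion lemma predicts.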

Let $\tilde{J}_{i}$ be the unique solution to the Lyapunov equation

\tilde{J}_{i}=A^{-1}\tilde{J}_{i}A^{-T}-\frac{\bar{\Gamma}_{i}^{T}\bar{\Gamma}_{i}}{1+\Lambda_{i}}+A^{-1}\bar{\Gamma}_{i}^{T}\bar{\Gamma}_{i}A^{-T}

(59)

Then, by linearity of (56), the unique solution $\tilde{J}$ is given by

\tilde{J}=\sum_{i}\tilde{J}_{i}.

(60)

From (57), $\Lambda_{i}=\sigma_{i}\left(\Pi^{\frac{1}{2}}H^{T}H\Pi^{\frac{1}{2}}\right)$, where $\sigma_{i}(\cdot)$ indicates the $i$th singular value in descending order.
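To see how the sign of a single term in the sum is decided, (59) can be solved in closed form when the source is scalar. The sketch below assumes $A=a$ with $|a|>1$ and a scalar $\bar{\Gamma}_{i}$; the function name `scalar_J_tilde` and the sample values are hypothetical.

```python
def scalar_J_tilde(a, gamma_bar, lam):
    """Solve the scalar form of (59): J = J/a^2 - g^2/(1+lam) + g^2/a^2, with |a| > 1."""
    g2 = gamma_bar ** 2
    return g2 * (1.0 / a ** 2 - 1.0 / (1.0 + lam)) / (1.0 - 1.0 / a ** 2)

a = 2.0
J_good = scalar_J_tilde(a, 1.0, 5.0)     # 1 + lam = 6 > a^2 = 4
J_bad = scalar_J_tilde(a, 1.0, 2.0)      # 1 + lam = 3 < a^2 = 4
J_scaled = scalar_J_tilde(a, 10.0, 5.0)  # scaling gamma_bar by 10 scales J by 100
```

In this scalar case the solution is positive exactly when $1+\Lambda_{i}>a^{2}$, and scaling $\bar{\Gamma}_{i}$ changes its magnitude but not its sign.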

To connect the definition of $\bar{\Gamma}$ to the condition of Theorem 1, we have the following definition:

\textit{Let }\mathcal{S}_{i}\subset\{1,\ldots,k\}\textit{ be the set of indices where }\bar{\Gamma}_{i}\textit{ is non-zero.}

(61)

A-A Proof of sufficiency

Leveraging the result for vector sources and scalar channels, see Appendix B, $\tilde{J}_{i}\succ 0$ if and only if

\sum_{j\in\mathcal{S}_{i}}\log|a_{j}|<\frac{1}{2}\log(1+h_{i}^{2}\pi_{i}).

(62)

The sufficiency of the theorem follows by letting $\Pi$ be diagonal, in which case $\Lambda_{i}=\pi_{i}h_{i}^{2}$. Let $\tilde{J}_{i}^{\mathcal{S}}$ be the submatrix of $\tilde{J}_{i}$ formed by selecting the row and column indices that are members of $\mathcal{S}_{i}$, the support of the $i$th row of $\bar{\Gamma}$. By the MISO theorem, $\tilde{J}_{i}^{\mathcal{S}}$ is strictly positive, and $\tilde{J}_{i}$ is zero elsewhere. Since the union of the supports of the rows of $\bar{\Gamma}$ includes every index, $\tilde{J}=\sum_{i}\tilde{J}_{i}\succ 0$ and can be made arbitrarily positive by scaling $\bar{\Gamma}$. Such a $\Pi$, together with $\tilde{\Gamma}=\Pi^{1/2}\bar{\Gamma}$, $J=\tilde{J}+\hat{J}$, and $P=J+\tilde{\Gamma}^{T}\Pi^{-1}\tilde{\Gamma}$, is sufficient for Statement 5.
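The sufficiency argument can be turned into a simple feasibility test for a proposed allocation of modes to channels. The sketch below is illustrative: it assumes diagonal $A$ and $H$, zero-based mode indices, and a diagonal $\Pi$ with entries `pi`; the function name `allocation_is_feasible` and the example numbers are not from the paper.

```python
import math

def allocation_is_feasible(a, h, pi, supports, p):
    """Check the power constraint and condition (62) for every channel i."""
    if sum(pi) > p:                          # Tr(Pi) <= p, as in (48)
        return False
    covered = sorted(j for S in supports for j in S)
    if covered != list(range(len(a))):       # supports must partition the modes
        return False
    return all(
        sum(math.log(abs(a[j])) for j in S)
        < 0.5 * math.log(1.0 + h[i] ** 2 * pi[i])
        for i, S in enumerate(supports)
    )

a = [2.0, 1.5, 1.2]          # unstable modes (diagonal of A)
h = [1.0, 0.5]               # channel gains (diagonal of H)
pi = [10.0, 6.0]             # per-channel powers
p = 16.0

ok = allocation_is_feasible(a, h, pi, [[0, 1], [2]], p)    # modes 0,1 -> channel 0
bad = allocation_is_feasible(a, h, pi, [[0, 1, 2], []], p) # all modes -> channel 0
```

In this example the first allocation satisfies (62) on both channels, while loading all three modes on the first channel exceeds what that channel can support.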

A-B Proof of necessity

First, we show that a diagonal $\Pi$ is necessary.

Lemma 3

If $\tilde{J}$ can be made arbitrarily positive by some $\Gamma,\Pi$ , it can also be made arbitrarily positive by $\tilde{\Gamma},\tilde{\Pi}$ where $\tilde{\Pi}$ is diagonal with $\mathrm{Tr}(\tilde{\Pi})\leq\mathrm{Tr}(\Pi)$ .

Proof:

Fix a $\Lambda$ in (59). We will show that given an arbitrary $\Gamma,\Pi$ satisfying the power constraint, the same $\Lambda$ is achievable by a diagonal $\tilde{\Pi}$ and associated $\tilde{\Gamma}$ with less power.

Let

D\triangleq HH^{T},

(63)

where by assumption, $h_{1}\geq h_{2}\geq\ldots\geq h_{n}$ . From (57),

\Lambda=U\Pi^{1/2}H^{T}H\Pi^{1/2}U^{*}.

(64)

This implies

\Pi=U^{*}\Lambda^{\frac{1}{2}}V^{*}D^{-1}V\Lambda^{\frac{1}{2}}U

(65)

where $V$ is unitary. For a diagonal $\tilde{\Pi}$ ,

\Lambda=\tilde{\Pi}^{1/2}H^{T}H\tilde{\Pi}^{1/2}=\tilde{\Pi}HH^{T}

(66)

Then by isolating $\tilde{\Pi}$ and applying the trace to both sides,

\mathrm{Tr}(\tilde{\Pi})=\mathrm{Tr}(\Lambda D^{-1}),

(67)

and

\mathrm{Tr}(\Pi)=\mathrm{Tr}(U^{*}\Lambda^{\frac{1}{2}}V^{*}D^{-1}V\Lambda^{\frac{1}{2}}U)=\mathrm{Tr}(\Lambda V^{*}D^{-1}V).

(68)

Note that the entries of $\Lambda$ are in descending order and the entries of $D^{-1}$ are listed in ascending order by assumption that $h_{1}\geq h_{2}\geq\ldots\geq h_{n}$ .

We apply Ruhe’s Trace Inequality, which states that if $A,B$ are PSD matrices with eigenvalues $a_{1}\geq a_{2}\geq\ldots\geq a_{n}$ and $b_{1}\geq b_{2}\geq\ldots\geq b_{n}$, then

\sum_{i=1}^{n}a_{i}b_{n-i+1}\leq\mathrm{Tr}(AB).

(69)

Here, $A=\Lambda$ and $B=V^{*}D^{-1}V$, whose eigenvalues are those of $D^{-1}$ because $V$ is unitary. By (67), $\mathrm{Tr}(\tilde{\Pi})$ attains the lower bound in (69) with equality, since $\Lambda$ and $D^{-1}$ are both diagonal with opposite orderings. By (68), $\mathrm{Tr}(\Pi)=\mathrm{Tr}(AB)$ is bounded below by the same quantity. Thus,

\mathrm{Tr}(\tilde{\Pi})\leq\mathrm{Tr}(\Pi)

as desired. ∎
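The trace comparison in the proof of Lemma 3 can be checked numerically in two dimensions. The sketch below assumes real matrices, so that the unitary $V$ in (68) is a plane rotation by an angle `theta`; the function name `trace_power` and the numbers are illustrative.

```python
import math

def trace_power(lam, d_inv, theta):
    """Tr(Lambda V^T D^{-1} V) for a 2x2 rotation V by angle theta, as in (68)."""
    c, s = math.cos(theta), math.sin(theta)
    b11 = c * c * d_inv[0] + s * s * d_inv[1]   # diagonal entries of V^T D^{-1} V
    b22 = s * s * d_inv[0] + c * c * d_inv[1]
    return lam[0] * b11 + lam[1] * b22

lam = [3.0, 1.0]                     # diagonal of Lambda, descending
h = [2.0, 1.0]                       # channel gains with h_1 >= h_2
d_inv = [1.0 / g ** 2 for g in h]    # diagonal of D^{-1}, ascending

diag_power = trace_power(lam, d_inv, 0.0)   # Tr(Lambda D^{-1}), as in (67)
powers = [trace_power(lam, d_inv, k * math.pi / 12) for k in range(13)]
```

Every rotation angle in the sweep yields a trace at least as large as the diagonal choice, consistent with (69).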

The above lemma shows that we can take $\Pi$ to be diagonal without loss of generality since that is what minimizes the power. Then,

\Lambda_{i}=\pi_{i}h_{i}^{2}

and

\tilde{J}_{i}=A^{-1}\tilde{J}_{i}A^{-T}-\frac{\bar{\Gamma}_{i}^{T}\bar{\Gamma}_{i}}{1+h_{i}^{2}\pi_{i}}+A^{-1}\bar{\Gamma}_{i}^{T}\bar{\Gamma}_{i}A^{-T}

(70)

To show the partitioning property of Theorem 1, we need to show that the sets $\{\mathcal{S}_{i}\}_{i=1}^{n}$ cover $\{1,\ldots,k\}$ and are disjoint.

We show the necessity of $\bigcup_{i=1}^{n}\mathcal{S}_{i}=\{1,\ldots,k\}$. For $\tilde{J}$ to be arbitrarily positive, $\bar{\Gamma}$ must excite all directions of $\tilde{J}$. To see this, suppose an index $j$ exists such that $\bar{\Gamma}_{ij}=0$ for all $i$ in (58). Then, $e_{j}^{T}\tilde{J}e_{j}=0$, where $e_{j}$ is the $j$th standard basis vector, and $\tilde{J}$ cannot be positive definite.

What remains to be shown is the necessity of the disjointness of the sets $\{\mathcal{S}_{i}\}_{i=1}^{n}$ . By showing Conjecture 1, $\tilde{J}$ (50) can be made arbitrarily positive by scaling $\tilde{\Gamma}$ of the form described in Conjecture 1. The sets $\{\mathcal{S}_{i}\}_{i=1}^{n}$ are then disjoint.

We can consider the equivalent (redundant) constraints on $P$ in Statement 5. In our construction of $\Pi$ in (47), we demonstrated that slack in $\Pi\succeq\tilde{\Gamma}P^{-1}\tilde{\Gamma}^{T}$ only hardens the conditions, so we can always take $\Pi=\tilde{\Gamma}P^{-1}\tilde{\Gamma}^{T}$. In Lemma 3, we showed that the optimal $\Pi$ is diagonal. Consequently, $\tilde{\Gamma}P^{-1}\tilde{\Gamma}^{T}$ can also be considered to be diagonal. We now ask what this diagonality constraint imposes on the structure of $\tilde{\Gamma}$.

Recall our original DARE:

P=APA^{T}+Q-AP\Gamma^{T}H^{T}(I+H\Gamma P\Gamma^{T}H^{T})^{-1}H\Gamma PA^{T}

where $\Gamma=\tilde{\Gamma}P^{-1}$ .

Under the condition $\Gamma P\Gamma^{T}$ is diagonal, the measurement covariance $I+H\Gamma P\Gamma^{T}H$ is diagonal as well since $H$ is diagonal. We conjecture that the diagonality of $\Gamma P\Gamma^{T}$ imposes a structure on $\tilde{\Gamma}$ so that $\mathcal{S}_{i}$ is consistent with the condition of Theorem 1. This means that $\tilde{\Gamma}$ has only a single non-zero entry in each column. Equivalently, each mode is assigned to a single channel. If this matrix-algebraic fact on the structure of $\tilde{\Gamma}$ holds, Conjecture 3 holds as well.

This concludes the proof.

Appendix B Proof of Lemma 3 - Vector Source over Scalar Channel

This proof was initially presented in [1], but we give it here for completeness.

The channel is scalar, so $\tilde{J}=\tilde{J}_{1}$ in (70). Further, we can always set $\pi_{1}=p$ since the optimal encoder uses all the available power. From (70), we have

\tilde{J}=A^{-1}\tilde{J}A^{-T}-\frac{\bar{\Gamma}^{T}\bar{\Gamma}}{1+h^{2}p}+A^{-1}\bar{\Gamma}^{T}\bar{\Gamma}A^{-T}

(71)

Defining $D_{\Gamma}\triangleq\mbox{diag}(\bar{\Gamma})$ , i.e., the diagonal matrix whose components are the elements of the vector $\bar{\Gamma}^{T}$ , we may now write

A^{-1}\bar{\Gamma}^{T}=D_{\Gamma}a\quad\mbox{and}\quad\bar{\Gamma}^{T}=D_{\Gamma}1,

where $a$ is the vector of the diagonal elements of $A^{-1}$ and $1$ is the all-one vector. (71) becomes

\tilde{J}=A^{-1}\tilde{J}A^{-T}+D_{\Gamma}\left(aa^{T}-\frac{11^{T}}{1+h^{2}p}\right)D_{\Gamma}.

Let $M$ be the solution to the Lyapunov equation

M=A^{-1}MA^{-T}+11^{T}.

(72)

Let $(A_{u},1)$ be controllable, which holds by Assumption 2. Then by the Lyapunov stability theorem, $M\succ 0$ . We now claim that

{\tilde{J}}=D_{\Gamma}\left(\frac{h^{2}p}{1+h^{2}p}M-11^{T}\right)D_{\Gamma}.

(73)

This can be verified by plugging (73) into the Lyapunov equation for $\tilde{J}$ above. It follows that ${\tilde{J}}\succ 0$ if and only if $\frac{h^{2}p}{1+h^{2}p}M-11^{T}\succ 0$. But the latter is equivalent to

\left[\begin{array}{cc}M&1\\ 1^{T}&\frac{h^{2}p}{1+h^{2}p}\end{array}\right]\succ 0,

(74)

which, by the Schur complement lemma and $M\succ 0$, holds if and only if

\frac{h^{2}p}{1+h^{2}p}>1^{T}M^{-1}1.

(75)

Assume $M$ satisfies the Lyapunov equation (72). Then

1^{T}M^{-1}1=1-\left|\mbox{det}(A)\right|^{-2}.

(76)

This follows since $A^{-1}MA^{-T}=M-11^{T}$ from (72). On the one hand

\mbox{det}A^{-1}MA^{-T}=\left|\mbox{det}A\right|^{-2}\mbox{det}{M}

(77)

and on the other

\mbox{det}(M-11^{T})=\mbox{det}(I-11^{T}M^{-1})\,\mbox{det}M=(1-1^{T}M^{-1}1)\,\mbox{det}M,

which yields the desired result.

Then, (75) and (76) imply that $\tilde{J}\succ 0$ if and only if

\frac{h^{2}p}{1+h^{2}p}>1-\left|\mbox{det}A\right|^{-2},

(78)

or equivalently

1+h^{2}p>\left|\mbox{det}A_{u}\right|^{2},

(79)

which is the capacity condition we are seeking.

Note that when this capacity condition holds, $\frac{h^{2}p}{1+h^{2}p}M-11^{T}\succ 0$ and therefore ${\tilde{J}}\succ 0$ in (73). We can arbitrarily scale $\bar{\Gamma}$ and therefore $D_{\Gamma}$ to make $\tilde{J}$ arbitrarily positive. This demonstrates both sufficiency and necessity.
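The identity (76) and the resulting capacity condition can be checked numerically for a small example. The sketch below assumes a diagonal $A$ with two distinct unstable modes, for which (72) can be solved entrywise as $M_{ij}=1/(1-1/(a_{i}a_{j}))$; the function names and the numbers are illustrative.

```python
def lyapunov_M(a):
    """Solve M = A^{-1} M A^{-T} + 1 1^T entrywise for A = diag(a), |a_i| > 1."""
    return [[1.0 / (1.0 - 1.0 / (ai * aj)) for aj in a] for ai in a]

def ones_Minv_ones(M):
    """Compute 1^T M^{-1} 1 for a 2x2 matrix using the explicit inverse."""
    (m11, m12), (m21, m22) = M
    det = m11 * m22 - m12 * m21
    return (m22 - m12 - m21 + m11) / det

a = [2.0, 1.5]                           # distinct unstable modes, det A = 3
lhs = ones_Minv_ones(lyapunov_M(a))      # left side of (76)
rhs = 1.0 - (a[0] * a[1]) ** -2          # right side of (76)

def capacity_condition(h, p, a):
    """Condition (79): 1 + h^2 p > |det A|^2 for diagonal A."""
    return 1.0 + h * h * p > (a[0] * a[1]) ** 2

def matrix_condition(h, p, a):
    """Condition (75): h^2 p / (1 + h^2 p) > 1^T M^{-1} 1."""
    return h * h * p / (1.0 + h * h * p) > ones_Minv_ones(lyapunov_M(a))
```

For instance, with $h=1$ both conditions hold at $p=10$ and both fail at $p=7$, since $|\mbox{det}A|^{2}=9$ in this example.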

Appendix C Proof of Lemma 3 - Scalar Source over MIMO Channel

We also consider a scalar source and arbitrary rank channel in Lemma 3. We can solve (46) explicitly as

P=\frac{q-a^{2}\tilde{\Gamma}^{T}H^{T}(I+H\Pi H^{T})^{-1}H\tilde{\Gamma}}{1-a^{2}}.

(80)

Plugging this into (45) we obtain

J=-\frac{q}{a^{2}-1}+\tilde{\Gamma}^{T}\left(\frac{a^{2}}{a^{2}-1}H^{T}(I+H\Pi H^{T})^{-1}H-\Pi^{-1}\right)\tilde{\Gamma}\succeq 0.

Finite estimation error can be achieved iff there exists a $\tilde{\Gamma}$ and $\Pi\geq 0$, $\mathrm{Tr}(\Pi)\leq p$ such that $J\geq 0$. The above statement is equivalent to the statement: finite error is not achievable iff for all $\tilde{\Gamma},\Pi$ such that $\Pi\geq 0$, $\mathrm{Tr}(\Pi)\leq p$, we have $J<0$. This is what we set out to show.

Let

O\triangleq\frac{a^{2}}{a^{2}-1}H^{T}(I+H\Pi H^{T})^{-1}H-\Pi^{-1}.

(81)

Note that $J<0$ for all $\tilde{\Gamma}$ if and only if $O\preceq 0$ .

-O=\Pi^{-1}-\frac{a^{2}}{a^{2}-1}H^{T}(I+H\Pi H^{T})^{-1}H\geq 0

(82)

The inequality is equivalent to

\begin{pmatrix}\Pi^{-1}&\sqrt{\frac{a^{2}}{a^{2}-1}}H^{T}\\ \sqrt{\frac{a^{2}}{a^{2}-1}}H&I+H\Pi H^{T}\end{pmatrix}\geq 0

(83)

which is also equivalent to

	$\displaystyle I+H\Pi H^{T}-\frac{a^{2}}{a^{2}-1}H\Pi H^{T}$	$\displaystyle\geq 0$		(84)
	$\displaystyle I-\frac{1}{a^{2}-1}H\Pi H^{T}$	$\displaystyle\geq 0$		(85)

Recall that $H$ is diagonal. Let $h_{max}$ be the entry of $H$ with the greatest magnitude. Among all $\Pi\geq 0$ with $\mathrm{Tr}(\Pi)\leq p$, the one most likely to violate (85) places all power on the best channel. Thus, if this choice of $\Pi$ does not violate (85), the inequality holds for any admissible $\Pi$. Consequently, (85) holds for all $\Pi\geq 0,\mathrm{Tr}(\Pi)\leq p$ if and only if

a^{2}\geq 1+h_{max}^{2}p

(86)

or, equivalently,

\log|a|\geq\frac{1}{2}\log(1+h_{max}^{2}p)

(87)

This is a necessary and sufficient condition for $D<\infty$ to be unachievable. Negating the above, we obtain the desired condition: finite MSE is achievable if and only if $\log|a|<\frac{1}{2}\log(1+h_{max}^{2}p)$.
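The resulting test for a scalar source can be sketched in a few lines. The code below assumes $a\neq 0$, a diagonal $H$ given by its entries `h`, and, for the check of (85), a diagonal $\Pi$ given by its entries `pi_diag`; the function names and the numbers are illustrative.

```python
import math

def finite_mse_achievable(a, h, p):
    """Negation of (87): log|a| < 0.5 * log(1 + h_max^2 p)."""
    h_max = max(abs(g) for g in h)
    return math.log(abs(a)) < 0.5 * math.log(1.0 + h_max ** 2 * p)

def condition_85_holds(a, h, pi_diag):
    """(85) for diagonal Pi: 1 - h_i^2 pi_i / (a^2 - 1) >= 0 on every channel."""
    return all(1.0 - g * g * w / (a * a - 1.0) >= 0.0
               for g, w in zip(h, pi_diag))

h, p = [1.0, 0.5], 3.0
best_first = [p, 0.0]      # all power on the channel with the largest gain

stable_case = finite_mse_achievable(1.5, h, p)     # a^2 = 2.25 < 1 + 3
unstable_case = finite_mse_achievable(2.5, h, p)   # a^2 = 6.25 >= 1 + 3
```

In both cases the answer agrees with whether the all-power-on-best-channel allocation violates (85), which is the worst case identified in the argument above.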