Automatic Regularization for Linear MMSE Filters

Daniel Gomes de Pinho Zanco, Leszek Szczecinski, Jacob Benesty. INRS–Institut National de la Recherche Scientifique, Montreal, QC, H5A-1K6, Canada.
Abstract

In this work, we consider the problem of regularization in the design of minimum mean square error (MMSE) linear filters. Exploiting the relationship with statistical machine learning methods and adopting a Bayesian approach, we find the regularization parameter from the observed signals in a simple and automatic manner. The proposed approach is illustrated in system identification and beamforming examples, where the automatic regularization is shown to yield near-optimal results.

keywords:
MMSE filter, regularization, Bayesian approach, system identification, beamforming.

1 Introduction

Minimum mean square error (MMSE) linear filters are ubiquitous in many signal processing applications such as channel equalization [1, Ch. 5.4], system identification [2], antenna beamforming [1, Ch. 6.5], and many others.

The two main classes of MMSE filters are (i) the error minimization, where the linear filter is designed to approximate the desired signal with the smallest average squared error, and (ii) interference suppression, where the objective is to minimize the interference energy while maintaining the energy of the desired signal.

The equations solved to obtain the MMSE filters rely on the implicit or explicit inversion of the covariance matrix of the input signal. To avoid numerical problems and to guarantee the uniqueness of the solution, the equations must be regularized, as is most often done by adding a positive regularization parameter to the diagonal elements of the covariance matrix.

Determining the regularization parameter is frequently regarded as a challenge for practitioners and, depending on the signal-to-noise ratio (SNR) or the type of problem, it is often handcrafted for each specific problem. This attitude is changing and, recently, regularization has received in-depth attention in the context of system identification [3].

On the other hand, this issue is rather well known in the contexts of machine learning and regression analysis, where methods such as cross-validation [4, 5] or expectation maximization (EM) [6, 18.1.3] are often used to find parameters which are not of direct interest, but affect the solutions (known as hyperparameters).

However, despite regularization being crucial to finding MMSE filters, the signal processing literature, in general, does not use the simple and general solutions from the area of machine learning. The main reason, we believe, is that they are not offered in closed form and, in general, may require searching over the entire space of solutions and solving the regularized equations multiple times. We show that, in practice, the solution can be found very efficiently via fixed-point iteration and does not entail any significant complexity increase if we exploit the eigenvalue decomposition of the covariance matrix.

This paper is organized as follows. We start with the general problem formulation in Sec. 2 and, in Sec. 3, we reformulate it using the probabilistic framework, which allows us to apply maximum likelihood (ML) estimation to the parameters defining the model and obtain the optimal regularization parameter. Section 4 discusses automatic regularization in the interference-suppression problem. In Sec. 5, to illustrate the operation of the proposed method, we apply it to system identification (as an example of error minimization) and to beamforming (as an example of interference suppression) to show how the automatically regularized MMSE filters compare to other methods proposed in the literature and to an “oracle” solution. The oracle relies on ex-ante knowledge of the best regularization parameter, obtained by a grid search that maximizes the performance criterion of interest; this is possible in simulations, where all the signals involved are known.

The examples indicate that the regularization parameter we find adjusts automatically to changes in the operating conditions (such as the SNR) and to the problem structure.

Our main conclusion is that, by adopting the machine learning approach, the automatic regularization is so simple that it deserves to be a go-to solution in the signal processing context.

2 MMSE problem formulation

We consider the linear filtering of the input signal $\boldsymbol{x}(t)\in\mathbb{C}^{M}$ using the weights/filter $\boldsymbol{w}$, aiming at the approximation of the desired signal $d(t)\in\mathbb{C}$. There are two categories of this problem with respect to how the filter $\boldsymbol{w}$ is found, which are described below.

  • The error-minimization problem, where we know the desired signal $d(t)$. The filtering error is given by

    $$e(t) = d(t)-\boldsymbol{w}^{\mathrm{H}}\boldsymbol{x}(t),\quad t=0,1,\ldots,N-1, \tag{1}$$

    and the MMSE problem consists in solving

    \begin{align}
    \hat{\boldsymbol{w}} &= \mathop{\mathrm{argmin}}_{\boldsymbol{w}}\left\{\mathbb{E}\left[|d(t)-\boldsymbol{w}^{\mathrm{H}}\boldsymbol{x}(t)|^{2}\right]+\alpha\|\boldsymbol{w}\|_{2}^{2}\right\} \tag{2}\\
    &= (\overline{\mathbf{R}}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}\overline{\boldsymbol{r}}_{\boldsymbol{x}d}, \tag{3}
    \end{align}

    where $\mathbb{E}[\cdot]$ denotes mathematical expectation taken with respect to all random variables, $\alpha\geq 0$ is a regularization parameter, $\|\cdot\|_{2}$ is the Euclidean norm, $\overline{\mathbf{R}}_{\boldsymbol{x}}=\mathbb{E}\left[\boldsymbol{x}(t)\boldsymbol{x}^{\mathrm{H}}(t)\right]$, $\mathbf{I}$ is the identity matrix, and $\overline{\boldsymbol{r}}_{\boldsymbol{x}d}=\mathbb{E}\left[\boldsymbol{x}(t)d^{*}(t)\right]$; we use $(\cdot)^{\mathrm{H}}$ to denote the conjugate-transpose operation, and $(\cdot)^{*}$ denotes complex conjugation.

  • The interference suppression problem, where we assume that the signal $\boldsymbol{x}(t)$ has the form

    $$\boldsymbol{x}(t)=d(t)\boldsymbol{a}+\boldsymbol{z}(t)\in\mathbb{C}^{M}, \tag{4}$$

    with $\boldsymbol{z}(t)$ being the interference, and $\boldsymbol{a}$ the response generated by the desired signal $d(t)$, where $\|\boldsymbol{a}\|^{2}=M$. The goal is then to minimize the (energy of) interference in the filtered output $\boldsymbol{w}^{\mathrm{H}}\boldsymbol{x}(t)$, i.e.,

    $$\hat{\boldsymbol{w}} = \mathop{\mathrm{argmin}}_{\boldsymbol{w}}\left\{\mathbb{E}\left[|\boldsymbol{w}^{\mathrm{H}}\boldsymbol{x}(t)|^{2}\right]+\alpha\|\boldsymbol{w}\|^{2}\right\}\quad\text{s.t.}\quad\boldsymbol{w}^{\mathrm{H}}\boldsymbol{a}=1, \tag{5}$$

    while maintaining the energy of the desired signal, as enforced by the constraint $\boldsymbol{w}^{\mathrm{H}}\boldsymbol{a}=1$. The problem (5) is known to be solved by [7, Sec. 2.8]

    $$\hat{\boldsymbol{w}} = \frac{\big(\overline{\mathbf{R}}_{\boldsymbol{x}}+\alpha\mathbf{I}\big)^{-1}\boldsymbol{a}}{\boldsymbol{a}^{\mathrm{H}}\big(\overline{\mathbf{R}}_{\boldsymbol{x}}+\alpha\mathbf{I}\big)^{-1}\boldsymbol{a}}. \tag{6}$$

Numerous applications of these two problems have been presented in the literature. For example, the error minimization problem (3) is found in system identification, equalization [7, Ch. 2], interference cancellation [8, Ch. 8], and many others. The interference suppression problem (5) is popular in beamforming [9] and spectral estimation [10].

Note that (3) is a regularized version of the Wiener equation [7, Ch. 2.4] and (6) is the regularized version of the linearly-constrained minimum variance (LCMV) filter [7, Ch. 2.8]. However, in textbook formulations, the problems (2) or (5) are defined with $\alpha=0$, i.e., without regularization. The latter is added in (3) and (6) by practitioners [7, Ch. 8.10], [2, Sec. 4], [11, Sec. 2.B] with the aim of improving the conditioning of the matrix $\overline{\mathbf{R}}_{\boldsymbol{x}}+\alpha\mathbf{I}$, which must be inverted, at least implicitly. (The explicit inversion of the matrix in (3) may be avoided by solving the linear equations $(\overline{\mathbf{R}}_{\boldsymbol{x}}+\alpha\mathbf{I})\hat{\boldsymbol{w}}=\overline{\boldsymbol{r}}_{\boldsymbol{x}d}$.)
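As a minimal sketch of how (3) and (6) might be evaluated numerically, the NumPy snippet below solves the regularized linear system rather than forming the inverse explicitly. The function names `wiener_filter` and `lcmv_filter`, the toy matrix `R`, the vectors `r` and `a`, and the value of `alpha` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def wiener_filter(R, r, alpha):
    """Regularized Wiener filter (3): solve (R + alpha*I) w = r."""
    M = R.shape[0]
    return np.linalg.solve(R + alpha * np.eye(M), r)


def lcmv_filter(R, a, alpha):
    """Regularized LCMV filter (6): (R + alpha*I)^{-1} a, scaled so that w^H a = 1."""
    M = R.shape[0]
    u = np.linalg.solve(R + alpha * np.eye(M), a)
    return u / (a.conj() @ u)


# Toy second-order statistics (illustrative values only).
rng = np.random.default_rng(0)
M = 4
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = B @ B.conj().T / M                        # Hermitian, positive semidefinite
r = rng.standard_normal(M) + 1j * rng.standard_normal(M)
a = np.exp(1j * np.pi * 0.3 * np.arange(M))   # unit-modulus entries, so ||a||^2 = M
alpha = 0.1

w_em = wiener_filter(R, r, alpha)             # error-minimization filter
w_is = lcmv_filter(R, a, alpha)               # interference-suppression filter
```

In this sketch both filters use the same system matrix, so an implementation that needs both could reuse one factorization of it.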

The main reason why regularization is required comes from the fact that, in practice, we do not have access to $\overline{\mathbf{R}}_{\boldsymbol{x}}$ or $\overline{\boldsymbol{r}}_{\boldsymbol{x}d}$. Rather, they are estimated from the data using time-averaging,

\begin{align}
\overline{\mathbf{R}}_{\boldsymbol{x}} &\approx \mathbf{R}_{\boldsymbol{x}}=\frac{1}{N}\sum_{t=0}^{N-1}\boldsymbol{x}(t)\boldsymbol{x}^{\mathrm{H}}(t), \tag{7}\\
\overline{\boldsymbol{r}}_{\boldsymbol{x}d} &\approx \boldsymbol{r}_{\boldsymbol{x}d}=\frac{1}{N}\sum_{t=0}^{N-1}\boldsymbol{x}(t)d^{*}(t). \tag{8}
\end{align}

Then, the regularization term, $\alpha\mathbf{I}$, is a practical solution to deal with imperfect estimates (7)-(8), and/or with the numerical errors involved in solving (3). The parameter $\alpha$ has to be “appropriately chosen” and will depend on all the elements of the model (1). In particular, since the importance of the estimation errors in (7)-(8) decreases with $N$, we expect that the value of $\alpha$ also decreases with $N$.
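The time averages (7)-(8) can be sketched as follows. The storage layout (snapshots $\boldsymbol{x}(t)$ as columns of an $M\times N$ array), the function name `sample_moments`, and the random data are assumptions made only for illustration. The example uses fewer snapshots than filter taps, in which case the empirical covariance has rank at most $N$ and a positive $\alpha$ is needed for the system matrix to be invertible.

```python
import numpy as np


def sample_moments(X, d):
    """Time-averaged estimates (7)-(8).

    X : (M, N) array whose column t is x(t)  (storage layout is an assumption)
    d : (N,) array of desired-signal samples d(t)
    """
    N = X.shape[1]
    R = X @ X.conj().T / N      # (1/N) sum_t x(t) x(t)^H
    r = X @ d.conj() / N        # (1/N) sum_t x(t) d(t)^*
    return R, r


rng = np.random.default_rng(1)
M, N = 8, 5                     # fewer snapshots than filter taps
X = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
d = rng.standard_normal(N) + 1j * rng.standard_normal(N)

R, r = sample_moments(X, d)
rank_R = np.linalg.matrix_rank(R)        # at most N, so R is singular when N < M
alpha = 1e-2
# Adding alpha*I shifts every eigenvalue up by alpha, restoring invertibility.
eig_min = np.linalg.eigvalsh(R + alpha * np.eye(M)).min()
```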

2.1 Known regularization solutions in signal processing

Because regularization is recognized as an important practical element in the definition of linear filters, the problem has been addressed in the literature, particularly in the context of the minimum variance distortionless response (MVDR) formulation; the two most representative solutions are shown below.

2.1.1 Ledoit-Wolf matrix shrinkage

The Ledoit-Wolf matrix shrinkage method [12] assumes the following relationship between the true and empirical covariance matrices

$$\overline{\mathbf{R}}_{\boldsymbol{x}} \approx \beta\mathbf{R}_{\boldsymbol{x}}+\eta\mathbf{I}, \tag{9}$$

and, by minimizing the expected squared Frobenius norm of the approximation error,

$$\hat{\eta},\hat{\beta}=\mathop{\mathrm{argmin}}_{\beta,\eta}\mathbb{E}\left[\|\overline{\mathbf{R}}_{\boldsymbol{x}}-\beta\mathbf{R}_{\boldsymbol{x}}-\eta\mathbf{I}\|^{2}_{\mathrm{F}}\right], \tag{10}$$

finds the shrinkage parameters as [11, Eqs. (32)-(33)]

\begin{align}
\hat{\eta} &= \min\left[1,\frac{\hat{\rho}}{\|\mathbf{R}_{\boldsymbol{x}}-\hat{\nu}\mathbf{I}\|^{2}_{\mathrm{F}}}\right]\hat{\nu}, \tag{11}\\
\hat{\beta} &= 1-\frac{\hat{\eta}}{\hat{\nu}}, \tag{12}
\end{align}

where

\begin{align}
\hat{\rho} &= \frac{1}{N^{2}}\sum_{t=0}^{N-1}\|\boldsymbol{x}(t)\|^{4}-\frac{1}{N}\|\mathbf{R}_{\boldsymbol{x}}\|^{2}_{\mathrm{F}}, \tag{13}\\
\hat{\nu} &= \frac{1}{M}\mathrm{Tr}(\mathbf{R}_{\boldsymbol{x}}), \tag{14}
\end{align}

with $\mathrm{Tr}(\cdot)$ being the trace of a square matrix.

By factoring $\beta$ out of (9), i.e., writing $\beta\mathbf{R}_{\boldsymbol{x}}+\eta\mathbf{I}=\beta\big(\mathbf{R}_{\boldsymbol{x}}+\tfrac{\eta}{\beta}\mathbf{I}\big)$, the shrinkage parameters can then be converted back into a regularization parameter:

$$\alpha_{\mathrm{LW}}=\frac{\eta}{\beta}. \tag{15}$$
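The steps (11)-(15) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `ledoit_wolf_alpha`, the column-wise storage of the snapshots, the handling of the case $\hat{\beta}=0$ (returned as an infinite $\alpha$), and the synthetic data with unequal per-element powers are not taken from the paper.

```python
import numpy as np


def ledoit_wolf_alpha(X):
    """Regularization parameter (15) from the shrinkage estimates (11)-(14).

    X : (M, N) array whose column t is the snapshot x(t) (assumed layout).
    """
    M, N = X.shape
    R = X @ X.conj().T / N
    nu = np.trace(R).real / M                                    # (14)
    rho = (np.sum(np.linalg.norm(X, axis=0) ** 4) / N**2
           - np.linalg.norm(R, "fro") ** 2 / N)                  # (13)
    denom = np.linalg.norm(R - nu * np.eye(M), "fro") ** 2
    eta = min(1.0, rho / denom) * nu                             # (11)
    beta = 1.0 - eta / nu                                        # (12)
    # beta == 0 means full shrinkage towards nu*I, treated here as alpha -> infinity.
    alpha = eta / beta if beta > 0 else np.inf                   # (15)
    return alpha, eta, beta


rng = np.random.default_rng(2)
M, N = 8, 50
scales = np.linspace(0.5, 3.0, M)[:, None]    # unequal powers, so R is far from nu*I
X = scales * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

alpha_lw, eta, beta = ledoit_wolf_alpha(X)
```

Note that (11)-(12) imply $\hat{\beta}\hat{\nu}+\hat{\eta}=\hat{\nu}$, so the shrunk matrix in (9) keeps the trace of $\mathbf{R}_{\boldsymbol{x}}$.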

This method has been used to find regularization in the interference suppression problem (6), e.g., in [11]. On the other hand, we are not aware of its application to the error-minimization problem (3), most likely because the latter depends not only on the noisy covariance matrix $\mathbf{R}_{\boldsymbol{x}}$ but, also, explicitly requires a noisy cross-correlation vector $\boldsymbol{r}_{\boldsymbol{x}d}$.

In that regard, the interference suppression problem uses noisy $\mathbf{R}_{\boldsymbol{x}}$ and error-free $\boldsymbol{a}$, and, therefore, appears to be affected only by errors in the former. As we will see, such an interpretation is misleading, and the regularization depends also on $\boldsymbol{a}$. (Note that, in some works, e.g., [9], the problem is formulated assuming that $\boldsymbol{a}$ is also corrupted by errors. We do not use such a model, as assuming that $\boldsymbol{a}$ is perfectly known allows us to emphasize the fact that the regularization depends not only on the noise but also on the deterministic elements of the model.)

2.1.2 Hoerl, Kennard, and Baldwin regularization

Some regularization strategies are derived by exploiting the fact that the Wiener equations (3) can be obtained from the regularized ordinary least squares (OLS) problem:

$$\hat{\boldsymbol{w}}(\alpha)=\mathop{\mathrm{argmin}}_{\boldsymbol{w}}\left[\frac{1}{N}\|\boldsymbol{d}-\mathbf{X}^{\mathrm{H}}\boldsymbol{w}\|^{2}+\alpha\|\boldsymbol{w}\|^{2}\right], \tag{16}$$

where $\boldsymbol{d}=[d^{*}(0),\ldots,d^{*}(N-1)]^{\mathsf{T}}$ and $\mathbf{X}=[\boldsymbol{x}(0),\ldots,\boldsymbol{x}(N-1)]\in\mathbb{C}^{M\times N}$, so that the $t$-th element of $\mathbf{X}^{\mathrm{H}}\boldsymbol{w}$ is $\boldsymbol{x}^{\mathrm{H}}(t)\boldsymbol{w}$, $\mathbf{R}_{\boldsymbol{x}}=\frac{1}{N}\mathbf{X}\mathbf{X}^{\mathrm{H}}$, and $\boldsymbol{r}_{\boldsymbol{x}d}=\frac{1}{N}\mathbf{X}\boldsymbol{d}$; $(\cdot)^{\mathsf{T}}$ is the transposition operator.

The method proposed by Hoerl, Kennard, and Baldwin (HKB) in [13, Eq. (2.2)] finds the regularization in two steps. First, (16) is solved for $\alpha=0$ and, next, the regularization parameter is calculated as

$$\alpha_{\mathrm{HKB}}=\frac{\tilde{\sigma}_{e}^{2}(0)}{N\tilde{\sigma}_{w}^{2}(0)}, \tag{17}$$

where

\begin{align}
\tilde{\sigma}_{e}^{2}(\alpha) &= \frac{1}{N}\|\boldsymbol{d}-\mathbf{X}^{\mathrm{H}}\hat{\boldsymbol{w}}(\alpha)\|^{2}, \tag{18}\\
\tilde{\sigma}_{w}^{2}(\alpha) &= \frac{1}{\gamma}\|\hat{\boldsymbol{w}}(\alpha)\|^{2}, \tag{19}
\end{align}

and $\gamma\in(0,M]$ is the number of degrees of freedom of the solution. In the error-minimization problem, we set $\gamma=M$, while in the interference suppression problem, due to a linear constraint on $\boldsymbol{w}$, we set $\gamma=M-1$.

The HKB regularization was studied in the beamforming context [11], but we immediately see that it cannot be applied for $N<M$. This is because then the rank of $\mathbf{X}$ is smaller than $M$, so there is an infinite number of $\hat{\boldsymbol{w}}(0)$ that solve (16), each of which yields $\tilde{\sigma}_{e}^{2}(0)=0$. Thus, for $N<M$, (17) produces $\alpha_{\mathrm{HKB}}=0$.
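The two-step HKB rule (17)-(19) and its breakdown for $N<M$ can be illustrated with the sketch below. It assumes that the snapshots are stored as the columns of an $M\times N$ array (so that `X.conj().T @ w` stacks $\boldsymbol{x}^{\mathrm{H}}(t)\boldsymbol{w}$), that the vector `d_vec` already holds the conjugated samples $d^{*}(t)$, and that the unregularized problem is solved with `np.linalg.lstsq`; the names `hkb_alpha`, `make_data`, and `w_true` and the noise level are hypothetical.

```python
import numpy as np


def hkb_alpha(X, d_vec, gamma=None):
    """HKB regularization (17), computed from the solution of (16) with alpha = 0.

    X     : (M, N) array with x(t) in column t (assumed layout)
    d_vec : (N,) array holding d(t)^*, as in the definition of the vector d
    gamma : degrees of freedom; defaults to M (error-minimization case)
    """
    M, N = X.shape
    gamma = M if gamma is None else gamma
    A = X.conj().T                                     # N x M regression matrix X^H
    w0, *_ = np.linalg.lstsq(A, d_vec, rcond=None)     # least-squares solution, alpha = 0
    var_e = np.linalg.norm(d_vec - A @ w0) ** 2 / N    # (18) at alpha = 0
    var_w = np.linalg.norm(w0) ** 2 / gamma            # (19) at alpha = 0
    return var_e / (N * var_w)


rng = np.random.default_rng(3)
M = 6
w_true = rng.standard_normal(M) + 1j * rng.standard_normal(M)


def make_data(N, v_e=0.1):
    """Synthetic data following d = X^H w + e (illustrative only)."""
    X = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    e = np.sqrt(v_e / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return X, X.conj().T @ w_true + e


alpha_tall = hkb_alpha(*make_data(N=200))   # N > M: strictly positive value
alpha_short = hkb_alpha(*make_data(N=4))    # N < M: residual vanishes, alpha is numerically zero
```

For $N<M$, `np.linalg.lstsq` returns the minimum-norm solution, which is only one of the infinitely many minimizers; any of them would give a vanishing residual and hence a vanishing $\alpha_{\mathrm{HKB}}$.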

3 Bayesian formulation and inference of regularization parameter

To obtain the Bayesian formulation of the problem, we rewrite (1) in vector form:

$$\boldsymbol{d}=\mathbf{X}^{\mathrm{H}}\boldsymbol{w}+\boldsymbol{e}, \tag{20}$$

where $\boldsymbol{d}$ and $\mathbf{X}$ are already defined in (16), and $\boldsymbol{e}=[e^{*}(0),\ldots,e^{*}(N-1)]^{\mathsf{T}}$.

Assuming that $e(t)$ are independent, identically distributed (i.i.d.) zero-mean Gaussian variables with variance $v_{e}$, we have

$$f(\boldsymbol{d}|\mathbf{X},\boldsymbol{w})=\mathcal{N}(\boldsymbol{d};\mathbf{X}^{\mathrm{H}}\boldsymbol{w},v_{e}\mathbf{I}), \tag{21}$$

where

$$\mathcal{N}(\boldsymbol{y};\boldsymbol{m},\mathbf{V})=\frac{1}{\det(\pi\mathbf{V})}\exp\left[-(\boldsymbol{y}-\boldsymbol{m})^{\mathrm{H}}\mathbf{V}^{-1}(\boldsymbol{y}-\boldsymbol{m})\right] \tag{22}$$

denotes the probability density function (PDF) of a circular, complex Gaussian with mean $\boldsymbol{m}$ and covariance matrix $\mathbf{V}$.

The Bayesian approach models the parameter $\boldsymbol{w}$ as a random vector with posterior distribution given by

$$f(\boldsymbol{w}|\boldsymbol{d},\mathbf{X})\propto f(\boldsymbol{d}|\mathbf{X},\boldsymbol{w})f(\boldsymbol{w}). \tag{23}$$

Then, assuming the elements of $\boldsymbol{w}$ to be i.i.d. zero-mean Gaussian random variables with variance $v_{w}$, i.e.,

$$f(\boldsymbol{w})=\mathcal{N}(\boldsymbol{w};\boldsymbol{0},v_{w}\mathbf{I}), \tag{24}$$

it is simple to see that, using (21) and (24), the posterior distribution (23) is given by

$$f(\boldsymbol{w}|\boldsymbol{d},\mathbf{X})=\mathcal{N}(\boldsymbol{w};\hat{\boldsymbol{w}},\mathbf{R}_{\boldsymbol{w}}), \tag{25}$$

where

\begin{align}
\mathbf{R}_{\boldsymbol{w}} &= \frac{v_{e}}{N}(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}, \tag{26}\\
\hat{\boldsymbol{w}} &= (\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}\boldsymbol{r}_{\boldsymbol{x}d}, \tag{27}\\
\alpha &= \frac{v_{e}}{Nv_{w}}. \tag{28}
\end{align}

Of course, (27) being the mean of the posterior, it is also the maximum a posteriori (MAP) estimate, i.e., $\hat{\boldsymbol{w}}=\mathop{\mathrm{argmax}}_{\boldsymbol{w}}f(\boldsymbol{w}|\boldsymbol{d},\mathbf{X})$, and is the same as the solution of the Wiener equation (2) obtained from the empirical moments given in (7)-(8).

This modeling approach is well-known in signal processing textbooks. For example, [3, Ch. 4] or [1, Part VII - Summary and Notes] note the equivalence between the MAP estimation of $\boldsymbol{w}$ and the Wiener (least-squares) solution. On the other hand, the signal processing literature does not exploit this model to its full extent and does not find the parameters $\boldsymbol{v}=[v_{w},v_{e}]$, even though doing so would give us the immediate advantage of defining the regularization parameter $\alpha$ via (28). An additional advantage is that, knowing $\boldsymbol{v}$, we can find the posterior covariance matrix $\mathbf{R}_{\boldsymbol{w}}$, which allows us to assess the uncertainty of the estimation: the diagonal elements of $\mathbf{R}_{\boldsymbol{w}}$ are the posterior variances of the estimates $\hat{\boldsymbol{w}}$.
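To make the role of $\boldsymbol{v}$ concrete, the sketch below evaluates (26)-(28) for given variances and cross-checks them against the textbook Gaussian-posterior expressions, in which the posterior precision is the sum of the likelihood and prior precisions. The variances, the data, the column-wise layout of `X`, and the names `posterior`, `d_vec`, and `post_std` are assumptions made only for this illustration.

```python
import numpy as np


def posterior(X, d_vec, v_e, v_w):
    """Posterior mean (27) and covariance (26), with alpha given by (28).

    X     : (M, N) array with x(t) in column t (assumed layout)
    d_vec : (N,) array holding d(t)^*, i.e., the vector d of (20)
    """
    M, N = X.shape
    R = X @ X.conj().T / N
    r = X @ d_vec / N                    # d_vec is already conjugated
    alpha = v_e / (N * v_w)              # (28)
    G = np.linalg.inv(R + alpha * np.eye(M))
    return G @ r, (v_e / N) * G, alpha


rng = np.random.default_rng(5)
M, N, v_e, v_w = 4, 30, 0.2, 1.5         # illustrative values
X = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
d_vec = rng.standard_normal(N) + 1j * rng.standard_normal(N)

w_hat, R_w, alpha = posterior(X, d_vec, v_e, v_w)

# Cross-check: precision = likelihood precision + prior precision.
P = X @ X.conj().T / v_e + np.eye(M) / v_w
w_ref = np.linalg.solve(P, X @ d_vec / v_e)

# Per-coefficient posterior standard deviations from the diagonal of R_w.
post_std = np.sqrt(np.diag(R_w).real)
```

The explicit inverse is formed here only because the covariance matrix itself is of interest; for the mean alone, solving the linear system would suffice.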

3.1 Inference

We will infer the parameters 𝒗𝒗\boldsymbol{v}bold_italic_v using the ML approach:

\begin{align}
\alpha_{\mathrm{ML}},\hat{v}_{e} &= \mathop{\mathrm{argmin}}_{\alpha,v_{e}}L(\alpha,v_{e}), \tag{29}\\
L(\alpha,v_{e}) &= -\log f(\boldsymbol{d}|\mathbf{X},\boldsymbol{v}), \tag{30}
\end{align}

where, instead of $v_{w}$ and $v_{e}$, we parameterized the variables using $\alpha=v_{e}/(Nv_{w})$, which does not affect the optimality of the ML solution and focuses directly on the regularization parameter $\alpha$ we are interested in. (Footnote 3: Of course, we can obtain the ML estimates $\hat{v}_{e}$ and $\hat{v}_{w}$, too.)

We marginalize over $\boldsymbol{w}$ to obtain

\begin{align}
f(\boldsymbol{d}|\mathbf{X},\boldsymbol{v}) &= \int f(\boldsymbol{d}|\mathbf{X},\boldsymbol{w},\boldsymbol{v})f(\boldsymbol{w}|\boldsymbol{v})\,\mathrm{d}\boldsymbol{w}, \tag{31}
\end{align}

with the distributions under the integral being those shown in (21) and (24); the conditioning on $\boldsymbol{v}$ merely makes explicit their dependence on the parameters $\boldsymbol{v}$. Since all the variables are Gaussian, it is rather easy to show that

\begin{align}
f(\boldsymbol{d}|\mathbf{X},\boldsymbol{v}) &= \frac{\det\left[(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}\right]\alpha^{M}}{\pi^{N}v_{e}^{N}}\exp\left[\frac{N}{v_{e}}\big(\boldsymbol{r}_{\boldsymbol{x}d}^{\mathrm{H}}\hat{\boldsymbol{w}}-\tilde{\sigma}_{d}^{2}\big)\right], \tag{32}
\end{align}

where $\tilde{\sigma}_{d}^{2}=\|\boldsymbol{d}\|^{2}/N$ is the estimate of the second moment of $d(t)$, and, from (3), $\boldsymbol{r}_{\boldsymbol{x}d}^{\mathrm{H}}\hat{\boldsymbol{w}}$ is real.
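The closed form (32) can be checked numerically. The sketch below is only illustrative: since (21) and (24) are not restated here, it assumes the model $d(t)=\boldsymbol{w}^{\mathrm{H}}\boldsymbol{x}(t)+e(t)$ with circular complex Gaussian $\boldsymbol{w}$ (variance $v_{w}$ per entry) and $e(t)$ (variance $v_{e}$), and the usual sample estimates for $\mathbf{R}_{\boldsymbol{x}}$ and $\boldsymbol{r}_{\boldsymbol{x}d}$. All variable names and numerical values are hypothetical. Under these assumptions, the logarithm of (32) should agree with the log-density of a zero-mean complex Gaussian evaluated directly.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 50              # filter length and number of samples (illustrative values)
v_w, v_e = 0.5, 0.1       # assumed prior and noise variances
alpha = v_e / (N * v_w)   # parameterization used in the text


def crandn(*shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


X = crandn(M, N)                               # columns are the input vectors x(t)
w = np.sqrt(v_w) * crandn(M)                   # filter drawn from the assumed prior
d = w.conj() @ X + np.sqrt(v_e) * crandn(N)    # assumed model d(t) = w^H x(t) + e(t)

# Empirical moments (assumed forms of (7)-(8)) and the regularized solution.
R_x = X @ X.conj().T / N
r_xd = X @ d.conj() / N
sigma_d2 = np.linalg.norm(d) ** 2 / N
w_hat = np.linalg.solve(R_x + alpha * np.eye(M), r_xd)
quad = np.real(np.vdot(r_xd, w_hat))           # r_xd^H w_hat, real up to rounding

# Logarithm of the closed form (32).
_, logdet = np.linalg.slogdet(R_x + alpha * np.eye(M))
log_f_closed = (-logdet + M * np.log(alpha) - N * np.log(np.pi * v_e)
                - N / v_e * (sigma_d2 - quad))

# Direct evaluation: conj(d) is zero-mean complex Gaussian with covariance C.
y = d.conj()
C = v_w * X.conj().T @ X + v_e * np.eye(N)
_, logdet_C = np.linalg.slogdet(C)
log_f_direct = (-N * np.log(np.pi) - logdet_C
                - np.real(np.vdot(y, np.linalg.solve(C, y))))

print(log_f_closed, log_f_direct)
```

The agreement follows from the matrix determinant and matrix inversion lemmas, which reduce the $N\times N$ covariance to the $M\times M$ matrix $\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I}$.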

Thus,

\begin{align}
L(\alpha,v_{e})={}&-\log\det\left[(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}\right]-M\log\alpha+N\log v_{e}\nonumber\\
&+\frac{N}{v_{e}}\big(\tilde{\sigma}_{d}^{2}-\boldsymbol{r}_{\boldsymbol{x}d}^{\mathrm{H}}\hat{\boldsymbol{w}}\big)+N\log\pi, \tag{33}
\end{align}

which, for a given $\alpha$, is uniquely minimized by $\hat{v}_{e}$ satisfying

\begin{align}
\frac{\mathrm{d}}{\mathrm{d}v_{e}}L(\alpha,\hat{v}_{e}) &= \frac{N}{\hat{v}_{e}}\left[1-\frac{1}{\hat{v}_{e}}\big(\tilde{\sigma}_{d}^{2}-\boldsymbol{r}_{\boldsymbol{x}d}^{\mathrm{H}}\hat{\boldsymbol{w}}\big)\right]=0, \tag{34}\\
\hat{v}_{e} &= \tilde{\sigma}_{d}^{2}-\boldsymbol{r}_{\boldsymbol{x}d}^{\mathrm{H}}\hat{\boldsymbol{w}}. \tag{35}
\end{align}

Then, (29) is reduced to

\begin{align}
\alpha_{\mathrm{ML}} &= \mathop{\mathrm{argmin}}_{\alpha}L(\alpha), \tag{36}\\
L(\alpha) &= L(\alpha,\hat{v}_{e}) \tag{37}\\
&= -\log\det\left[(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}\right]-M\log\alpha+N\log\big(\tilde{\sigma}_{d}^{2}-\boldsymbol{r}_{\boldsymbol{x}d}^{\mathrm{H}}\hat{\boldsymbol{w}}\big)+\mathrm{Const.} \tag{38}
\end{align}
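The elimination of $v_{e}$ in (33)-(38) can be illustrated numerically. The following sketch uses hypothetical real-valued data purely for brevity and evaluates the expressions exactly as written; the function names are not from the paper. It checks that $\hat{v}_{e}$ from (35) minimizes (33) for a fixed $\alpha$, and that substituting it yields (38) up to the constant $N+N\log\pi$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 3, 40                       # illustrative sizes
X = rng.standard_normal((M, N))    # real-valued data, used only for brevity
d = np.array([1.0, -0.5, 0.2]) @ X + 0.3 * rng.standard_normal(N)

R_x = X @ X.T / N
r_xd = X @ d / N
sigma_d2 = d @ d / N


def residual(alpha):
    """sigma_d^2 - r_xd^H w_hat(alpha), i.e., the right-hand side of (35)."""
    w_hat = np.linalg.solve(R_x + alpha * np.eye(M), r_xd)
    return sigma_d2 - r_xd @ w_hat


def L_joint(alpha, v_e):
    """Negative log-likelihood (33)."""
    _, logdet = np.linalg.slogdet(R_x + alpha * np.eye(M))
    return (logdet - M * np.log(alpha) + N * np.log(v_e)
            + N / v_e * residual(alpha) + N * np.log(np.pi))


def L_profile(alpha):
    """Profiled objective (38) without the constant."""
    _, logdet = np.linalg.slogdet(R_x + alpha * np.eye(M))
    return logdet - M * np.log(alpha) + N * np.log(residual(alpha))


alpha = 0.05
v_e_hat = residual(alpha)          # closed-form minimizer (35)
const = N + N * np.log(np.pi)      # terms of (33) that no longer depend on alpha
print(L_joint(alpha, v_e_hat), L_profile(alpha) + const)
```

Because the constant does not depend on $\alpha$, minimizing `L_profile` over $\alpha$ alone is sufficient, which is the reduction expressed by (36).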

Using the eigenvalue decomposition $\mathbf{R}_{\boldsymbol{x}}=\mathbf{Q}\,\mathrm{diag}(\boldsymbol{\lambda})\mathbf{Q}^{\mathrm{H}}$, where $\mathrm{diag}(\boldsymbol{\lambda})$ is a diagonal matrix with diagonal elements taken from the vector $\boldsymbol{\lambda}=[\lambda_{1},\ldots,\lambda_{M}]^{\mathsf{T}}$, $\lambda_{m}$ being the eigenvalues of $\mathbf{R}_{\boldsymbol{x}}$, and the columns of $\mathbf{Q}\in\mathbb{R}^{M\times M}$ are the corresponding eigenvectors, we obtain

\begin{align}
\hat{\boldsymbol{w}}(\alpha) &= \mathbf{Q}\,\mathrm{diag}^{-1}(\boldsymbol{\lambda}+\alpha)\,\boldsymbol{z}_{\boldsymbol{x}d}, \tag{39}\\
\boldsymbol{z}_{\boldsymbol{x}d} &= \mathbf{Q}^{\mathrm{H}}\boldsymbol{r}_{\boldsymbol{x}d}=[z_{\boldsymbol{x}d,1},\ldots,z_{\boldsymbol{x}d,M}]^{\mathrm{H}}, \tag{40}
\end{align}

so (38) may be written as

\begin{align}
L(\alpha) &= \sum_{m=1}^{M}\log\frac{\alpha+\lambda_{m}}{\alpha}+N\log\left(\tilde{\sigma}_{d}^{2}-\sum_{m=1}^{M}\frac{z_{\boldsymbol{x}d,m}^{2}}{\alpha+\lambda_{m}}\right)+\mathrm{Const}, \tag{41}
\end{align}

and, now, we easily find its derivative:

\begin{align}
L'(\alpha) &= N\frac{f(\alpha)}{g(\alpha)}-\frac{\gamma(\alpha)}{\alpha}, \tag{42}
\end{align}

where

\begin{align}
f(\alpha) &= \sum_{m=1}^{M}\frac{z_{\boldsymbol{x}d,m}^{2}}{(\lambda_{m}+\alpha)^{2}}=\gamma(\alpha)\tilde{\sigma}_{w}^{2}(\alpha), \tag{43}\\
g(\alpha) &= \tilde{\sigma}_{d}^{2}-\boldsymbol{r}_{\boldsymbol{x}d}^{\mathrm{H}}\hat{\boldsymbol{w}}(\alpha)=\tilde{\sigma}_{e}^{2}(\alpha)+\alpha\gamma(\alpha)\tilde{\sigma}_{w}^{2}(\alpha), \tag{44}
\end{align}

in which $\tilde{\sigma}_{e}^{2}(\alpha)$ and $\tilde{\sigma}_{w}^{2}(\alpha)$ are given in (18) and (19), respectively, and the latter uses

\begin{align}
\gamma\equiv\gamma(\alpha) &= \sum_{m=1}^{M}\frac{\lambda_{m}}{\lambda_{m}+\alpha}, \tag{45}
\end{align}

also known as the effective number of parameters [14, Sec. 7.6]. Note that $\gamma(\alpha)\in[0,M]$ and, for $\alpha=0$, if no eigenvalues are zero, we can use $\gamma=\gamma(0)=M$, as we did in (19).
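The eigen-domain expressions (41)-(45) are cheap to evaluate once $\boldsymbol{\lambda}$ and $\boldsymbol{z}_{\boldsymbol{x}d}$ are available. The sketch below uses made-up values for the eigenvalues and for $|z_{\boldsymbol{x}d,m}|^{2}$ (for complex data, it is assumed that $z_{\boldsymbol{x}d,m}^{2}$ is to be read as the squared magnitude) and compares the analytical derivative (42) with a central finite difference of (41).

```python
import numpy as np

# Illustrative eigen-domain quantities (not taken from the paper).
lam = np.array([2.0, 1.0, 0.5, 0.1])     # eigenvalues of R_x
z2 = np.array([0.8, 0.3, 0.05, 0.01])    # |z_{xd,m}|^2 from (40)
sigma_d2, N = 1.0, 50                    # second moment of d(t), number of samples


def gamma(alpha):
    """Effective number of parameters (45)."""
    return np.sum(lam / (lam + alpha))


def f(alpha):
    """First expression in (43)."""
    return np.sum(z2 / (lam + alpha) ** 2)


def g(alpha):
    """First expression in (44), written in the eigen-domain."""
    return sigma_d2 - np.sum(z2 / (lam + alpha))


def L(alpha):
    """Objective (41) without the constant."""
    return np.sum(np.log((alpha + lam) / alpha)) + N * np.log(g(alpha))


def dL(alpha):
    """Derivative (42)."""
    return N * f(alpha) / g(alpha) - gamma(alpha) / alpha


alpha, h = 0.3, 1e-6
dL_numeric = (L(alpha + h) - L(alpha - h)) / (2 * h)   # central difference
print(dL(alpha), dL_numeric)
```

The same three sums over $m$ serve every value of $\alpha$, so scanning or iterating over $\alpha$ requires only one eigenvalue decomposition.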

As already noted in [15], solving $L'(\alpha)=0$ amounts to finding the real roots of a polynomial of degree not larger than $2M-1$, whose properties are described in the following:

Proposition 1 (Roots of $L'(\alpha)$)
  1. $\lim_{\alpha\rightarrow\infty}L'(\alpha)=0$, i.e., $\alpha=\infty$ is a root of $L'(\alpha)$.

  2. The odd-numbered roots (the first, the third, etc.) of $L'(\alpha)$ are minima of $L(\alpha)$.

  3. $L'(\alpha)$ has an even number of roots if and only if

    \begin{align}
    N\|\boldsymbol{r}_{\boldsymbol{x}d}\|^{2}>\tilde{\sigma}_{d}^{2}\,\mathrm{Tr}(\mathbf{R}_{\boldsymbol{x}}). \tag{46}
    \end{align}

Proof: See Appendix A.

Some comments are in order.

  • We should appreciate the possibility that $L'(\alpha)$ has no finite roots. Note that, if the only root is $\alpha=\infty$, then it is also the first root, which means that $L(\alpha)$ is minimized for $\alpha=\infty$, in which case $\hat{\boldsymbol{w}}(\alpha)=\boldsymbol{0}$. The fact that such a solution may be optimal is not at all obvious when formulating the filtering problem. As we will see empirically, it is indeed the case in some scenarios. (Footnote 4: Of course, we talk about the real roots, which are the meaningful solutions.)

  • Since $\tilde{\sigma}_{d}^{2}$, $\boldsymbol{r}_{\boldsymbol{x}d}$, and $\mathbf{R}_{\boldsymbol{x}}$ are empirical means which, for large $N$, tend to their corresponding expected values, (46) is likely to be satisfied for sufficiently large $N$, because the factor $N$ then dominates the left-hand side (l.h.s.) of (46). In other words, by increasing $N$, we will have an even number of roots; then $\alpha=\infty$ is a local maximum of $L(\alpha)$ and thus $\alpha_{\mathrm{ML}}$ is finite.
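Condition (46) is inexpensive to test before any root search. As a rough illustration, expanding (42)-(45) for large $\alpha$ gives $L'(\alpha)\approx[N\|\boldsymbol{r}_{\boldsymbol{x}d}\|^{2}-\tilde{\sigma}_{d}^{2}\mathrm{Tr}(\mathbf{R}_{\boldsymbol{x}})]/(\tilde{\sigma}_{d}^{2}\alpha^{2})$, so the sign of $L'(\alpha)$ far from the origin should match (46). The sketch below checks this on made-up values; the helper names `condition_46` and `dL` are hypothetical.

```python
import numpy as np


def condition_46(R_x, r_xd, sigma_d2, N):
    """Evaluate condition (46): N ||r_xd||^2 > sigma_d^2 Tr(R_x)."""
    lhs = N * np.linalg.norm(r_xd) ** 2
    rhs = sigma_d2 * np.real(np.trace(R_x))
    return bool(lhs > rhs)


def dL(alpha, R_x, r_xd, sigma_d2, N):
    """Derivative (42), evaluated through the eigenvalue decomposition."""
    lam, Q = np.linalg.eigh(R_x)
    z2 = np.abs(Q.conj().T @ r_xd) ** 2
    f = np.sum(z2 / (lam + alpha) ** 2)
    g = sigma_d2 - np.sum(z2 / (lam + alpha))
    gam = np.sum(lam / (lam + alpha))
    return N * f / g - gam / alpha


# Illustrative values (not from the paper); R_x is diagonal for simplicity.
R_x = np.diag([2.0, 1.0, 0.5, 0.1])
sigma_d2, N = 1.0, 50
r_strong = np.sqrt(np.array([0.8, 0.3, 0.05, 0.01]))   # input well correlated with d(t)
r_weak = 0.1 * r_strong                                 # weakly correlated case

for r_xd in (r_strong, r_weak):
    # Print whether (46) holds and the slope of L far from the origin.
    print(condition_46(R_x, r_xd, sigma_d2, N), dL(1e4, R_x, r_xd, sigma_d2, N))
```

When (46) fails, $L(\alpha)$ is still decreasing for large $\alpha$, which is consistent with the first comment above: the minimizer may then be $\alpha=\infty$.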

Finding the roots may be done by exploiting the polynomial structure of $L'(\alpha)$ but, in practice, this is feasible only for moderate $M$, e.g., in MVDR receivers applied in arrays composed of dozens of antennas. For large $M$, e.g., $M>100$, typical in system identification and/or equalization, the roots may be found, e.g., via grid search [15]. However, not all of these methods are very practical, which may explain why they did not receive much attention in the literature; in fact, they were not reused as a go-to solution by the authors of [15], e.g., in [11].

Our goal is thus to propose a simple approach to solve $L'(\alpha)=0$, which, after reorganizing (42), is equivalent to solving

\begin{align}
\alpha &= \gamma(\alpha)\frac{g(\alpha)}{Nf(\alpha)}, \tag{47}
\end{align}

which we do via a fixed-point iteration:

\begin{align}
\alpha^{(i+1)} &= \gamma\big(\alpha^{(i)}\big)\frac{g\big(\alpha^{(i)}\big)}{Nf\big(\alpha^{(i)}\big)} \tag{48}\\
&= \frac{\tilde{\sigma}_{e}^{2}\big(\alpha^{(i)}\big)}{N\tilde{\sigma}_{w}^{2}\big(\alpha^{(i)}\big)}+\frac{\alpha^{(i)}}{N}\gamma\big(\alpha^{(i)}\big),\quad i=0,\ldots,I-1, \tag{49}\\
\alpha_{\mathrm{ML}} &= \alpha^{(I)}, \tag{50}
\end{align}

where $I$ is a predefined number of iterations, and the initialization $\alpha^{(0)}>0$ must be defined.
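A minimal sketch of the fixed-point iteration (48), written in the eigen-domain, is given below. The eigenvalues, the values of $|z_{\boldsymbol{x}d,m}|^{2}$, the initialization, and the number of iterations are arbitrary illustrative choices, and the function name is hypothetical. Since convergence is not guaranteed in general, the stationarity residual $L'(\alpha)$ from (42) is evaluated at the end as a sanity check.

```python
import numpy as np


def fixed_point_alpha(lam, z2, sigma_d2, N, alpha0=1.0, n_iter=200):
    """Run I = n_iter steps of the fixed-point iteration (48)."""
    alpha = alpha0
    for _ in range(n_iter):
        gam = np.sum(lam / (lam + alpha))              # gamma(alpha), (45)
        f = np.sum(z2 / (lam + alpha) ** 2)            # f(alpha), (43)
        g = sigma_d2 - np.sum(z2 / (lam + alpha))      # g(alpha), (44)
        alpha = gam * g / (N * f)                      # update (48)
    return alpha


# Illustrative eigen-domain quantities (not taken from the paper).
lam = np.array([2.0, 1.0, 0.5, 0.1])
z2 = np.array([0.8, 0.3, 0.05, 0.01])
sigma_d2, N = 1.0, 50

alpha_ml = fixed_point_alpha(lam, z2, sigma_d2, N)

# Residual of the stationarity condition L'(alpha) = 0 from (42).
dL_at_alpha = (N * np.sum(z2 / (lam + alpha_ml) ** 2)
               / (sigma_d2 - np.sum(z2 / (lam + alpha_ml)))
               - np.sum(lam / (lam + alpha_ml)) / alpha_ml)
print(alpha_ml, dL_at_alpha)
```

In practice, a stopping rule based on the relative change of $\alpha$ could replace the fixed iteration count; that variant is not part of (48)-(50).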

Note that:

  • The convergence of the fixed-point iteration (48) is not proven but, in numerical examples, it always converged to a minimum of $L(\alpha)$ when (46) was satisfied (i.e., when there are finite minima of $L(\alpha)$).

  • With the initialization $\alpha^{(0)}=0$, the first iteration of (49) yields

    \begin{align}
    \alpha^{(1)} &= \frac{\tilde{\sigma}_{e}^{2}(0)}{N\tilde{\sigma}_{w}^{2}(0)}, \tag{51}
    \end{align}

    which is exactly the HKB method shown in (17). We can thus say that our solution generalizes the HKB method, enhancing it with an iterative refinement and removing the need to initialize with a non-regularized solution, i.e., $\alpha^{(0)}=0$, which may be problematic in general, since it cannot be solved meaningfully for $N<M$.

  • The fixed-point iteration (49) is not the only way to solve the problem iteratively. For example, using (44) in (47), and isolating $\alpha$, we obtain a new fixed-point equation:

    \begin{align}
    \alpha^{(i+1)} &= \frac{\tilde{\sigma}_{e}^{2}\big(\alpha^{(i)}\big)}{\big[N-\gamma\big(\alpha^{(i)}\big)\big]\tilde{\sigma}_{w}^{2}\big(\alpha^{(i)}\big)}, \tag{52}
    \end{align}

    which is known as the Gull-MacKay iteration [6, Ch. 18.1.4]; see [16, v1 App. A] for an alternative derivation.

    Our experience shows that the Gull-MacKay iteration converges faster than (48). However, it should be applied with care for $N<M$ because, in this case, we do not have a guarantee that $N-\gamma(\alpha)$ is positive [as seen in (45)].

  • The iterative solutions (49) and (52) use $\tilde{\sigma}_{w}^{2}(\alpha)$ and $\tilde{\sigma}_{e}^{2}(\alpha)$, which may be calculated from the eigenvalue decomposition shown in (41)-(45); this reduces the complexity significantly.
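The two iterations discussed above can be compared side by side. In the sketch below, $\tilde{\sigma}_{w}^{2}$ and $\tilde{\sigma}_{e}^{2}$ are not computed from (18)-(19), which are not restated here, but are recovered from (43)-(44) as $\tilde{\sigma}_{w}^{2}=f/\gamma$ and $\tilde{\sigma}_{e}^{2}=g-\alpha f$. The numerical values and function names are hypothetical. Because (52) is an algebraic rearrangement of (47), both iterations should reach the same $\alpha$ when they converge.

```python
import numpy as np


def eigen_terms(alpha, lam, z2, sigma_d2):
    """Return gamma, f and g from (45), (43) and (44)."""
    gam = np.sum(lam / (lam + alpha))
    f = np.sum(z2 / (lam + alpha) ** 2)
    g = sigma_d2 - np.sum(z2 / (lam + alpha))
    return gam, f, g


def fixed_point_alpha(lam, z2, sigma_d2, N, alpha0=1.0, n_iter=200):
    """Iteration (48)."""
    alpha = alpha0
    for _ in range(n_iter):
        gam, f, g = eigen_terms(alpha, lam, z2, sigma_d2)
        alpha = gam * g / (N * f)
    return alpha


def gull_mackay_alpha(lam, z2, sigma_d2, N, alpha0=1.0, n_iter=200):
    """Iteration (52), with the variances expressed through (43)-(44)."""
    alpha = alpha0
    for _ in range(n_iter):
        gam, f, g = eigen_terms(alpha, lam, z2, sigma_d2)
        sigma_w2 = f / gam                 # from (43): f = gamma * sigma_w^2
        sigma_e2 = g - alpha * f           # from (44): g = sigma_e^2 + alpha * f
        if N - gam <= 0:                   # may happen when N < M
            raise ValueError("N - gamma(alpha) is not positive")
        alpha = sigma_e2 / ((N - gam) * sigma_w2)
    return alpha


# Illustrative eigen-domain quantities (not taken from the paper).
lam = np.array([2.0, 1.0, 0.5, 0.1])
z2 = np.array([0.8, 0.3, 0.05, 0.01])
sigma_d2, N = 1.0, 50

alpha_fp = fixed_point_alpha(lam, z2, sigma_d2, N)
alpha_gm = gull_mackay_alpha(lam, z2, sigma_d2, N)
print(alpha_fp, alpha_gm)
```

The explicit guard on $N-\gamma(\alpha)$ is one possible way to handle the caveat for $N<M$; other safeguards, such as falling back to (48), would work as well.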

4 Automatic regularization in the interference-suppression problem

Having solved the problem of automatic regularization of the Wiener equations (2) in the error-minimization problem, we turn our attention to the interference-suppression problem (5) and reformulate it to take advantage of the development we already made in Sec. 3. To this end, we need to remove the constraint in (5), which is done by expressing $\boldsymbol{w}$ as

\begin{align}
\boldsymbol{w}=\frac{1}{M}\boldsymbol{a}-\mathbf{A}\boldsymbol{u}, \tag{53}
\end{align}

where

\begin{align}
\mathbf{A} &= \mathbf{I}-\frac{1}{M}\boldsymbol{a}\boldsymbol{a}^{\mathrm{H}}=\mathbf{A}^{\mathrm{H}} \tag{54}
\end{align}

is the projection matrix; indeed, it is easy to see that $\boldsymbol{a}^{\mathrm{H}}\mathbf{A}=\boldsymbol{0}$, and thus, for any $\boldsymbol{u}\in\mathbb{C}^{M}$, $\boldsymbol{a}^{\mathrm{H}}\boldsymbol{w}=1$.
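The properties of the parameterization (53)-(54) can be verified numerically. The sketch below assumes a steering vector with unit-modulus entries, so that $\|\boldsymbol{a}\|^{2}=M$, which is what makes $\boldsymbol{a}^{\mathrm{H}}\mathbf{A}=\boldsymbol{0}$ hold; the half-wavelength uniform linear array and the angle are hypothetical choices made only for illustration.

```python
import numpy as np

M = 8
# Hypothetical steering vector of a half-wavelength uniform linear array;
# its entries have unit modulus, so ||a||^2 = M.
a = np.exp(1j * np.pi * np.arange(M) * np.sin(0.3))
A = np.eye(M) - np.outer(a, a.conj()) / M       # projection matrix (54)

rng = np.random.default_rng(2)
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # arbitrary u
w = a / M - A @ u                               # parameterization (53)

constraint = np.vdot(a, w)                      # a^H w, should equal 1 up to rounding
print(constraint)
```

Since the constraint holds for every $\boldsymbol{u}$, the optimization over $\boldsymbol{u}$ can proceed without any constraint.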

Note that this approach, with a slightly different definition of (53), was also used in [15, 11].

We may thus reformulate (5) as

\begin{align}
\hat{\boldsymbol{w}} &= \frac{1}{M}\boldsymbol{a}-\mathbf{A}\hat{\boldsymbol{u}}, \tag{55}\\
\hat{\boldsymbol{u}} &= \mathop{\mathrm{argmin}}_{\boldsymbol{u}}\left\{\mathbb{E}\Big[|(\boldsymbol{a}/M-\mathbf{A}\boldsymbol{u})^{\mathrm{H}}\boldsymbol{x}(t)|^{2}\Big]+\alpha\|\boldsymbol{a}/M-\mathbf{A}\boldsymbol{u}\|^{2}\right\} \tag{56}\\
&= \mathop{\mathrm{argmin}}_{\boldsymbol{u}}\left\{\mathbb{E}\Big[|\tilde{d}(t)-\boldsymbol{u}^{\mathrm{H}}\tilde{\boldsymbol{x}}(t)|^{2}\Big]+\alpha\boldsymbol{u}^{\mathrm{H}}\mathbf{A}\boldsymbol{u}\right\}, \tag{57}
\end{align}

where we removed the constant terms from (57), and we used

\begin{align}
\tilde{\boldsymbol{x}}(t) &= \mathbf{A}\boldsymbol{x}(t), \tag{58}\\
\tilde{d}(t) &= \frac{1}{M}\boldsymbol{a}^{\mathrm{H}}\boldsymbol{x}(t). \tag{59}
\end{align}

Proposition 2

The optimization in (57) is equivalent to

\begin{align}
\hat{\boldsymbol{u}} &= \mathop{\mathrm{argmin}}_{\boldsymbol{u}}\left\{\mathbb{E}\Big[|\tilde{d}(t)-\boldsymbol{u}^{\mathrm{H}}\tilde{\boldsymbol{x}}(t)|^{2}\Big]+\alpha\|\boldsymbol{u}\|^{2}\right\}. \tag{60}
\end{align}

Proof: We can always write $\boldsymbol{u}=\boldsymbol{u}_{\parallel}+\boldsymbol{u}_{\perp}$, where $\boldsymbol{u}_{\parallel}=\beta\boldsymbol{a}$ is the term collinear with $\boldsymbol{a}$ and $\boldsymbol{u}_{\perp}$ is the term orthogonal to $\boldsymbol{a}$, i.e., $\boldsymbol{a}^{\mathrm{H}}\boldsymbol{u}_{\perp}=0$. Then, from $\boldsymbol{u}_{\parallel}^{\mathrm{H}}\mathbf{A}=\boldsymbol{0}$, we see that $\boldsymbol{u}^{\mathrm{H}}\tilde{\boldsymbol{x}}(t)=\boldsymbol{u}_{\perp}^{\mathrm{H}}\tilde{\boldsymbol{x}}(t)$ and $\boldsymbol{u}^{\mathrm{H}}\mathbf{A}\boldsymbol{u}=\|\boldsymbol{u}_{\perp}\|^{2}$, which means that the cost function under minimization in (57) is insensitive to adding a term collinear with $\boldsymbol{a}$ to any $\boldsymbol{u}$, i.e., $\boldsymbol{u}+\beta\boldsymbol{a}$. In particular, we may remove the term collinear with $\boldsymbol{a}$ from $\hat{\boldsymbol{u}}$ by adding a penalty term $\|\boldsymbol{u}_{\parallel}\|^{2}$, i.e.,

$$\hat{\boldsymbol{u}} = \mathop{\mathrm{argmin}}_{\boldsymbol{u}} \left\{ \mathds{E}\Big[ |\tilde{d}(t) - \boldsymbol{u}^{\mathrm{H}} \tilde{\boldsymbol{x}}(t)|^{2} \Big] + \alpha \big( \boldsymbol{u}^{\mathrm{H}} \mathbf{A} \boldsymbol{u} + \|\boldsymbol{u}_{\parallel}\|^{2} \big) \right\}, \tag{61}$$

and, because $\boldsymbol{u}^{\mathrm{H}} \mathbf{A} \boldsymbol{u} + \|\boldsymbol{u}_{\parallel}\|^{2} = \|\boldsymbol{u}\|^{2}$, (61) is the same as (60).
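The identity used in the last step of the proof can be checked numerically. The sketch below is only an illustration: it assumes the projector $\mathbf{A} = \mathbf{I} - \boldsymbol{a}\boldsymbol{a}^{\mathrm{H}}/M$ and a unit-modulus vector $\boldsymbol{a}$ (so that $\boldsymbol{a}^{\mathrm{H}}\boldsymbol{a} = M$); the dimension, the seed, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8

# Hypothetical steering vector with unit-modulus entries, so that a^H a = M.
a = np.exp(-1j * np.pi * 0.3 * np.arange(M))
# Assumed projector onto the subspace orthogonal to a.
A = np.eye(M) - np.outer(a, a.conj()) / M

# Arbitrary complex vector u, split into collinear and orthogonal parts.
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)
beta = (a.conj() @ u) / M      # coefficient of the collinear part
u_par = beta * a               # u_parallel = beta * a
u_perp = u - u_par             # satisfies a^H u_perp = 0

# Left- and right-hand sides of u^H A u + ||u_parallel||^2 = ||u||^2.
lhs = np.real(u.conj() @ A @ u) + np.linalg.norm(u_par) ** 2
rhs = np.linalg.norm(u) ** 2
print(abs(a.conj() @ u_perp), abs(lhs - rhs))
```

Both printed values should be at the level of floating-point rounding.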

The goal of Proposition 2 was to obtain (60), which has the same form as the error-minimization problem (2). Thus, we can reuse the equations of the latter, i.e.,

$$\hat{\boldsymbol{u}} = (\mathbf{R}_{\tilde{\boldsymbol{x}}} + \alpha \mathbf{I})^{-1} \boldsymbol{r}_{\tilde{\boldsymbol{x}}\tilde{d}}, \tag{62}$$
$$\mathbf{R}_{\tilde{\boldsymbol{x}}} = \frac{1}{N} \sum_{t=0}^{N-1} \tilde{\boldsymbol{x}}(t) \tilde{\boldsymbol{x}}^{\mathrm{H}}(t) = \mathbf{A} \mathbf{R}_{\boldsymbol{x}} \mathbf{A}, \tag{63}$$
$$\boldsymbol{r}_{\tilde{\boldsymbol{x}}\tilde{d}} = \frac{1}{N} \sum_{t=0}^{N-1} \tilde{\boldsymbol{x}}(t) \tilde{d}^{*}(t) = \frac{1}{M} \mathbf{A} \mathbf{R}_{\boldsymbol{x}} \boldsymbol{a}, \tag{64}$$

and we can likewise apply the fixed-point iteration (49) to find the regularization parameter, that is,

$$\alpha^{(i+1)} = \gamma\big(\alpha^{(i)}\big) \frac{\frac{1}{N} \big\| \tilde{\boldsymbol{d}} - \tilde{\mathbf{X}}^{\mathrm{H}} \hat{\boldsymbol{u}}\big(\alpha^{(i)}\big) \big\|^{2}}{N \big\| \hat{\boldsymbol{u}}\big(\alpha^{(i)}\big) \big\|^{2}} + \frac{\gamma\big(\alpha^{(i)}\big)}{N} \alpha^{(i)}. \tag{65}$$

Since we removed the terms collinear with $\boldsymbol{a}$, we have $\hat{\boldsymbol{u}}^{\mathrm{H}} \boldsymbol{a} = 0$, and

$$\|\hat{\boldsymbol{u}}^{(i)}\|^{2} = \|\hat{\boldsymbol{w}}^{(i)}\|^{2} - \frac{1}{M}, \tag{66}$$

and, from Appendix B, we have

$$\gamma^{(i)} = M - \alpha^{(i)} \mathrm{Tr}\big[ (\mathbf{R}_{\boldsymbol{x}} + \alpha^{(i)} \mathbf{I})^{-1} \big] - \boldsymbol{a}^{\mathrm{H}} \mathbf{R}_{\boldsymbol{x}}^{\mathrm{H}} (\mathbf{R}_{\boldsymbol{x}} + \alpha^{(i)} \mathbf{I})^{-1} \hat{\boldsymbol{w}}^{(i)}, \tag{67}$$

which may be integrated into the fixed-point iteration.

For example, the Gull-MacKay iteration (52) becomes

$$\hat{\boldsymbol{w}}^{(i)} = \frac{(\mathbf{R}_{\boldsymbol{x}} + \alpha^{(i)} \mathbf{I})^{-1} \boldsymbol{a}}{\boldsymbol{a}^{\mathrm{H}} (\mathbf{R}_{\boldsymbol{x}} + \alpha^{(i)} \mathbf{I})^{-1} \boldsymbol{a}}, \tag{68}$$
$$\alpha^{(i+1)} = \frac{(\hat{\boldsymbol{w}}^{(i)})^{\mathrm{H}} \mathbf{R}_{\boldsymbol{x}} \hat{\boldsymbol{w}}^{(i)}}{\left( \|\hat{\boldsymbol{w}}^{(i)}\|^{2} - \dfrac{1}{M} \right) \left( \dfrac{N}{\gamma^{(i)}} - 1 \right)}. \tag{69}$$
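Equations (67)-(69) translate almost line by line into code. The following is a minimal sketch rather than a reference implementation: the function names, the seed, and the synthetic data (three complex Gaussian sources plus unit-variance noise on an $M = 10$ array, with the weakest source as target) are assumptions made for illustration, and the empirical covariance is taken as $\mathbf{R}_{\boldsymbol{x}} = \frac{1}{N}\mathbf{X}\mathbf{X}^{\mathrm{H}}$.

```python
import numpy as np

def mvdr_filter(R, a, alpha):
    """Diagonally loaded filter of (68)."""
    v = np.linalg.solve(R + alpha * np.eye(a.size), a)
    return v / (a.conj() @ v)

def gull_mackay_mvdr(R, a, N, alpha0=0.5, n_iter=5):
    """Run n_iter updates of (67)-(69); return (filter, alpha)."""
    M = a.size
    alpha = alpha0
    for _ in range(n_iter):
        P = np.linalg.inv(R + alpha * np.eye(M))
        w = mvdr_filter(R, a, alpha)                           # (68)
        gamma = (M - alpha * np.trace(P)
                 - a.conj() @ R.conj().T @ P @ w).real         # (67)
        out_power = (w.conj() @ R @ w).real                    # w^H R w
        u_norm2 = np.linalg.norm(w) ** 2 - 1.0 / M             # (66)
        alpha = out_power / (u_norm2 * (N / gamma - 1.0))      # (69)
    return mvdr_filter(R, a, alpha), alpha

# Synthetic data (assumed scenario): K = 3 sources plus white noise.
rng = np.random.default_rng(1)
M, N = 10, 1000
phis = np.array([0.2, 0.3, 0.6]) * np.pi
powers = 10.0 ** (np.array([20.0, 10.0, 5.0]) / 10.0)
steer = np.exp(-1j * np.pi * np.outer(np.arange(M), np.cos(phis)))
d = (rng.standard_normal((3, N)) + 1j * rng.standard_normal((3, N)))
d = d * np.sqrt(powers[:, None] / 2.0)
e = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2.0)
X = steer @ d + e
R = X @ X.conj().T / N                                         # empirical covariance

w, alpha = gull_mackay_mvdr(R, steer[:, 2], N)
print(alpha, abs(w.conj() @ steer[:, 2]))
```

The returned filter satisfies the distortionless constraint $\hat{\boldsymbol{w}}^{\mathrm{H}}\boldsymbol{a} = 1$ by construction of (68), whatever value $\alpha$ takes. The sketch does not check condition (46), so it gives no guarantee that a finite root exists.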

5 Numerical examples

5.1 Error-minimization problem: system identification

We consider the problem of identification of an acoustic impulse response, where $x(t)$ is an AR(1) process, i.e., $x(t) = a x(t-1) + u(t)$ and $u(t)$ is generated from a zero-mean unit-variance white Gaussian noise; we use $a = 0.9$. The impulse response $\boldsymbol{h} = [h(0), \ldots, h(M-1)]^{\mathsf{T}}$ with length $M = 600$, shown in Fig. 1, is calculated using software [17] for a room of dimensions $(5, 4, 6)$ m, the source in position $(2, 3.5, 2)$ m, the receiver in position $(2, 1.5, 1)$ m, a sampling rate of $8$ kHz, and a reverberation time of $225$ ms. The desired output is obtained as $d(t) = \boldsymbol{h}^{\mathsf{T}} \boldsymbol{x}(t) + e(t)$, with $e(t)$ being a zero-mean Gaussian noise with variance $v^{*}_{e}$, and

$$\boldsymbol{x}(t) = [x(t), x(t-1), \dots, x(t-M+1)]^{\mathsf{T}}. \tag{70}$$

We define the SNR as

$$\mathrm{SNR} = 10 \log_{10} \left( \frac{\mathds{E}[|\boldsymbol{h}^{\mathsf{T}} \boldsymbol{x}(t)|^{2}]}{v^{*}_{e}} \right)~[\mathrm{dB}]. \tag{71}$$
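The data model above can be simulated in a few lines. This is a sketch under stated assumptions: the impulse response is a hypothetical placeholder (exponentially decaying random taps) rather than the room response computed with [17], the filter length is reduced to $M = 64$ for brevity, the expectation in (71) is replaced by a sample average, and the AR(1) process is started from zero without a burn-in period.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, a_coef, snr_db = 64, 2000, 0.9, 20.0

# AR(1) input x(t) = a x(t-1) + u(t), driven by unit-variance white Gaussian noise.
n_total = N + M - 1
u = rng.standard_normal(n_total)
x = np.zeros(n_total)
x[0] = u[0]
for t in range(1, n_total):
    x[t] = a_coef * x[t - 1] + u[t]

# Placeholder impulse response (assumption, not the response of Fig. 1).
h = rng.standard_normal(M) * np.exp(-np.arange(M) / 10.0)

# Columns of X are the regressors [x(t), x(t-1), ..., x(t-M+1)]^T of (70).
X = np.stack([x[t - M + 1:t + 1][::-1] for t in range(M - 1, n_total)], axis=1)

# Noise variance chosen so that the sample version of (71) equals snr_db.
clean = h @ X                                       # h^T x(t)
v_e = np.mean(clean ** 2) / 10.0 ** (snr_db / 10.0)
d = clean + np.sqrt(v_e) * rng.standard_normal(N)   # desired signal d(t)
print(X.shape, d.shape, v_e)
```

With `X` and `d` in hand, the empirical quantities used by the regularized solution are `R = X @ X.T / N` and `r = X @ d / N`.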
Fig. 1: Impulse response $\boldsymbol{h}$ generated using [17].

Although we use real variables, it is easy to see that the formulas to find $\alpha$, derived in Sec. 3, are the same.

The quality of the estimate $\hat{\boldsymbol{w}} \equiv \hat{\boldsymbol{w}}(\alpha)$ will be assessed through the misalignment (a relative estimation error) of the impulse response:

$$\mathsf{m}(\alpha) = 20 \log_{10} \left( \frac{\|\hat{\boldsymbol{w}} - \boldsymbol{h}\|_{2}}{\|\boldsymbol{h}\|_{2}} \right)~[\mathrm{dB}]. \tag{72}$$

A simple worst-case reference is obtained by setting $\alpha = \infty$, for which $\hat{\boldsymbol{w}} = \boldsymbol{0}$, and thus we have $\mathsf{m}(\infty) = 0~\mathrm{dB}$. The best-case reference is obtained with the “oracle”-given regularization parameter and the corresponding misalignment:

$$\hat{\alpha} = \mathop{\mathrm{argmin}}_{\alpha} \mathsf{m}(\alpha), \tag{73}$$
$$\hat{\mathsf{m}} = \mathsf{m}(\hat{\alpha}). \tag{74}$$
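The misalignment (72) and the oracle choice (73)-(74) can be sketched as follows. The example is illustrative only: the minimization in (73) is restricted to an assumed logarithmic grid, the data are a small white-input toy problem rather than the acoustic setup of this section, and the regularized filter is taken in the form $(\mathbf{R}_{\boldsymbol{x}} + \alpha \mathbf{I})^{-1} \boldsymbol{r}_{\boldsymbol{x}d}$.

```python
import numpy as np

def misalignment_db(w_hat, h):
    """Misalignment (72) in dB."""
    return 20.0 * np.log10(np.linalg.norm(w_hat - h) / np.linalg.norm(h))

def ridge_filter(R, r, alpha):
    """Regularized solution (R + alpha I)^{-1} r."""
    return np.linalg.solve(R + alpha * np.eye(R.shape[0]), r)

rng = np.random.default_rng(0)
M, N = 20, 30
h = rng.standard_normal(M)              # hypothetical "true" response
X = rng.standard_normal((M, N))         # white input, for brevity
d = h @ X + rng.standard_normal(N)      # noisy desired signal
R = X @ X.T / N                         # empirical covariance
r = X @ d / N                           # empirical cross-correlation

alphas = np.logspace(-4, 2, 61)         # assumed search grid
m = np.array([misalignment_db(ridge_filter(R, r, al), h) for al in alphas])
alpha_oracle = alphas[np.argmin(m)]     # (73), restricted to the grid
m_oracle = m.min()                      # (74)
print(alpha_oracle, m_oracle, misalignment_db(np.zeros(M), h))
```

The oracle needs the true $\boldsymbol{h}$, so it serves only as a benchmark; the last printed value is the $0$ dB reference obtained with $\hat{\boldsymbol{w}} = \boldsymbol{0}$.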
Fig. 2: Values $\alpha^{(i)}$ obtained via fixed-point iteration (49) (dashed-dotted lines) and via Gull-MacKay iteration (52) (dashed lines) in different realizations of the data using $\alpha^{(0)} = 0.5$; $N = 1000$, $M = 600$. Solid lines are constant, as they indicate an oracle-given $\hat{\alpha}$. Thick lines indicate averages over realizations shown with thin lines.

Fig. 2 illustrates the convergence of the fixed-point iterations (49) and (52): it shows the evolution of $\alpha^{(i)}$ with the starting point $\alpha^{(0)} = 0.5$, chosen to be far from the oracle-given $\hat{\alpha}$. We evaluate various realizations of the data with $N = 1000$ and $M = 600$, and note that, beyond $I = 5$ iterations, for practical purposes, convergence may be declared for Gull-MacKay, while the fixed-point iteration (49) is slower, requiring approximately twice as many iterations.

All the results we show in the following are thus based on the Gull-MacKay iteration, with $I = 5$ and $\alpha^{(0)} = 0.5$. We verified that, in all displayed cases, the condition (46) was not violated. (Footnote 5: This is because we decided to use $N > M$, which is a practical choice in system identification. However, for smaller $N$, the condition (46) may be violated.)

The results, shown in Fig. 3(a) and (c), are consistent with intuition: by increasing $N$ and the SNR, we decrease the estimation error when either the oracle or the Gull-MacKay regularization is used. In fact, the difference between the regularization parameter $\alpha^{(I)}$ and the oracle-given value $\hat{\alpha}$ is rather small, making the iterative estimation (52) an attractive tool for the choice of $\alpha$.

Moreover, we observe that (i) the HKB and the Ledoit-Wolf regularization methods may yield worse performance than $\mathsf{m}(\infty) = 0$ dB, which is the trivial performance limit. This is well understood for $N < M$, because then $\alpha_{\mathrm{HKB}} = 0$, i.e., the solution is not regularized; see our comments at the end of Sec. 2.1.2. Furthermore, for low SNR, the HKB regularization requires a substantial number of samples (approx. $N > 1600$) merely to attain $\mathsf{m}(\alpha) = 0~\mathrm{dB}$; (ii) the Ledoit-Wolf regularization does not adapt to the data, e.g., for large SNR it fails to outperform the non-regularized ($\alpha = 0$) solution. This is not entirely surprising because the Ledoit-Wolf method does not take into account the cross-correlation $\boldsymbol{r}_{\boldsymbol{x}d}$.

Fig. 3: Results obtained for different regularization methods, and SNR equal to (a,b) $0$ dB and (c,d) $20$ dB. In (a,c) we show the misalignment $\mathsf{m}(\alpha^{(I)})$ (72), and $\hat{\mathsf{m}}$ given by (74), while the corresponding values of $\alpha^{(I)}$ and $\hat{\alpha}$ are shown in (b,d). In (b) and (d), thick lines are averages of realizations shown with thin lines.

5.2 Interference suppression problem: beamforming

We consider the antenna-processing scenario, in which the signal $\boldsymbol{x}(t)$ in (4) is defined as

$$\boldsymbol{x}(t) = \sum_{k=1}^{K} d_{k}(t) \boldsymbol{a}(\phi_{k}) + \boldsymbol{e}(t), \tag{75}$$

where $\boldsymbol{e}(t)$ is a zero-mean, circular complex Gaussian noise with covariance matrix $\mathds{E}[\boldsymbol{e}(t) \boldsymbol{e}^{\mathrm{H}}(t)] = \mathbf{I}$, and $d_{k}(t)$ are zero-mean, i.i.d. Gaussian variables modeling signals, each with power $\sigma_{k}^{2} = \mathds{E}[|d_{k}(t)|^{2}]$; the steering vector for the angle $\phi$ is defined as

$$\boldsymbol{a}(\phi) = [1, \mathrm{e}^{-j\pi\cos(\phi)}, \mathrm{e}^{-j2\pi\cos(\phi)}, \dots, \mathrm{e}^{-j(M-1)\pi\cos(\phi)}]^{\mathsf{T}}, \tag{76}$$

that is, we assume that $\boldsymbol{x}(t)$ is acquired at a linear antenna array with $M$ elements spaced at half-wavelength [1, Ch. 6.5].

The true covariance matrix is thus calculated as

$$\overline{\mathbf{R}}_{\boldsymbol{x}} = \sum_{k=1}^{K} \sigma_{k}^{2} \boldsymbol{a}(\phi_{k}) \boldsymbol{a}(\phi_{k})^{\mathrm{H}} + \mathbf{I}. \tag{77}$$
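As an illustration of (76) and (77), the steering vector and the true covariance matrix can be built as below. This is a minimal sketch: the function names are hypothetical, and the angles, powers, and array size are the ones used in this example.

```python
import numpy as np

def steering_vector(phi, M):
    """Steering vector (76) of a half-wavelength linear array with M elements."""
    return np.exp(-1j * np.pi * np.cos(phi) * np.arange(M))

def true_covariance(phis, powers, M):
    """True covariance (77): rank-one signal terms plus identity noise covariance."""
    R = np.eye(M, dtype=complex)
    for phi, p in zip(phis, powers):
        a = steering_vector(phi, M)
        R += p * np.outer(a, a.conj())
    return R

M = 10
phis = np.array([0.2, 0.3, 0.6]) * np.pi
powers = 10.0 ** (np.array([20.0, 10.0, 5.0]) / 10.0)   # dB to linear scale
R_true = true_covariance(phis, powers, M)
print(R_true.shape, np.trace(R_true).real)
```

Since every entry of $\boldsymbol{a}(\phi)$ has unit modulus, $\boldsymbol{a}^{\mathrm{H}}\boldsymbol{a} = M$, and the trace of the matrix in (77) equals $M(1 + \sum_{k} \sigma_{k}^{2})$.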

In the beamforming problem, our goal is to suppress the interference signals $d_{l}(t), l \neq k$, using the filter $\hat{\boldsymbol{w}}_{k}$ found through (6), where we know the steering vector of the signal of interest $\boldsymbol{a} = \boldsymbol{a}(\phi_{k}), k \in \{1, \dots, K\}$. The quality of interference suppression is measured by the signal-to-interference-plus-noise ratio (SINR) at the output of the filter, calculated as

$$\mathsf{SINR}_{k} = \frac{\mathds{E}[|d_{k}(t)|^{2}]}{\mathds{E}[|\hat{\boldsymbol{w}}_{k}^{\mathrm{H}} \boldsymbol{x}(t)|^{2}] - \mathds{E}[|d_{k}(t)|^{2}]} = \frac{\sigma^{2}_{k}}{\hat{\boldsymbol{w}}_{k}^{\mathrm{H}} \overline{\mathbf{R}}_{\boldsymbol{x}} \hat{\boldsymbol{w}}_{k} - \sigma^{2}_{k}}. \tag{78}$$

In this example, we use $K = 3$, $[\sigma^{2}_{1}, \sigma^{2}_{2}, \sigma^{2}_{3}] = [20, 10, 5]~\mathrm{dB}$, and $[\phi_{1}, \phi_{2}, \phi_{3}] = [0.2\pi, 0.3\pi, 0.6\pi]$.
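To make (78) concrete, the sketch below evaluates the SINR for two reference filters under the parameters just listed: the matched filter $\boldsymbol{a}/M$ and the filter computed from the true covariance (77) without regularization. Both satisfy $\boldsymbol{w}^{\mathrm{H}}\boldsymbol{a}(\phi_{k}) = 1$, which (78) presupposes. The function and variable names are hypothetical, $M = 10$ is assumed, and using the true covariance is an idealization that is not available in practice.

```python
import numpy as np

def steering_vector(phi, M):
    """Steering vector (76)."""
    return np.exp(-1j * np.pi * np.cos(phi) * np.arange(M))

def sinr(w, R_true, sigma2_k):
    """Output SINR (78) on a linear scale; assumes w^H a(phi_k) = 1."""
    out_power = np.real(w.conj() @ R_true @ w)
    return sigma2_k / (out_power - sigma2_k)

M = 10
phis = np.array([0.2, 0.3, 0.6]) * np.pi
powers = 10.0 ** (np.array([20.0, 10.0, 5.0]) / 10.0)        # dB to linear scale
S = np.stack([steering_vector(p, M) for p in phis], axis=1)  # M x K steering matrix
R_true = (S * powers) @ S.conj().T + np.eye(M)               # (77)

results = []
for k in range(3):
    a = S[:, k]
    w_mf = a / M                              # matched filter
    v = np.linalg.solve(R_true, a)
    w_ideal = v / (a.conj() @ v)              # unregularized filter, true covariance
    results.append((sinr(w_mf, R_true, powers[k]),
                    sinr(w_ideal, R_true, powers[k])))
    print(k + 1, 10 * np.log10(results[-1][0]), 10 * np.log10(results[-1][1]))
```

The filter built from the true covariance minimizes the output power under the distortionless constraint, so its SINR cannot be lower than that of the matched filter.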

Fig. 4: Empirical frequency of violating condition (46) in the interference suppression example with 10000 independent realizations.

We show in Fig. 4 the empirical frequency of violating condition (46) obtained from 10000 data realizations. In these cases, $L'(\alpha)$ often has no finite roots, i.e., $\alpha_{\mathrm{ML}} = \infty$, $\hat{\boldsymbol{u}} = \boldsymbol{0}$ and $\hat{\boldsymbol{w}} = \boldsymbol{a}/M$. In other words, there are cases where the optimal solution $\hat{\boldsymbol{w}}$ is a matched filter.

To understand why this may happen, we recall that the matched filter is optimal in the presence of white Gaussian noise. This clarifies why the probability of obtaining such a solution is larger for a high-energy target signal (e.g., $k = 1$): this is when the interference is weak and may, indeed, “appear like” white noise, especially for small $N$. On the other hand, for weak signals (e.g., $k = 3$), the interference (e.g., from the signal $k = 1$) is strong and will emerge from the empirical covariance matrix $\mathbf{R}_{\boldsymbol{x}}$, even for relatively small $N$.

Fig. 5: Empirical frequency of the number of roots in the interference suppression example across different signals of interest for 10000 independent realizations of data, $M = 10$ and (a) $N = 10000$, (b) $N = 10$.

The empirical evaluation of the number of roots of $L'(\alpha)$, shown in Fig. 5 for large and small numbers of samples $N$, leads to the following observations: (i) for large $N$, the vast majority of cases produced a unique and finite root $\alpha_{\mathrm{ML}}$, which was obtained here through the Gull-MacKay iteration (69) (since there are two roots, the first one is the minimum; see Proposition 1b); (ii) for small $N$, we frequently observe either $\alpha_{\mathrm{ML}} = \infty$ (when there is one root) or multiple finite roots, especially for strong target signals $k \in \{1, 2\}$; (iii) in the presence of multiple minima, the matched filter solution $\alpha = \infty$ can be competitive with $\alpha_{\mathrm{ML}}$, i.e., $L(\alpha_{\mathrm{ML}}) \approx L(\infty)$.

To handle the multiple-roots situation without explicitly identifying all the roots (which may be numerically tedious), we propose a two-step approach. First, we find the root $\alpha_{\mathrm{ML}}$ using the Gull-MacKay iteration (69). Next, we verify whether $L(\alpha_{\mathrm{ML}}) > L(\infty)$, in which case we make the replacement $\alpha_{\mathrm{ML}} \leftarrow \infty$; otherwise, we keep $\alpha_{\mathrm{ML}}$ unchanged. This heuristic is easy to implement because, from (41), we have $L(\infty) = N \log \tilde{\sigma}_{d}^{2}$.
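The two-step rule can be sketched as follows. The objective $L(\alpha)$ of (41) is not reproduced in this section, so the sketch takes it as a user-supplied callable; the function names and the toy quadratic used in the usage lines are hypothetical stand-ins, not the paper's objective. The filter for $\alpha = \infty$ is the matched filter $\boldsymbol{a}/M$, which assumes $\boldsymbol{a}^{\mathrm{H}}\boldsymbol{a} = M$.

```python
import math
import numpy as np

def mvdr_filter(R, a, alpha):
    """Filter (68); alpha = inf returns the matched filter a / M."""
    if math.isinf(alpha):
        return a / a.size
    v = np.linalg.solve(R + alpha * np.eye(a.size), a)
    return v / (a.conj() @ v)

def two_step_alpha(alpha_ml, L, L_inf):
    """Keep alpha_ml unless the objective prefers alpha = inf."""
    return math.inf if L(alpha_ml) > L_inf else alpha_ml

# Usage with a toy objective (stand-in for L of (41), for illustration only).
toy_L = lambda alpha: (alpha - 2.0) ** 2
kept = two_step_alpha(2.0, toy_L, L_inf=1.0)       # L(2) = 0 <= 1: alpha is kept
replaced = two_step_alpha(5.0, toy_L, L_inf=1.0)   # L(5) = 9 > 1: replaced by inf

a = np.exp(-1j * np.pi * 0.4 * np.arange(4))       # unit-modulus entries, M = 4
R = np.eye(4) + 0.5 * np.outer(a, a.conj())
print(kept, replaced, mvdr_filter(R, a, replaced))
```

Handling $\alpha = \infty$ explicitly avoids forming $\mathbf{R}_{\boldsymbol{x}} + \alpha\mathbf{I}$ with an infinite diagonal; for very large finite $\alpha$ the filter (68) approaches the same matched filter.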

In Fig. 6, we show $\mathsf{SINR}_{k}, k = 1, 2, 3$, as a function of $N$ for different regularization methods.

Fig. 6: SINR (78) obtained for (a) $k = 1$, (b) $k = 2$, and (c) $k = 3$.

Similarly, the values of the regularization parameter are shown in Fig. 7. In this case, the thick line corresponds to the median of the regularization parameter, as it gracefully deals with the cases when $\alpha_{\mathrm{ML}} = \infty$.

We observe that (i) the proposed estimation method is very close to the oracle solutions and clearly outperforms the other methods, especially for $N > M$ and for the strong target signal $k = 1$; (ii) in many cases, for relatively small $N$ and high target signal power ($k = 1$), the optimal regularization is $\alpha_{\mathrm{ML}} = \infty$, which means that the optimal solution is a matched filter, see (68); (iii) as in Sec. 5.1, the HKB regularization approaches the optimal solution only for sufficiently large $N$; and (iv) the Ledoit-Wolf regularization parameter is independent of the steering vector $\boldsymbol{a}$ (see Fig. 7), which affects its performance; this illustrates well the idea that, in the MVDR problem, the regularization should take into account the steering vector and not only the covariance matrix $\mathbf{R}_{\boldsymbol{x}}$.

Fig. 7: The regularization parameter obtained for a) $k=1$, b) $k=2$, and c) $k=3$.

6 Conclusions

In this work, we presented a method, adopted from the area of statistical machine learning, to find the regularization parameter in two main classes of linear MMSE filters applied in (i) the error-minimization and (ii) the interference suppression problems. Using a probabilistic formulation, we estimate the parameters of the model from the ML principle, where the regularization parameter is found using a few steps of the fixed-point iteration. We also provide data-dependent conditions for the existence of the finite ML solution and show heuristics which deal well with multiple ML solutions.

Numerical examples indicate that the simple iterative solution we show is remarkably close to the optimal regularization parameter.

We compare the proposed solution with other methods known in the literature. We show that the HKB method [13] may be seen as a simplified version of our approach and that the Ledoit-Wolf shrinkage [12] fails to appropriately choose the regularization, which is due to its explicit independence from the desired signal.

Acknowledgments

This work was supported in part by the Fonds de recherche du Québec (FRQ) - Nature et technologies under the Doctoral research scholarships B2X 2024-2025 program, file number 342496, recipient Daniel Gomes de Pinho Zanco.

Appendix A Proof of Proposition 1

Considering (42), we note that $f(\alpha)$ and $g(\alpha)$ shown in (43) and (44) are bounded and positive, therefore, their ratio is also bounded and positive.

Since $\lim_{\alpha\rightarrow 0}\gamma(\alpha)/\alpha=\infty$, for a sufficiently small $\alpha$, we have $L'(\alpha)<0$ (i.e., $\exists\alpha^{*},\forall\alpha<\alpha^{*},L'(\alpha)<0$). We also have that

\begin{align}
\lim_{\alpha\rightarrow\infty}L'(\alpha)
&=\lim_{\alpha\rightarrow\infty}\frac{Nf(\alpha)\alpha-g(\alpha)\gamma(\alpha)}{\alpha g(\alpha)} \tag{79}\\
&=\lim_{\alpha\rightarrow\infty}\frac{N\alpha\displaystyle\sum_{m=1}^{M}\frac{z_{\boldsymbol{x}d,m}^{2}}{(\lambda_{m}+\alpha)^{2}}-\left[\tilde{\sigma}_{d}^{2}-\displaystyle\sum_{m=1}^{M}\frac{z_{\boldsymbol{x}d,m}^{2}}{\lambda_{m}+\alpha}\right]\left[\displaystyle\sum_{m=1}^{M}\frac{\lambda_{m}}{\lambda_{m}+\alpha}\right]}{\alpha\left[\tilde{\sigma}_{d}^{2}-\displaystyle\sum_{m=1}^{M}\frac{z_{\boldsymbol{x}d,m}^{2}}{\lambda_{m}+\alpha}\right]}, \notag\\
&=\frac{\displaystyle\lim_{\alpha\rightarrow\infty}\left(N\displaystyle\sum_{m=1}^{M}\frac{z_{\boldsymbol{x}d,m}^{2}}{(\lambda_{m}+\alpha)^{2}}-\frac{1}{\alpha}\left[\tilde{\sigma}_{d}^{2}-\displaystyle\sum_{m=1}^{M}\frac{z_{\boldsymbol{x}d,m}^{2}}{\lambda_{m}+\alpha}\right]\left[\displaystyle\sum_{m=1}^{M}\frac{\lambda_{m}}{\lambda_{m}+\alpha}\right]\right)}{\tilde{\sigma}_{d}^{2}-\displaystyle\lim_{\alpha\rightarrow\infty}\displaystyle\sum_{m=1}^{M}\frac{z_{\boldsymbol{x}d,m}^{2}}{\lambda_{m}+\alpha}}, \notag\\
&=0, \notag
\end{align}

thus $\infty$ is a root of $L'(\alpha)$.

Then, from the intermediate value theorem, $L'(\alpha)$ has at least one finite root if $L'(\alpha)>0$ for a sufficiently large $\alpha$, i.e., when

\begin{align}
N\frac{\alpha f(\alpha)}{\gamma(\alpha)g(\alpha)}>1. \tag{80}
\end{align}

By taking the limit as $\alpha$ tends to $\infty$ on both sides, we can evaluate if $L'(\alpha)$ is decreasing, such that if

\begin{align}
N\lim_{\alpha\rightarrow\infty}\frac{\alpha f(\alpha)}{g(\alpha)\gamma(\alpha)}&>1 \tag{81}\\
N\lim_{\alpha\rightarrow\infty}\frac{\alpha f(\alpha)}{\gamma(\alpha)}&>\lim_{\alpha\rightarrow\infty}g(\alpha) \tag{82}\\
N\lim_{\alpha\rightarrow\infty}\frac{\alpha\displaystyle\sum_{m=1}^{M}\frac{z_{\boldsymbol{x}d,m}^{2}}{(\lambda_{m}+\alpha)^{2}}}{\displaystyle\sum_{m=1}^{M}\frac{\lambda_{m}}{\lambda_{m}+\alpha}}&>\tilde{\sigma}_{d}^{2} \tag{83}\\
N\lim_{\alpha\rightarrow\infty}\frac{\displaystyle\sum_{m=1}^{M}\frac{z_{\boldsymbol{x}d,m}^{2}}{(\frac{\lambda_{m}}{\alpha}+1)^{2}}}{\displaystyle\sum_{m=1}^{M}\frac{\lambda_{m}}{\frac{\lambda_{m}}{\alpha}+1}}&>\tilde{\sigma}_{d}^{2} \tag{84}\\
N\frac{\displaystyle\sum_{m=1}^{M}z_{\boldsymbol{x}d,m}^{2}}{\displaystyle\sum_{m=1}^{M}\lambda_{m}}&>\tilde{\sigma}_{d}^{2} \tag{85}\\
N\|\boldsymbol{r}_{\boldsymbol{x}d}\|^{2}&>\tilde{\sigma}_{d}^{2}\sum_{m=1}^{M}\lambda_{m}, \tag{86}
\end{align}

where (86) is the same as (46).

When (86) is true, $L'(\alpha)$ changes sign at least once, and thus $L'(\alpha)$ has at least 2 roots (one at $\infty$ and the other at the sign change). If there are 3 roots, then (86) cannot be true, since $L'(0)<0$ and $L'(\infty)=0$, and three roots would require two sign changes. These observations can be extended to an arbitrary number of roots. In fact, (86) can only be true if the number of roots is even, and, since $\infty$ is always a root, the condition also tells us if there is at least one finite root.

This finishes the proof.
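The argument can be checked numerically with a minimal Python sketch. It assumes that $f$, $g$, and $\gamma$ take the forms written out in the second line of (79), and that $\|\boldsymbol{r}_{\boldsymbol{x}d}\|^{2}=\sum_{m}z_{\boldsymbol{x}d,m}^{2}$, as used in going from (85) to (86). The function names, the bracket $[10^{-6},10^{6}]$, and the numerical values of $\lambda_m$, $z_{\boldsymbol{x}d,m}$, $\tilde{\sigma}_d^2$, and $N$ are illustrative choices of ours and do not come from the experiments.

```python
import math

def f(alpha, lam, z):
    """f(alpha) = sum_m z_m^2 / (lam_m + alpha)^2."""
    return sum(zm ** 2 / (lm + alpha) ** 2 for lm, zm in zip(lam, z))

def g(alpha, lam, z, sigma_d2):
    """g(alpha) = sigma_d2 - sum_m z_m^2 / (lam_m + alpha)."""
    return sigma_d2 - sum(zm ** 2 / (lm + alpha) for lm, zm in zip(lam, z))

def gamma(alpha, lam):
    """gamma(alpha) = sum_m lam_m / (lam_m + alpha)."""
    return sum(lm / (lm + alpha) for lm in lam)

def L_prime(alpha, lam, z, sigma_d2, N):
    """Derivative of the objective, first line of (79)."""
    fa = f(alpha, lam, z)
    ga = g(alpha, lam, z, sigma_d2)
    return (N * fa * alpha - ga * gamma(alpha, lam)) / (alpha * ga)

def finite_root_exists(lam, z, sigma_d2, N):
    """Condition (86), with ||r_xd||^2 computed as sum_m z_m^2."""
    return N * sum(zm ** 2 for zm in z) > sigma_d2 * sum(lam)

def bracketed_root(lam, z, sigma_d2, N, lo=1e-6, hi=1e6, iters=200):
    """Geometric bisection for a point where L' goes from negative to positive."""
    if not (L_prime(lo, lam, z, sigma_d2, N) < 0.0 < L_prime(hi, lam, z, sigma_d2, N)):
        raise ValueError("no sign change of L' on the bracket")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if L_prime(mid, lam, z, sigma_d2, N) < 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Made-up illustrative values (not taken from the paper's simulations)
lam = [4.0, 2.0, 1.0, 0.5]
z = [1.0, 0.5, 0.2, 0.1]
sigma_d2 = 1.0
N = 20

if finite_root_exists(lam, z, sigma_d2, N):
    root = bracketed_root(lam, z, sigma_d2, N)
    print("finite stationary point near alpha =", root)
else:
    print("condition (86) fails; a finite root is not guaranteed")
```

Bisection is used here only because it makes the sign-change argument explicit; it is a swapped-in root finder, not the fixed-point iteration of the main text.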

Appendix B Derivation of $\gamma$ in MVDR filter

To calculate $\gamma=\textnormal{Tr}[\mathbf{I}-\alpha(\mathbf{A}\mathbf{R}_{\boldsymbol{x}}\mathbf{A}+\alpha\mathbf{I})^{-1}]$, we find

\begin{align}
\mathbf{I}-\alpha(\mathbf{A}\mathbf{R}_{\boldsymbol{x}}\mathbf{A}+\alpha\mathbf{I})^{-1}
&=\mathbf{A}(\alpha\mathbf{R}_{\boldsymbol{x}}^{-1}+\mathbf{I}-\boldsymbol{a}\boldsymbol{a}^{\mathrm{H}}/M)^{-1}\mathbf{A}, \tag{87}\\
&=\mathbf{A}[(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})\mathbf{R}_{\boldsymbol{x}}^{-1}-\boldsymbol{a}\boldsymbol{a}^{\mathrm{H}}/M]^{-1}\mathbf{A}, \tag{88}\\
&=\mathbf{A}\left[\mathbf{R}_{\boldsymbol{x}}(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}+\frac{\mathbf{R}_{\boldsymbol{x}}(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}\boldsymbol{a}\boldsymbol{a}^{\mathrm{H}}\mathbf{R}_{\boldsymbol{x}}(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}}{M-\boldsymbol{a}^{\mathrm{H}}\mathbf{R}_{\boldsymbol{x}}(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}\boldsymbol{a}}\right]\mathbf{A}, \tag{89}\\
&=\mathbf{A}\left[\mathbf{S}+\frac{\mathbf{S}\boldsymbol{a}\boldsymbol{a}^{\mathrm{H}}\mathbf{S}}{M-\boldsymbol{a}^{\mathrm{H}}\mathbf{S}\boldsymbol{a}}\right]\mathbf{A}, \tag{90}
\end{align}

where $\mathbf{S}=\mathbf{R}_{\boldsymbol{x}}(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}=\mathbf{I}-\alpha(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}$.

Next, using the fact that $\mathbf{A}\mathbf{A}=\mathbf{A}$,

\begin{align}
\gamma
&=\textnormal{Tr}(\mathbf{S}\mathbf{A})+\frac{\textnormal{Tr}(\mathbf{S}\boldsymbol{a}\boldsymbol{a}^{\mathrm{H}}\mathbf{S}\mathbf{A})}{M-\boldsymbol{a}^{\mathrm{H}}\mathbf{S}\boldsymbol{a}}, \tag{91}\\
&=\textnormal{Tr}(\mathbf{S})-\frac{1}{M}\boldsymbol{a}^{\mathrm{H}}\mathbf{S}\boldsymbol{a}+\frac{\boldsymbol{a}^{\mathrm{H}}\mathbf{S}\mathbf{S}\boldsymbol{a}-\frac{1}{M}|\boldsymbol{a}^{\mathrm{H}}\mathbf{S}\boldsymbol{a}|^{2}}{M-\boldsymbol{a}^{\mathrm{H}}\mathbf{S}\boldsymbol{a}}, \tag{92}\\
&=\textnormal{Tr}(\mathbf{S})-\frac{\boldsymbol{a}^{\mathrm{H}}\mathbf{S}(\mathbf{I}-\mathbf{S})\boldsymbol{a}}{\boldsymbol{a}^{\mathrm{H}}(\mathbf{I}-\mathbf{S})\boldsymbol{a}}, \tag{93}\\
&=\textnormal{Tr}(\mathbf{S})-\boldsymbol{a}^{\mathrm{H}}\mathbf{S}\hat{\boldsymbol{w}}, \tag{94}\\
&=M-\alpha\textnormal{Tr}[(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}]-\boldsymbol{a}^{\mathrm{H}}\mathbf{R}_{\boldsymbol{x}}(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}\hat{\boldsymbol{w}}. \tag{95}
\end{align}
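The closed form (95) can be compared against the defining trace with a short NumPy sketch. Several ingredients are assumptions read off from the derivation rather than quoted definitions: $\mathbf{A}=\mathbf{I}-\boldsymbol{a}\boldsymbol{a}^{\mathrm{H}}/M$ with $\boldsymbol{a}^{\mathrm{H}}\boldsymbol{a}=M$ (implied by (87) and by the step from (91) to (92)), and $\hat{\boldsymbol{w}}=(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}\boldsymbol{a}/[\boldsymbol{a}^{\mathrm{H}}(\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I})^{-1}\boldsymbol{a}]$ (implied by the step from (93) to (94)). The random covariance matrix, the steering vector, and the function names are purely illustrative.

```python
import numpy as np

def gamma_direct(R, a, alpha):
    """gamma = Tr[I - alpha (A R A + alpha I)^{-1}], with A = I - a a^H / M (assumed)."""
    M = R.shape[0]
    I = np.eye(M)
    A = I - np.outer(a, a.conj()) / M
    return np.trace(I - alpha * np.linalg.inv(A @ R @ A + alpha * I)).real

def gamma_closed_form(R, a, alpha):
    """Right-hand side of (95), with w_hat inferred from (93)-(94)."""
    M = R.shape[0]
    Q = np.linalg.inv(R + alpha * np.eye(M))      # (R + alpha I)^{-1}
    w_hat = Q @ a / (a.conj() @ Q @ a)            # assumed regularized MVDR filter
    return (M - alpha * np.trace(Q) - a.conj() @ R @ Q @ w_hat).real

# Illustrative data: sample covariance of random snapshots, unit-modulus steering vector
rng = np.random.default_rng(0)
M, N = 6, 40
X = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
R = X @ X.conj().T / N
a = np.exp(1j * np.pi * np.arange(M) * np.sin(0.3))   # |a_m| = 1, so a^H a = M

for alpha in (0.01, 1.0, 100.0):
    print(alpha, gamma_direct(R, a, alpha), gamma_closed_form(R, a, alpha))
```

The closed form only requires the inverse of $\mathbf{R}_{\boldsymbol{x}}+\alpha\mathbf{I}$, which is also needed for the filter itself, so no additional matrix inversion involving $\mathbf{A}$ is needed.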

References

  • [1] A. H. Sayed, Adaptive Filters.   Hoboken, New Jersey: John Wiley & Sons, 2008.
  • [2] L.-M. Dogariu, J. Benesty, C. Paleologu, and S. Ciochină, “An insightful overview of the Wiener filter for system identification,” Applied Sciences, vol. 11, no. 17, 2021. [Online]. Available: https://www.mdpi.com/2076-3417/11/17/7774
  • [3] G. Pillonetto, T. Chen, A. Chiuso, G. De Nicolao, and L. Ljung, Regularized System Identification.   Springer, 2022.
  • [4] D. M. Allen, “Mean square error of prediction as a criterion for selecting variables,” Technometrics, vol. 13, no. 3, pp. 469–475, 1971. [Online]. Available: https://www.tandfonline.com/doi/abs/10.1080/00401706.1971.10488811
  • [5] G. H. Golub, M. Heath, and G. Wahba, “Generalized cross-validation as a method for choosing a good ridge parameter,” Technometrics, vol. 21, no. 2, pp. 215–223, 1979. [Online]. Available: http://www.jstor.org/stable/1268518
  • [6] D. Barber, Bayesian Reasoning and Machine Learning.   New York: Cambridge University Press, 2012.
  • [7] S. Haykin, Adaptive Filter Theory, 4th ed.   Prentice Hall, 2002.
  • [8] D. Tse and P. Viswanath, Fundamentals of Wireless Communication.   Cambridge University Press, May 2005.
  • [9] J. Li, P. Stoica, and Z. Wang, “On robust Capon beamforming and diagonal loading,” IEEE Transactions on Signal Processing, vol. 51, no. 7, pp. 1702–1715, 2003.
  • [10] J. Li and P. Stoica, “An adaptive filtering approach to spectral estimation and SAR imaging,” IEEE Transactions on Signal Processing, vol. 44, no. 6, pp. 1469–1484, 1996.
  • [11] L. Du, J. Li, and P. Stoica, “Fully automatic computation of diagonal loading levels for robust adaptive beamforming,” IEEE Transactions on Aerospace and Electronic Systems, vol. 46, no. 1, pp. 449–458, 2010.
  • [12] O. Ledoit and M. Wolf, “A well-conditioned estimator for large-dimensional covariance matrices,” Journal of Multivariate Analysis, vol. 88, no. 2, pp. 365–411, 2004. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0047259X03000964
  • [13] A. E. Hoerl, R. W. Kennard, and K. F. Baldwin, “Ridge regression: some simulations,” Communications in Statistics, vol. 4, no. 2, pp. 105–123, 1975. [Online]. Available: https://doi.org/10.1080/03610927508827232
  • [14] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning.   Springer Series in Statistics, 2009.
  • [15] Y. Selén, R. Abrahamsson, and P. Stoica, “Automatic robust adaptive beamforming via ridge regression,” Signal Processing, vol. 88, no. 1, pp. 33–49, 2008. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0165168407002460
  • [16] D. Gomes de Pinho Zanco, L. Szczecinski, and J. Benesty. (2023) Automatic regularization for linear MMSE filters. [Online]. Available: https://arxiv.longhoe.net/pdf/2312.06560
  • [17] N. Werner, “audiolabs/rir-generator: Version 0.2.0,” 2023.