Computing Proximity Operators of Scale and Signed Permutation Invariant Functions

Jianqing Jia [1], Ashley Prater-Bennette [2], Lixin Shen [1]

[1] Department of Mathematics, Syracuse University, Syracuse, NY 13244, USA

[2] Information Directorate, Air Force Research Laboratory, Rome, NY 10587, USA

These authors contributed equally to this work.

Abstract

This paper investigates the computation of proximity operators for scale and signed permutation invariant functions. A scale-invariant function remains unchanged under uniform scaling of its input, while a signed permutation invariant function remains unchanged under permutations and sign changes of its input variables. Notable examples include the $\ell_{0}$ function, the ratio $\ell_{1}/\ell_{2}$, and its square; their proximity operators are particularly important in sparse signal recovery. We study the properties of scale and signed permutation invariant functions and split the computation of their proximity operators into three sequential steps: the $\bm{w}$-step, the $r$-step, and the $d$-step. Together these steps form a procedure we call WRD, in which the $\bm{w}$-step is the most important and requires careful treatment. Using this procedure, we compute the proximity operator of $(\ell_{1}/\ell_{2})^{2}$ explicitly and introduce an efficient algorithm for the proximity operator of $\ell_{1}/\ell_{2}$.

Keywords: sparsity-promoting functions, proximity operator, $\ell_{1}/\ell_{2}$, $(\ell_{1}/\ell_{2})^{2}$

MSC Classification: 90C26, 90C32, 90C55, 90C90, 65K05

1 Introduction

This paper addresses the computation of the proximity operator for scale and signed permutation invariant functions. A scale invariant function remains unchanged when its input is multiplied by a positive constant. A signed permutation invariant function remains unchanged under any combination of permutations and sign changes applied to its input variables: reordering the entries of the input, or replacing any entry by its negative, does not affect the function value.

Several well-known examples of signed permutation invariant functions, some of which are also scale invariant, are listed below:

•

All $\ell_{p}$ norms, where $0<p\leq\infty$ (for $0<p<1$, $\ell_{p}$ is only a quasi-norm), and the log-sum penalty function on $\mathbb{R}^{n}$ are signed permutation invariant but not scale invariant, see [1, 2];
•

The $\ell_{0}$ norm and the effective sparsity measure $\left(\frac{\|\cdot\|_{q}}{\|\cdot\|_{1}}\right)^{\frac{q}{1-q}}$ , $q\in(0,\infty)\setminus\{1\}$ are both scale and signed permutation invariant, see [3, 4, 5, 6, 7].
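As a quick numerical sanity check of the list above, the following sketch (Python with NumPy; the helper names `effective_sparsity`, `l0`, and `random_signed_permutation_of` are our own illustrative choices, not from the paper) evaluates several functions before and after a positive scaling and a random signed permutation. Note that for $q=2$ the effective sparsity measure equals $(\|\cdot\|_{1}/\|\cdot\|_{2})^{2}$. The $\ell_{1}$ norm is included as an example that is signed permutation invariant but not scale invariant.

```python
import numpy as np

def effective_sparsity(x, q):
    """(||x||_q / ||x||_1)^(q/(1-q)) for nonzero x and q in (0, inf), q != 1."""
    x = np.abs(np.asarray(x, dtype=float))
    norm_q = np.sum(x ** q) ** (1.0 / q)        # l_q (quasi-)norm
    return (norm_q / np.sum(x)) ** (q / (1.0 - q))

def l0(x):
    """Number of nonzero entries of x."""
    return int(np.count_nonzero(x))

def random_signed_permutation_of(x, rng):
    """Permute the entries of x and flip the signs of a random subset of them."""
    perm = rng.permutation(x.size)
    signs = rng.choice([-1.0, 1.0], size=x.size)
    return signs * x[perm]

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
x[2] = 0.0                      # include a zero entry so that l0 is nontrivial
y = random_signed_permutation_of(x, rng)
alpha = 3.7

checks = {
    "s_2": lambda v: effective_sparsity(v, 2.0),
    "s_0.5": lambda v: effective_sparsity(v, 0.5),
    "l0": l0,
    "l1": lambda v: np.sum(np.abs(v)),
}
for name, f in checks.items():
    scale_inv = bool(np.isclose(f(alpha * x), f(x)))
    perm_inv = bool(np.isclose(f(y), f(x)))
    print(f"{name}: scale invariant {scale_inv}, signed permutation invariant {perm_inv}")
```

A finite random test like this cannot prove invariance; it only catches a wrong formula quickly.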

The proximity operator is a basic tool in optimization: it provides a computationally efficient way to handle optimization problems involving nonsmooth functions [8, 9, 10, 11, 12, 13, 14, 15]. Given a proper lower semicontinuous function $f$ and a point $\bm{x}$, the proximity operator of $f$ at $\bm{x}$, denoted by $\mathrm{prox}_{f}(\bm{x})$, is defined as

\mathrm{prox}_{f}(\bm{x})=\arg\min\left\{\frac{1}{2}\|\bm{u}-\bm{x}\|_{2}^{2}+f(\bm{u}):\bm{u}\in\mathbb{R}^{n}\right\}.

In words, the proximity operator returns the points $\bm{u}$ that minimize the sum of $f(\bm{u})$ and half of the squared Euclidean distance between $\bm{u}$ and the given point $\bm{x}$; in general, this set may contain more than one point.
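For a scalar function the definition can be checked by brute force. The sketch below (Python with NumPy; `prox_on_grid` is a hypothetical helper, and a grid search is only an illustration, not a method used in this paper) minimizes $\frac{1}{2}(u-x)^{2}+f(u)$ over a fine grid of candidates $u$ and returns one minimizer. For $f=|\cdot|$ the known answer is soft thresholding, and for $f=\|\cdot\|_{0}$ on $\mathbb{R}$ it is hard thresholding at $\sqrt{2}$.

```python
import numpy as np

def prox_on_grid(f, x, grid):
    """Return one grid point minimizing 0.5*(u - x)**2 + f(u).
    Ties are broken by taking the first minimizer; accuracy is limited by the grid spacing."""
    objective = 0.5 * (grid - x) ** 2 + f(grid)
    return grid[np.argmin(objective)]

# include 0 explicitly so that the l0 minimizer u = 0 is always a candidate
grid = np.union1d(np.linspace(-5.0, 5.0, 10001), [0.0])
abs_f = np.abs                                  # f(u) = |u|
l0_f = lambda u: (u != 0).astype(float)         # f(u) = ||u||_0 on R

print(prox_on_grid(abs_f, 3.0, grid))   # soft thresholding: close to 3 - 1 = 2
print(prox_on_grid(l0_f, 1.0, grid))    # |x| < sqrt(2): the origin
print(prox_on_grid(l0_f, 3.0, grid))    # |x| > sqrt(2): approximately x itself
```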

This paper studies the proximity operator of scale and signed permutation invariant functions. Our approach rests on the following observation: every point of $\mathbb{R}^{n}$ can be written as the product of a real number and a point of the $(n-1)$-dimensional unit sphere $\mathbb{S}^{n-1}$, that is,

\mathbb{R}^{n}=\{r\bm{w}:(r,\bm{w})\in\mathbb{R}\times\mathbb{S}^{n-1}\}.

That is, a nonzero $\bm{u}\in\mathbb{R}^{n}$ can be converted to the pair $(r,\bm{w})\in\mathbb{R}\times\mathbb{S}^{n-1}$ with $r=\|\bm{u}\|_{2}$ and $\bm{w}=\bm{u}/\|\bm{u}\|_{2}$, so that $\bm{u}=r\bm{w}$; the origin corresponds to $r=0$. With this conversion, the task of finding a point $\bm{u}\in\mathrm{prox}_{f}(\bm{x})$ becomes that of finding a pair $(r,\bm{w})\in\mathbb{R}\times\mathbb{S}^{n-1}$ with $\bm{u}=r\bm{w}$. Exploiting the properties of scale and signed permutation invariant functions $f$, we find this pair in three consecutive steps. The first step solves an optimization problem in the variable $\bm{w}$ alone, the second step simply sets $r=\langle\bm{x},\bm{w}\rangle$, and the final step decides whether the origin or the scaled vector $\bm{u}=r\bm{w}$ is the resulting point. The first step is the crucial one.

Among scale and signed permutation invariant functions, we present a complete study of the functions

h_{p}(\bm{x})=\left(\frac{\|\bm{x}\|_{1}}{\|\bm{x}\|_{2}}\right)^{p}\quad\mbox{for}\quad p=1,2.

To our knowledge, the proximity operator of $h_{2}$ has not been addressed in the existing literature, while that of $h_{1}$ was studied in a recent work [16]. We aim to fill this gap by providing a comprehensive analysis of the proximity operators of both $h_{1}$ and $h_{2}$ within the framework of scale and signed permutation invariant functions.

With our approach, after some simplifications, the optimization problem for $\bm{w}$ associated with either $h_{1}$ or $h_{2}$ becomes a nonconvex constrained quadratic programming problem. Since both the objective functions and the constraint sets are nonconvex, we treat the two problems with different strategies.

For $h_{2}$, the objective function of the quadratic programming problem consists only of a quadratic term defined by a structured symmetric rank-2 matrix. We show explicitly that this matrix has one positive eigenvalue and one negative eigenvalue, and the constraint set of the problem is $\mathbb{S}^{n-1}\cap\mathbb{R}^{n}_{+}$, where $\mathbb{R}^{n}_{+}$ is the nonnegative orthant of $\mathbb{R}^{n}$. Although both the objective function and the constraint set are nonconvex, we develop a procedure that finds the optimal solution $\bm{w}$ from the eigenvector of the matrix associated with the negative eigenvalue, in a finite number of iterations.

For $h_{1}$, the objective function of the quadratic programming problem comprises a quadratic term defined by a rank-one symmetric matrix and a linear term. The rank-one matrix is negative semidefinite, and the constraint set is again $\mathbb{S}^{n-1}\cap\mathbb{R}^{n}_{+}$. As with $h_{2}$, both the objective function and the constraint set are nonconvex. However, the procedure used for $h_{2}$ cannot be directly adapted to $h_{1}$. To address this, we relax the nonconvex feasible set $\mathbb{S}^{n-1}\cap\mathbb{R}^{n}_{+}$ to the convex set $\{\bm{w}\in\mathbb{R}^{n}_{+}:\|\bm{w}\|_{2}\leq 1\}$. The relaxed problem keeps the objective function of the original problem but has a convex feasible set. We establish conditions ensuring that the optimal solution to the relaxed problem lies on $\mathbb{S}^{n-1}\cap\mathbb{R}^{n}_{+}$ or is the origin. We then propose a projected gradient method to solve the relaxed problem. Since the optimal solution is related to the proximity operator of $h_{1}$ at the given point, we use this information as prior knowledge to initialize the projected gradient method. In our numerical experiments, the algorithm consistently finds the optimal $\bm{w}$ for the original, unrelaxed optimization problem.

We note that a different approach to the proximity operator of $h_{1}$ was recently reported in [16]. That paper claims an analytical solution for the proximity operator of $h_{1}$ that relies on prior knowledge of the sparsity of the output of this proximity operator; this sparsity, however, is unknown in general. A bisection method is then applied to find the desired sparsity.

The current literature, including [3, 4, 5, 6], suggests that both $h_{1}$ and $h_{2}$ can effectively promote sparsity in the underlying signals. However, to the best of our knowledge, this claim lacks a theoretical justification. In this paper, we prove that both $h_{1}$ and $h_{2}$ qualify as sparsity-promoting functions in the sense defined in [17].

The rest of the paper is organized as follows. In Section 2, we first present some properties of the proximity operators of scale and signed permutation invariant functions. These properties allow us to restrict the discussion of these operators to points that lie in the nonnegative orthant of $\mathbb{R}^{n}$ and whose entries are in descending order. Using a different representation of the points in this set, we show that computing the proximity operator of a scale and signed permutation invariant function at such a point essentially reduces to solving a quadratic programming problem constrained on a nonconvex set. We then introduce the WRD procedure, which consists of three steps, the $\bm{w}$-step, the $r$-step, and the $d$-step, and which provides a systematic way to compute proximity operators of scale and signed permutation invariant functions.

In Section 3, we use the WRD procedure to compute the proximity operator of $h_{2}$ and give an explicit expression for it at any point, which can be evaluated efficiently.

In Section 4, we use the WRD procedure to develop an efficient algorithm for evaluating the proximity operator of $h_{1}$ at any point.

Section 5 concludes the paper: it summarizes our findings and contributions, discusses the implications of our results, and proposes directions for future research.

2 Scale and Signed Permutation Invariant Functions and their Proximity Operators

All functions in this work are defined on $\mathbb{R}^{n}$, the Euclidean space of dimension $n$. Bold lowercase letters, such as $\bm{x}$, denote vectors, and the $j$th component of $\bm{x}$ is denoted by the corresponding lowercase letter $x_{j}$. Matrices are denoted by uppercase sans-serif letters such as $\mathsf{A}$ and $\mathsf{B}$. We use $\mathbb{R}^{n}_{+}$ to denote the set of points in $\mathbb{R}^{n}$ whose entries are all nonnegative. The cone of vectors $\bm{x}\in\mathbb{R}^{n}_{+}$ satisfying $x_{1}\geq x_{2}\geq\ldots\geq x_{n}\geq 0$ is denoted by $\mathbb{R}^{n}_{\downarrow}$. We use $\mathbb{S}^{n-1}$ (resp. $\mathbb{B}^{n}$) to denote the unit sphere (resp. the closed unit ball) in $\mathbb{R}^{n}$, and $\mathbb{S}^{n-1}_{+}$ (resp. $\mathbb{B}^{n}_{+}$, $\mathbb{B}^{n}_{\downarrow}$) to denote the partial unit sphere $\mathbb{S}^{n-1}\cap\mathbb{R}^{n}_{+}$ (resp. the partial unit balls $\mathbb{B}^{n}\cap\mathbb{R}^{n}_{+}$, $\mathbb{B}^{n}\cap\mathbb{R}^{n}_{\downarrow}$). Let $\mathcal{P}_{n}$ denote the set of all $n\times n$ signed permutation matrices, that is, matrices with exactly one nonzero entry in each row and each column, this entry being $\pm 1$.

The $\ell_{p}$ norm of $\bm{x}=[x_{1},\ldots,x_{n}]^{\top}\in\mathbb{R}^{n}$ is defined as $\|\bm{x}\|_{p}=(\sum_{k=1}^{n}|x_{k}|^{p})^{1/p}$ for $1\leq p<\infty$ and $\|\bm{x}\|_{\infty}=\max\{|x_{k}|:k=1,2,\ldots,n\}$ . When $p=0$ , $\|\bm{x}\|_{0}$ represents the number of non-zero components in $\bm{x}$ . The standard inner product in $\mathbb{R}^{n}$ is denoted by $\langle\bm{u},\bm{v}\rangle$ , where $\bm{u}$ and $\bm{v}$ are vectors in $\mathbb{R}^{n}$ .

We denote by $[n]:=\{1,2,\ldots,n\}$ the index set up to a positive integer $n$. For a subset $S$ of $[n]$, $|S|$ denotes the cardinality of $S$. For a vector $\bm{x}\in\mathbb{R}^{n}$ and a subset $S$ of $[n]$, $\bm{x}_{S}$ denotes either the vector that keeps the entries of $\bm{x}$ with indices in $S$ and sets the remaining entries to zero, or the subvector of $\bm{x}$ consisting of the entries with indices in $S$; the intended meaning will be clear from the context.

A function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is considered scale invariant if for all $\bm{x}\in\mathbb{R}^{n}$ and $\alpha>0$ , the following holds:

f(\alpha\bm{x})=f(\bm{x}).

In other words, scaling the input by any positive constant does not alter the value of the function.

A function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is considered signed permutation invariant if it remains unchanged under permutations and sign changes of its input variables. Formally, $f$ is signed permutation invariant if, for all signed permutation matrices $\mathsf{P}\in\mathcal{P}_{n}$ and all vectors $\bm{x}\in\mathbb{R}^{n}$, the following holds:

f(\mathsf{P}\bm{x})=f(\bm{x}).
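Signed permutation matrices are easy to generate and test. The sketch below (Python with NumPy; the helper names are illustrative assumptions) builds a random element of $\mathcal{P}_{n}$, checks the defining property, and confirms that such a matrix is orthogonal, so that $\mathsf{P}^{-1}=\mathsf{P}^{\top}$.

```python
import numpy as np

def random_signed_permutation(n, rng):
    """Random n x n signed permutation matrix: row i has its single nonzero
    entry (+1 or -1) in column sigma[i], where sigma is a random permutation."""
    P = np.zeros((n, n))
    sigma = rng.permutation(n)
    P[np.arange(n), sigma] = rng.choice([-1.0, 1.0], size=n)
    return P

def is_signed_permutation(P):
    """True if all entries are in {-1, 0, 1} and each row and column has exactly one nonzero."""
    P = np.asarray(P, dtype=float)
    nonzero = (P != 0)
    return bool(np.all(np.isin(P, [-1.0, 0.0, 1.0]))
                and np.all(nonzero.sum(axis=0) == 1)
                and np.all(nonzero.sum(axis=1) == 1))

rng = np.random.default_rng(0)
P = random_signed_permutation(5, rng)
print(is_signed_permutation(P), np.allclose(P.T @ P, np.eye(5)))
```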

A function $f$ defined on $\mathbb{R}^{n}$ with values in $\mathbb{R}\cup\{+\infty\}$ is proper if its domain $\mathrm{dom}(f)=\{x\in\mathbb{R}^{n}:f(x)<+\infty\}$ is nonempty, and $f$ is lower semicontinuous if its epigraph is a closed set. The set of proper and lower semicontinuous functions on $\mathbb{R}^{n}$ to $\mathbb{R}\cup\{+\infty\}$ is denoted by $\Gamma(\mathbb{R}^{n})$ .

The proximity operator was introduced by Moreau in [18]. For a function $f\in\Gamma(\mathbb{R}^{n})$ , the proximity operator of $f$ at $\bm{z}\in\mathbb{R}^{n}$ with index $\alpha$ is defined by

\mathrm{prox}_{\alpha f}(\bm{z}):=\mathrm{arg}\min\left\{\frac{1}{2\alpha}\|\bm{x}-\bm{z}\|_{2}^{2}+f(\bm{x}):\bm{x}\in\mathbb{R}^{n}\right\}.

The proximity operator of $f$ is a set-valued operator from $\mathbb{R}^{n}$ to $2^{\mathbb{R}^{n}}$, the power set of $\mathbb{R}^{n}$. In this paper, for a scale and signed permutation invariant function $f$, we always assume that the set $\mathrm{prox}_{\alpha f}(\bm{z})$ is nonempty and compact.

2.1 Properties

For scale and signed permutation invariant functions, the proximity operator has the following properties.

Lemma 2.1.

Let $\bm{x}\in\mathbb{R}^{n}$ , $\mathsf{P}\in\mathcal{P}_{n}$ , $\alpha>0$ , and $\lambda>0$ . The following statements hold:

(i)

For a signed permutation invariant function $f\in\Gamma(\mathbb{R}^{n})$, $\mathrm{prox}_{\lambda f}(\bm{x})=\mathsf{P}^{-1}\mathrm{prox}_{\lambda f}(\mathsf{P}\bm{x})$.
(ii)

For a scale invariant function $f\in\Gamma(\mathbb{R}^{n})$, $\mathrm{prox}_{\lambda f}(\alpha\bm{x})=\alpha\mathrm{prox}_{\lambda\alpha^{-2}f}(\bm{x})$.

Proof.

Both items follow directly from the definitions of the proximity operator and of scale and signed permutation invariance; we omit the details. ∎
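Both identities of Lemma 2.1 can be checked numerically for the $\ell_{0}$ norm, whose proximity operator with index $\lambda$ is hard thresholding at $\sqrt{2\lambda}$ (this is the formula quoted in Example 2.4 below with $\lambda=1/\rho$). The sketch below (Python with NumPy; `prox_l0` is a hypothetical helper that returns one element of the set, keeping $0$ at ties) compares both sides of (i) and (ii) on random data. This is a spot check for one function, not a proof.

```python
import numpy as np

def prox_l0(x, lam):
    """One element of prox_{lam*||.||_0}(x): hard thresholding at sqrt(2*lam).
    At ties |x_i| == sqrt(2*lam) both 0 and x_i are valid; this sketch keeps 0."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > np.sqrt(2.0 * lam), x, 0.0)

rng = np.random.default_rng(0)
n = 8
x = 2.0 * rng.standard_normal(n)
lam, alpha = 0.7, 1.9

# (i): prox(x) = P^{-1} prox(P x) for a signed permutation matrix P (P^{-1} = P^T).
P = np.zeros((n, n))
P[np.arange(n), rng.permutation(n)] = rng.choice([-1.0, 1.0], size=n)
lhs_i = prox_l0(x, lam)
rhs_i = P.T @ prox_l0(P @ x, lam)

# (ii): prox_{lam f}(alpha x) = alpha * prox_{lam alpha^{-2} f}(x).
lhs_ii = prox_l0(alpha * x, lam)
rhs_ii = alpha * prox_l0(x, lam / alpha ** 2)
print(np.allclose(lhs_i, rhs_i), np.allclose(lhs_ii, rhs_ii))
```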

For any vector $\bm{x}\in\mathbb{R}^{n}$, there is a signed permutation matrix $\mathsf{P}\in\mathcal{P}_{n}$ such that $\mathsf{P}\bm{x}\in\mathbb{R}_{\downarrow}^{n}$; that is, the entries of $\bm{x}$ can be sorted as $|x_{\sigma(1)}|\geq|x_{\sigma(2)}|\geq\ldots\geq|x_{\sigma(n)}|$, where $\sigma(i)$ is the column index of the nonzero entry in the $i$th row of $\mathsf{P}$. By Lemma 2.1, for a signed permutation invariant function in $\Gamma(\mathbb{R}^{n})$, it suffices to consider its proximity operator on $\mathbb{R}_{\downarrow}^{n}$.
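In practice $\mathsf{P}$ need not be formed explicitly. The sketch below (Python with NumPy; `sort_signed` and `unsort_signed` are hypothetical helper names) stores $\sigma$ as an index array and the signs as a vector, maps $\bm{x}$ into $\mathbb{R}_{\downarrow}^{n}$, and applies $\mathsf{P}^{-1}$ to map a result back, which is how Lemma 2.1(i) would be used to transfer a proximity point computed on $\mathbb{R}_{\downarrow}^{n}$ to the original $\bm{x}$.

```python
import numpy as np

def sort_signed(x):
    """Return (y, perm, signs) with y = P x in R^n_down, where
    y[i] = signs[perm[i]] * x[perm[i]], i.e. sigma(i) = perm[i]."""
    x = np.asarray(x, dtype=float)
    perm = np.argsort(-np.abs(x), kind="stable")   # indices of |x| in descending order
    signs = np.where(x >= 0, 1.0, -1.0)            # sign used to make each entry nonnegative
    y = signs[perm] * x[perm]
    return y, perm, signs

def unsort_signed(v, perm, signs):
    """Apply P^{-1}: undo the permutation, then restore the original signs."""
    out = np.empty_like(v)
    out[perm] = v
    return signs * out

x = np.array([0.5, -3.0, 2.0, 0.0])
y, perm, signs = sort_signed(x)
print(y)                               # entries of |x| in descending order
print(unsort_signed(y, perm, signs))   # recovers x
```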

For a vector $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$, we say that $\bm{x}$ has $k$ blocks, characterized by $(k+1)$ distinct indices $\{i_{j}:j\in[k+1]\}$ satisfying $i_{1}=1$, $i_{k+1}=n$, and $i_{j}<i_{j+1}$, if $x_{i_{j}}=x_{i_{j+1}-1}>x_{i_{j+1}}$ for $1\leq j\leq k-1$ and $x_{i_{k}}=x_{i_{k+1}}$. In other words, $\bm{x}$ consists of $k$ blocks of consecutive entries; the entries within each block are identical, while entries in different blocks differ.
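The block structure can be found by comparing neighboring entries. The sketch below (Python with NumPy; `block_starts` is a hypothetical helper) returns the 0-based start index of each block followed by $n$, so block $j$ occupies the half-open range `starts[j]:starts[j+1]`. This half-open convention is a choice of the sketch and differs from the indexing $i_{k+1}=n$ used in the text.

```python
import numpy as np

def block_starts(x):
    """For x in R^n_down, return the 0-based start indices of the maximal runs
    of equal entries (the blocks), followed by n."""
    x = np.asarray(x, dtype=float)
    change = np.flatnonzero(x[1:] != x[:-1]) + 1   # first index of each new block
    return np.concatenate(([0], change, [x.size]))

starts = block_starts([3.0, 3.0, 2.0, 2.0, 2.0, 1.0])
print(starts)   # three blocks: [0, 2), [2, 5), [5, 6)
```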

Lemma 2.2.

Let $f$ be a signed permutation invariant function in $\Gamma(\mathbb{R}^{n})$, and let $\lambda>0$. For $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$, we have $\operatorname*{prox}_{\lambda f}(\bm{x})\subseteq\mathbb{R}^{n}_{+}$. Furthermore, there exists a point $\bm{u}\in\operatorname*{prox}_{\lambda f}(\bm{x})$ such that $\bm{u}\in\mathbb{R}_{\downarrow}^{n}$.

Proof.

To establish $\operatorname*{prox}_{\lambda f}(\bm{x})\subseteq\mathbb{R}^{n}_{+}$, recall that the objective function in the definition of $\operatorname*{prox}_{\lambda f}(\bm{x})$ is $\frac{1}{2\lambda}\|\bm{u}-\bm{x}\|_{2}^{2}+f(\bm{u})$ for $\bm{u}\in\mathbb{R}^{n}$. Since $f$ is signed permutation invariant, $f(\bm{u})=f(\mathsf{P}\bm{u})$ for all $\mathsf{P}\in\mathcal{P}_{n}$. Given $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$, our discussion can be restricted to $\bm{u}\in\mathbb{R}_{+}^{n}$: if, say, the first entry $u_{1}$ of $\bm{u}$ is negative, then flipping its sign leaves $f$ unchanged and, since $x_{1}\geq 0$, $(-u_{1}-x_{1})^{2}+\sum_{\ell=2}^{n}(u_{\ell}-x_{\ell})^{2}\leq(u_{1}-x_{1})^{2}+\sum_{\ell=2}^{n}(u_{\ell}-x_{\ell})^{2}$. From the above discussion, we conclude that $\operatorname*{prox}_{\lambda f}(\bm{x})\subseteq\mathbb{R}_{+}^{n}$.

Now suppose $\bm{u}\in\operatorname*{prox}_{\lambda f}(\bm{x})$. If $\bm{x}$ has one block, that is, all entries of $\bm{x}$ are equal, then we can clearly rearrange the entries of $\bm{u}$ so that the rearranged vector lies in $\mathbb{R}^{n}_{\downarrow}$ and still belongs to $\operatorname*{prox}_{\lambda f}(\bm{x})$. Suppose now that $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$ has $k\geq 2$ blocks, characterized by $(k+1)$ distinct indices $\{i_{j}:j\in[k+1]\}$. We define $u_{\overline{j}}=\max\{u_{\ell}:i_{j}\leq\ell\leq i_{j+1}-1\}$ and $u_{\underline{j}}=\min\{u_{\ell}:i_{j}\leq\ell\leq i_{j+1}-1\}$ for $1\leq j\leq k-1$, and $u_{\overline{k}}=\max\{u_{\ell}:i_{k}\leq\ell\leq i_{k+1}\}$ and $u_{\underline{k}}=\min\{u_{\ell}:i_{k}\leq\ell\leq i_{k+1}\}$. We claim that $u_{\underline{j}}\geq u_{\overline{j+1}}$ for $1\leq j\leq k-1$. Suppose this inequality fails for some $j$; without loss of generality, $u_{\underline{1}}<u_{\overline{2}}$. Since rearranging the entries of $\bm{u}$ within a block of $\bm{x}$ changes neither $f(\bm{u})$ nor $\|\bm{u}-\bm{x}\|_{2}$, we may assume that $u_{1}=u_{\underline{1}}$ and $u_{i_{2}}=u_{\overline{2}}$. Let $\widetilde{\bm{u}}$ be the vector obtained from $\bm{u}$ by exchanging its first and $i_{2}$th components. Then $f(\widetilde{\bm{u}})=f(\bm{u})$, and

\begin{aligned}\|\widetilde{\bm{u}}-\bm{x}\|_{2}^{2}-\|\bm{u}-\bm{x}\|_{2}^{2}&=(u_{\overline{2}}-x_{1})^{2}+(u_{\underline{1}}-x_{i_{2}})^{2}-(u_{\underline{1}}-x_{1})^{2}-(u_{\overline{2}}-x_{i_{2}})^{2}\\&=2(u_{\underline{1}}-u_{\overline{2}})(x_{1}-x_{i_{2}})<0\end{aligned}

since $x_{1}=x_{i_{1}}>x_{i_{2}}$ and $u_{\underline{1}}<u_{\overline{2}}$. This contradicts the assumption $\bm{u}\in\operatorname*{prox}_{\lambda f}(\bm{x})$.

Finally, since all entries in each block of $\bm{x}$ are the same, arranging the entries of $\bm{u}\in\operatorname*{prox}_{\lambda f}(\bm{x})$ for the indices in the same block in descending order results in $\bm{u}$ still belonging to $\operatorname*{prox}_{\lambda f}(\bm{x})$ . Thus, there exists a point $\bm{u}\in\operatorname*{prox}_{\lambda f}(\bm{x})$ such that $\bm{u}\in\mathbb{R}_{\downarrow}^{n}$ . ∎

2.2 Reformulation

As outlined in the introduction, our approach to computing the proximity operator of scale and signed permutation invariant functions is based on the observation that every point of $\mathbb{R}^{n}$ can be represented by a pair in the Cartesian product of $\mathbb{R}$ and $\mathbb{S}^{n-1}$. That is, a vector $\bm{u}\in\mathbb{R}^{n}$ can be converted to a pair $(r,\bm{w})\in\mathbb{R}\times\mathbb{S}^{n-1}$ such that

\bm{u}=r\bm{w},

where, for $\bm{u}\neq\mathbf{0}$,

r=\|\bm{u}\|_{2}\in\mathbb{R}\quad\mbox{and}\quad\bm{w}=\frac{\bm{u}}{\|\bm{u}\|_{2}}\in\mathbb{S}^{n-1}.

With this conversion, the task of finding a point $\bm{u}\in\mathrm{prox}_{f}(\bm{x})$ transforms into finding a pair of $(r,\bm{w})\in\mathbb{R}\times\mathbb{S}^{n-1}$ such that $\bm{u}=r\bm{w}$ .

Theorem 2.3.

Let $f$ be a scale and signed permutation invariant function in $\Gamma(\mathbb{R}^{n})$ , and let $\rho>0$ . Consider a vector $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$ and define

F(\bm{u}):=\frac{\rho}{2}\|\bm{u}-\bm{x}\|_{2}^{2}+f(\bm{u}).

(1)

Then $\bm{x}^{\star}\in\mathrm{prox}_{\frac{1}{\rho}f}(\bm{x})$ if and only if $\bm{x}^{\star}$ is given by

\bm{x}^{\star}\in\left\{\begin{array}[]{ll}\{\mathbf{0}\},&\hbox{if $F(\mathbf{0})<F(\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star})$;}\\ \{\mathbf{0},\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}\},&\hbox{if $F(\mathbf{0})=F(\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star})$;}\\ \{\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}\},&\hbox{otherwise,}\end{array}\right.

(2)

where $\bm{w}^{\star}$ is a solution to the following optimization problem

\min\left\{-\frac{\rho}{2}\langle\bm{x},\bm{w}\rangle^{2}+f(\bm{w}):\bm{w}\in\mathbb{S}^{n-1}_{+}\right\}.

(3)

Proof.

From the definition of proximity operator,

\mathrm{prox}_{\frac{1}{\rho}f}(\bm{x})=\operatorname*{arg\,min}\left\{F(\bm{u}):\bm{u}\in\mathbb{R}^{n}\right\}.

By Lemma 2.1 and Lemma 2.2, for $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$ we establish that

\operatorname*{arg\,min}\left\{F(\bm{u}):\bm{u}\in\mathbb{R}^{n}\right\}=\operatorname*{arg\,min}\left\{F(\bm{u}):\bm{u}\in\mathbb{R}^{n}_{+}\right\}.

To analyze the optimization problem on the right-hand side, we write $\bm{u}=r\bm{w}$ with $r\geq 0$ and $\bm{w}\in\mathbb{S}_{+}^{n-1}$. For $r=0$,

F(\mathbf{0})=\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+f(\mathbf{0})

and for $r>0$, using $\|\bm{w}\|_{2}=1$ and the scale invariance $f(r\bm{w})=f(\bm{w})$,

\begin{aligned}F(\bm{u})&=\frac{\rho}{2}\|r\bm{w}-\bm{x}\|_{2}^{2}+f(r\bm{w})\\&=\frac{\rho}{2}(r^{2}-2r\langle\bm{w},\bm{x}\rangle+\|\bm{x}\|_{2}^{2})+f(\bm{w})\\&=\frac{\rho}{2}(r-\langle\bm{w},\bm{x}\rangle)^{2}+\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+\left(-\frac{\rho}{2}\langle\bm{w},\bm{x}\rangle^{2}+f(\bm{w})\right).\end{aligned}

(4)

In equation (4), the first term $\frac{\rho}{2}(r-\langle\bm{w},\bm{x}\rangle)^{2}$ attains its minimum value $0$ at $r=\langle\bm{w},\bm{x}\rangle$, which is admissible because $\langle\bm{w},\bm{x}\rangle\geq 0$ for $\bm{w},\bm{x}\in\mathbb{R}^{n}_{+}$; the second term $\frac{\rho}{2}\|\bm{x}\|_{2}^{2}$ does not depend on the pair $(r,\bm{w})$; and the third term $-\frac{\rho}{2}\langle\bm{w},\bm{x}\rangle^{2}+f(\bm{w})$ is a function of $\bm{w}$ alone. Therefore, we seek $\bm{w}^{\star}$ that minimizes the third term with respect to $\bm{w}$, i.e., solves the optimization problem (3), and then form $\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}$. Comparing the resulting value of $F$ with $F(\mathbf{0})$ yields (2), which proves the theorem. ∎
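The algebra in (4) and the choice $r=\langle\bm{x},\bm{w}\rangle$ can be spot-checked numerically. The sketch below (Python with NumPy; the random data and helper names are illustrative assumptions) drops the $f$ term, which by scale invariance does not depend on $r$, compares the left- and right-hand sides of (4) on a grid of $r\geq 0$, and confirms that the grid minimizer lies next to $\langle\bm{x},\bm{w}\rangle$.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 1.3
x = np.sort(np.abs(rng.standard_normal(5)))[::-1]    # a point of R^5_down
w = np.abs(rng.standard_normal(5))
w /= np.linalg.norm(w)                                # a point of S^4_+

def distance_term(r):
    """rho/2 * ||r w - x||^2: the part of F(r w) that depends on r
    (f(r w) = f(w) by scale invariance, so f does not depend on r)."""
    return 0.5 * rho * np.sum((r * w - x) ** 2)

def split_form(r):
    """Right-hand side of (4) with the f(w) term dropped."""
    t = x @ w
    return 0.5 * rho * (r - t) ** 2 + 0.5 * rho * (x @ x) - 0.5 * rho * t ** 2

rs = np.linspace(0.0, 5.0, 5001)
values = np.array([distance_term(r) for r in rs])
r_best = rs[np.argmin(values)]
print(r_best, x @ w)   # the grid minimizer sits next to <x, w>, as the r-step prescribes
```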

In the following discussion, we use the notation $F$ in (1) to represent the objective function for $\mathrm{prox}_{\frac{1}{\rho}f}(\bm{x})$ and denote

G(\bm{w}):=-\frac{\rho}{2}\langle\bm{x},\bm{w}\rangle^{2}+f(\bm{w})

(5)

to represent the objective function of (3).

The proof above shows why both invariances of $f$ matter. Scale invariance reduces the problem from $\mathbb{R}^{n}$ to $\mathbb{S}^{n-1}$, while signed permutation invariance narrows the search from $\mathbb{S}^{n-1}$ to $\mathbb{S}^{n-1}_{+}$. Together they separate $r$ from $\bm{w}$, so that the main task is an optimization problem in $\bm{w}$ alone.

In accordance with Theorem 2.3, the process of determining the pair $(r,\bm{w})$ involves three distinct steps:

•

$\bm{w}$ -step: In this step, the objective is to find an optimal solution $\bm{w}^{\star}$ to the optimization problem (3).
•

$r$ -step: Following the $\bm{w}$ -step, the corresponding $r^{\star}$ is computed as $r^{\star}=\langle\bm{x},\bm{w}^{\star}\rangle$ , where $\bm{w}^{\star}$ is the output from $\bm{w}$ -step.
•

$d$ -step: This final step determines $\bm{x}^{\star}$ according to (2).

Upon completing these three steps, as shown in (2), $\bm{x}^{\star}$ belongs to $\mathrm{prox}_{\frac{1}{\rho}f}(\bm{x})$ . For ease of reference in the subsequent discussion, this procedure is referred to as WRD ( $\bm{w}$ -step, $r$ -step, $d$ -step).
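The three steps can be organized as a small driver that takes the $\bm{w}$-step as a plug-in. The sketch below (Python with NumPy) is a minimal illustration under stated assumptions: `wrd` and the `w_step(x, rho)` interface are hypothetical names, $\bm{x}$ is assumed to lie in $\mathbb{R}^{n}_{\downarrow}$, and on a tie in (2) it returns the nonzero candidate. For demonstration it uses a brute-force $\bm{w}$-step for $f=\|\cdot\|_{0}$ that compares $G$ at the normalized leading parts of $\bm{x}$, as justified in Example 2.4.

```python
import numpy as np

def wrd(x, rho, f, w_step):
    """Return one element of prox_{f/rho}(x) via the WRD procedure.

    Assumes f is scale and signed permutation invariant, x lies in R^n_down, and
    w_step(x, rho) (a user-supplied, hypothetical interface) returns a minimizer
    w* of problem (3) over S^{n-1}_+.  On a tie in (2) the nonzero candidate is returned.
    """
    x = np.asarray(x, dtype=float)
    F = lambda u: 0.5 * rho * np.sum((u - x) ** 2) + f(u)
    w = w_step(x, rho)                  # w-step
    r = float(x @ w)                    # r-step: r* = <x, w*>
    candidate = r * w
    zero = np.zeros_like(x)
    return zero if F(zero) < F(candidate) else candidate   # d-step, rule (2)

def l0(u):
    return float(np.count_nonzero(u))

def w_step_l0(x, rho):
    """Brute-force w-step for f = ||.||_0: compare G at x_[k]/||x_[k]||_2, k = 1..n
    (assumes x in R^n_down with x_1 > 0)."""
    best_w, best_G = None, np.inf
    for k in range(1, x.size + 1):
        w = np.zeros_like(x)
        w[:k] = x[:k] / np.linalg.norm(x[:k])
        G = -0.5 * rho * (x @ w) ** 2 + l0(w)
        if G < best_G:
            best_w, best_G = w, G
    return best_w

print(wrd([3.0, 1.0, 0.5], 1.0, l0, w_step_l0))   # hard thresholding at sqrt(2)
```

Keeping the $\bm{w}$-step pluggable mirrors the text: the $r$-step and $d$-step are the same for every scale and signed permutation invariant $f$, and only the $\bm{w}$-step is function specific.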

To show the applicability of the WRD procedure, we present the proximity operator of the $\ell_{0}$ norm, a typical scale and signed permutation invariant function.

Example 2.4.

The proximity operator of the $\ell_{0}$ norm at $\bm{x}$ with index $1/\rho$ is, see, e.g., [19, 17],

(\mathrm{prox}_{\frac{1}{\rho}\|\cdot\|_{0}}(\bm{x}))_{i}=\left\{\begin{array}[]{ll}\{x_{i}\},&\hbox{if $|x_{i}|>\sqrt{2/\rho}$;}\\ \{0,x_{i}\},&\hbox{if $|x_{i}|=\sqrt{2/\rho}$;}\\ \{0\},&\hbox{otherwise.}\end{array}\right.

We now apply the WRD procedure to compute $\mathrm{prox}_{\frac{1}{\rho}\|\cdot\|_{0}}$. Assume $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$ and, as in the proof of Theorem 2.3, define $F(\bm{u}):=\frac{\rho}{2}\|\bm{u}-\bm{x}\|_{2}^{2}+\|\bm{u}\|_{0}$. The $\bm{w}$-step seeks an optimal solution of problem (3) over $\bm{w}\in\mathbb{S}^{n-1}_{+}$, with $G(\bm{w}):=-\frac{\rho}{2}\langle\bm{x},\bm{w}\rangle^{2}+\|\bm{w}\|_{0}$. For $\bm{w}\in\mathbb{S}^{n-1}_{+}$ with $\|\bm{w}\|_{0}=k$ and support $S$, the Cauchy-Schwarz inequality gives $\langle\bm{x},\bm{w}\rangle\leq\|\bm{x}_{S}\|_{2}\leq\|\bm{x}_{[k]}\|_{2}$, since the first $k$ entries of $\bm{x}$ are its largest. Hence the smallest value of $G$ among such $\bm{w}$ is attained when $\bm{w}$ is aligned with the first $k$ entries of $\bm{x}$, that is,

G\left(\frac{\bm{x}_{[k]}}{\|\bm{x}_{[k]}\|_{2}}\right)=-\frac{\rho}{2}\|\bm{x}_{[k]}\|_{2}^{2}+k=\sum_{i=1}^{k}\left(-\frac{\rho}{2}x_{i}^{2}+1\right).

Here $\bm{x}_{[k]}$ keeps the first $k$ entries of $\bm{x}$ and sets the remaining entries to zero. Therefore, the output of the $\bm{w}$-step of the WRD procedure is given by

\operatorname*{arg\,min}_{\bm{w}\in\mathbb{S}^{n-1}_{+}}G(\bm{w})=\left\{\begin{array}[]{ll}\left\{\frac{\bm{x}_{[1]}}{\|\bm{x}_{[1]}\|_{2}}\right\},&\hbox{if $x_{1}<\sqrt{2/\rho}$;}\\ \left\{\frac{\bm{x}_{S}}{\|\bm{x}_{S}\|_{2}}:S\subseteq[p],|S|\geq 1\right\},&\hbox{if $\exists p\in[n]$ s.t. $x_{1}=x_{p}=\sqrt{2/\rho}>x_{p+1}$;}\\ \left\{\frac{\bm{x}_{[k]}}{\|\bm{x}_{[k]}\|_{2}}\right\},&\hbox{if $\exists k\in[n]$ s.t. $x_{k}>\sqrt{2/\rho}>x_{k+1}$;}\\ \left\{\frac{\bm{x}_{[k]\cup S}}{\|\bm{x}_{[k]\cup S}\|_{2}}:S\subseteq\{k+1,\ldots,k+p\}\right\},&\hbox{if $\exists k\in[n]$ and $p\in[n-k]$ s.t.}\\ &\hbox{$x_{k}>\sqrt{2/\rho}=x_{k+1}=x_{k+p}>x_{k+p+1}$.}\end{array}\right.

This output represents the solutions of the $\bm{w}$-step of the WRD procedure. Choosing a vector $\bm{w}^{\star}\in\operatorname*{arg\,min}_{\bm{w}\in\mathbb{S}^{n-1}_{+}}G(\bm{w})$, the $r$-step then gives $r=\langle\bm{x},\bm{w}^{\star}\rangle$. With the pair $(r,\bm{w}^{\star})$, the $d$-step compares $F(\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star})$ with $F(\mathbf{0})$; their difference is

F(\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star})-F(\mathbf{0})=\sum_{i=1}^{k}\left(-\frac{\rho}{2}x_{i}^{2}+1\right),

where $k=1$ if $x_{1}\leq\sqrt{2/\rho}$, and otherwise $k$ is the integer such that $x_{k}>\sqrt{2/\rho}\geq x_{k+1}$. Consequently, $\mathrm{prox}_{\frac{1}{\rho}\|\cdot\|_{0}}(\bm{x})=\{\mathbf{0}\}$ if $x_{1}<\sqrt{2/\rho}$; both $\mathbf{0}$ and $\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}$ belong to $\mathrm{prox}_{\frac{1}{\rho}\|\cdot\|_{0}}(\bm{x})$ if $x_{1}=\sqrt{2/\rho}$; otherwise $\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}$ belongs to $\mathrm{prox}_{\frac{1}{\rho}\|\cdot\|_{0}}(\bm{x})$. For $\bm{w}^{\star}=\bm{x}_{[k]}/\|\bm{x}_{[k]}\|_{2}$, this point is $\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}=\bm{x}_{[k]}$, the hard-thresholded vector. Hence the WRD procedure recovers the proximity operator of the $\ell_{0}$ norm.
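The computation in this example can be condensed into a few lines, because the $\bm{w}$-step only needs the partial sums $\sum_{i\leq k}(1-\frac{\rho}{2}x_{i}^{2})$. The sketch below (Python with NumPy; `prox_l0_wrd` and `hard_threshold` are hypothetical helper names) assumes $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$ with $x_{1}>0$, resolves ties toward the smaller support and toward the origin, and compares the result with the entrywise formula quoted at the start of the example.

```python
import numpy as np

def prox_l0_wrd(x, rho):
    """One element of prox_{||.||_0/rho}(x) for x in R^n_down with x_1 > 0,
    following the three WRD steps of Example 2.4.  Ties are resolved toward
    the smaller support in the w-step and toward the origin in the d-step."""
    x = np.asarray(x, dtype=float)
    # w-step: G(x_[k]/||x_[k]||_2) = sum_{i<=k} (1 - rho/2 * x_i^2), minimized over k
    partial = np.cumsum(1.0 - 0.5 * rho * x ** 2)
    k = int(np.argmin(partial)) + 1
    w = np.zeros_like(x)
    w[:k] = x[:k] / np.linalg.norm(x[:k])
    # r-step
    r = x @ w
    # d-step: F(r w) - F(0) equals partial[k-1]
    return r * w if partial[k - 1] < 0 else np.zeros_like(x)

def hard_threshold(x, rho):
    """Entrywise formula quoted in Example 2.4 (keeping 0 at ties)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > np.sqrt(2.0 / rho), x, 0.0)

rng = np.random.default_rng(0)
x = np.sort(np.abs(2.0 * rng.standard_normal(10)))[::-1]
print(np.allclose(prox_l0_wrd(x, 1.0), hard_threshold(x, 1.0)))
```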

In the rest of the paper, we focus on computing the proximity operator of the function below:

h_{p}(\bm{x})=\left\{\begin{array}[]{ll}\left(\frac{\|\bm{x}\|_{1}}{\|\bm{x}\|_{2}}\right)^{p},&\hbox{if $\bm{x}\neq\mathbf{0}$;}\\ 0,&\hbox{otherwise,}\end{array}\right.

(6)

for $p=1$ and $2$. This function is lower semicontinuous, and $1\leq h_{p}(\bm{x})\leq n^{p/2}$ for all nonzero $\bm{x}\in\mathbb{R}^{n}$. Hence $h_{p}$ is bounded below, and its proximity operator at any point is nonempty. Note that setting the value of $h_{p}$ at the origin to any value less than or equal to $1$ preserves the lower semicontinuity of the function; for example, $h_{1}(\mathbf{0})$ is set to $1$ in [16]. The WRD procedure remains applicable in that case as well. Finally, on $\mathbb{R}$ our definition of $h_{p}$ coincides with the $\ell_{0}$ norm, that is, $h_{p}(x)=\|x\|_{0}$ for $x\in\mathbb{R}$.
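The bounds on $h_{p}$ and its agreement with $\ell_{0}$ on $\mathbb{R}$ are easy to illustrate. In the sketch below (Python with NumPy; `h` is a hypothetical helper implementing (6) with value $0$ at the origin), the lower bound is attained at a vector with one nonzero entry, the upper bound at the all-ones vector, and random samples stay within the bounds.

```python
import numpy as np

def h(x, p):
    """h_p from (6): (||x||_1 / ||x||_2)^p for x != 0, and 0 at the origin."""
    x = np.asarray(x, dtype=float)
    n2 = np.linalg.norm(x)
    return 0.0 if n2 == 0.0 else (np.sum(np.abs(x)) / n2) ** p

n = 7
rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, n))
for p in (1, 2):
    vals = np.array([h(v, p) for v in samples])
    # 1 <= h_p <= n^(p/2) on nonzero vectors
    print(p, vals.min() >= 1.0, vals.max() <= n ** (p / 2))
```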

In the next section, we consider the computation of the proximity operator of $h_{2}$ first.

3 The Proximity Operator of $h_{2}$

We use the WRD procedure to compute the proximity operator of $h_{2}$, beginning with the optimization problem (3) associated with the $\bm{w}$-step.

Let $\bm{e}$ denote the vector whose components are all equal to $1$. For $\bm{w}\in\mathbb{R}_{+}^{n}$, we have

\langle\bm{w},\bm{x}\rangle^{2}=\bm{w}^{\top}\bm{x}\bm{x}^{\top}\bm{w}\quad\mbox{and}\quad\|\bm{w}\|_{1}^{2}=\bm{w}^{\top}\bm{e}\bm{e}^{\top}\bm{w}.

Set

\mathsf{A}_{\rho,\bm{x}}:=2\bm{e}\bm{e}^{\top}-{\rho}\bm{x}\bm{x}^{\top}.

(7)

For $\bm{w}\in\mathbb{S}^{n-1}_{+}$, where $h_{2}(\bm{w})=\|\bm{w}\|_{1}^{2}$, the function $G$ in (5) becomes

G(\bm{w})=\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}.

Hence, the optimization problem (3) is a quadratic program constrained on $\mathbb{S}_{+}^{n-1}$.
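The identity $G(\bm{w})=\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}$ can be verified numerically. The sketch below (Python with NumPy; `A_matrix` and `G_direct` are hypothetical helper names) compares it with the direct formula $-\frac{\rho}{2}\langle\bm{x},\bm{w}\rangle^{2}+h_{2}(\bm{w})$ at a random $\bm{w}\in\mathbb{S}^{n-1}_{+}$, and also shows that the identity can fail once $\bm{w}$ has a negative entry, because then $\|\bm{w}\|_{1}^{2}\neq(\bm{e}^{\top}\bm{w})^{2}$.

```python
import numpy as np

def A_matrix(x, rho):
    """A_{rho,x} = 2 e e^T - rho x x^T, as in (7)."""
    x = np.asarray(x, dtype=float)
    e = np.ones_like(x)
    return 2.0 * np.outer(e, e) - rho * np.outer(x, x)

def G_direct(w, x, rho):
    """G(w) = -rho/2 <x, w>^2 + h_2(w), with h_2(w) = (||w||_1 / ||w||_2)^2."""
    return -0.5 * rho * (x @ w) ** 2 + (np.sum(np.abs(w)) / np.linalg.norm(w)) ** 2

rng = np.random.default_rng(0)
rho = 0.8
x = np.sort(np.abs(rng.standard_normal(6)))[::-1]     # x in R^6_down
w = np.abs(rng.standard_normal(6))
w /= np.linalg.norm(w)                                # w in S^5_+
print(G_direct(w, x, rho), 0.5 * w @ A_matrix(x, rho) @ w)

w_neg = w.copy()
w_neg[0] = -w_neg[0]                                  # leaves the nonnegative orthant
```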

We immediately obtain the proximity operator of $h_{2}$ at points that are positive multiples of $\bm{e}$:

Theorem 3.1.

Let $\rho>0$ and $\bm{x}=\alpha\bm{e}$ for some $\alpha>0$. Then

\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})=\left\{\begin{array}[]{ll}\{\alpha\bm{e}\},&\hbox{if $\rho\alpha^{2}>2$;}\\ \{\mathbf{0}\}\cup\{\alpha\|\bm{w}\|_{1}\bm{w}:\bm{w}\in\mathbb{S}_{+}^{n-1}\},&\hbox{if $\rho\alpha^{2}=2$;}\\ \{\mathbf{0}\},&\hbox{if $\rho\alpha^{2}<2$.}\end{array}\right.

Proof.

For $\bm{x}=\alpha\bm{e}$ with $\alpha>0$, (7) gives $\mathsf{A}_{\rho,\bm{x}}=(2-\rho\alpha^{2})\bm{e}\bm{e}^{\top}$, so the objective function of problem (3) is $G(\bm{w})=\frac{1}{2}(2-\rho\alpha^{2})\|\bm{w}\|_{1}^{2}$. Moreover, since $h_{2}(\mathbf{0})=0$, (4) gives $F(\langle\bm{x},\bm{w}\rangle\bm{w})-F(\mathbf{0})=G(\bm{w})$ for $\bm{w}\in\mathbb{S}_{+}^{n-1}$, so the sign of the minimal value of $G$ decides the $d$-step. To determine this minimal value on $\mathbb{S}_{+}^{n-1}$ and its minimizers, we distinguish three cases according to the value of $\rho\alpha^{2}$.

If $\rho\alpha^{2}>2$, the minimal value of $\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}$ is attained at the $\bm{w}\in\mathbb{S}_{+}^{n-1}$ with the largest $\ell_{1}$ norm. Clearly, the optimal $\bm{w}^{\star}$ must be $\frac{1}{\sqrt{n}}\bm{e}$, and $G(\bm{w}^{\star})=\frac{1}{2}(2-\rho\alpha^{2})n<0$. Hence, $\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})=\{\langle\alpha\bm{e},\bm{w}^{\star}\rangle\bm{w}^{\star}\}=\{\alpha\bm{e}\}$.

If $\rho\alpha^{2}=2$ , then $G(\bm{w})=0$ for all $\bm{w}\in\mathbb{S}^{n-1}_{+}$ . Note that $\langle\alpha\bm{e},\bm{w}\rangle\bm{w}=\alpha\|\bm{w}\|_{1}\bm{w}$ . Hence, $\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})=\{\mathbf{0}\}\cup\{\alpha\|\bm{w}% \|_{1}\bm{w}:\bm{w}\in\mathbb{S}_{+}^{n-1}\}$ .

Finally, if $\rho\alpha^{2}<2$ , then the minimal value of $G$ on $\mathbb{S}_{+}^{n-1}$ is achieved at $\bm{w}^{\star}\in\{\bm{e}_{i}:i\in[n]\}$ and $G(\bm{e}_{i})=\frac{1}{2}(2-\rho\alpha^{2})>0$ for all $i\in[n]$ . Hence, $\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})=\{\mathbf{0}\}$ . ∎
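Theorem 3.1 translates directly into a few lines of code. The sketch below (the function name is hypothetical) returns representatives of $\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\alpha\bm{e})$. In the tie case $\rho\alpha^{2}=2$ the proximity operator is an infinite set, so only its members $\mathbf{0}$ and $\alpha\bm{e}$ (obtained from $\bm{w}=\frac{1}{\sqrt{n}}\bm{e}$) are listed. Detecting the tie with a floating-point tolerance is a simplification.

```python
import math

def prox_h2_on_diagonal(alpha, rho, n):
    """Representatives of prox_{h2/rho}(alpha*e) for alpha > 0, following Theorem 3.1."""
    t = rho * alpha * alpha
    zero = [0.0] * n
    if math.isclose(t, 2.0):
        # Tie: the prox is {0} together with {alpha*||w||_1*w : w in S^{n-1}_+};
        # w = e/sqrt(n) contributes alpha*e, so we return 0 and alpha*e only.
        return [zero, [alpha] * n]
    if t > 2.0:
        return [[alpha] * n]          # w* = e/sqrt(n), r* = alpha*sqrt(n), G(w*) < 0
    return [zero]                     # w* = e_i and G(e_i) > 0

print(prox_h2_on_diagonal(1.0, 3.0, 3))   # [[1.0, 1.0, 1.0]]
print(prox_h2_on_diagonal(1.0, 1.0, 3))   # [[0.0, 0.0, 0.0]]
```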

By Lemma 2.1, we restrict our attention to the proximity operator of $h_{2}$ on $\mathbb{R}_{\downarrow}^{n}$. The discussion is organized in two subsections. In the first, we analyze the proximity operator of $h_{2}$ in $\mathbb{R}^{2}$, where the $\bm{w}$-step reduces to a one-dimensional problem. In the second, we begin by investigating the eigenvectors of the matrix $\mathsf{A}_{\rho,\bm{x}}$; the eigenvector associated with its negative eigenvalue plays a pivotal role in the $\bm{w}$-step of the WRD procedure. Using these properties, we explicitly derive the proximity operator of $h_{2}$ on the entire space $\mathbb{R}^{n}$.

3.1 Special case: the proximity operator of $h_{2}$ on $\mathbb{R}^{2}$

The following result is about the proximity operator of $h_{2}$ on $\mathbb{R}^{2}_{\downarrow}$ .

Theorem 3.2.

For $\rho>0$ and $\bm{x}\in\mathbb{R}_{\downarrow}^{2}$ not a multiple of $\bm{e}$, define

\theta^{\star}=\left\{\begin{array}[]{ll}\frac{1}{2}\arctan\left(\frac{-2(2-\rho x_{1}x_{2})}{\rho(x_{1}^{2}-x_{2}^{2})}\right),&\hbox{if $\rho x_{1}x_{2}>2$;}\\ 0,&\hbox{if $\rho x_{1}x_{2}\leq 2$.}\end{array}\right.

Then $\bm{w}^{\star}=\begin{bmatrix}\cos\theta^{\star}&\sin\theta^{\star}\end{bmatrix}^{\top}$ is the optimal solution to problem (3), and

\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})=\left\{\begin{array}[]{ll}\{\mathbf{0}\},&\hbox{if $\rho x_{1}x_{2}\leq 2$ and $\rho x_{1}^{2}<2$;}\\ \{\mathbf{0},x_{1}\bm{e}_{1}\},&\hbox{if $\rho x_{1}x_{2}\leq 2$ and $\rho x_{1}^{2}=2$;}\\ \{x_{1}\bm{e}_{1}\},&\hbox{if $\rho x_{1}x_{2}\leq 2$ and $\rho x_{1}^{2}>2$;}\\ \{\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}\},&\hbox{if $\rho x_{1}x_{2}>2$.}\end{array}\right.

Proof.

For $\bm{w}\in\mathbb{S}^{1}_{+}$ , we have

G(\bm{w})=\frac{1}{2}(2-\rho x_{1}^{2})w_{1}^{2}+\frac{1}{2}(2-\rho x_{2}^{2})w_{2}^{2}+(2-\rho x_{1}x_{2})w_{1}w_{2}.

Write $w_{1}=\cos\theta$ and $w_{2}=\sin\theta$ with $\theta\in[0,\pi/2]$. Using the double-angle formulas, $G$ can be written as

G(\bm{w})=\frac{1}{2}(2-\rho x_{2}^{2})-\frac{\rho}{4}(x_{1}^{2}-x_{2}^{2})+Q(\theta),\quad\mbox{where}\quad Q(\theta):=\frac{\rho}{4}(x_{2}^{2}-x_{1}^{2})\cos(2\theta)+\frac{1}{2}(2-\rho x_{1}x_{2})\sin(2\theta).

Hence, minimizing $\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}$ over $\bm{w}\in\mathbb{S}^{1}_{+}$ is equivalent to minimizing $Q(\theta)$ over $\theta\in[0,\pi/2]$. By Lemma 2.1 and Theorem 2.3, we can further restrict $\theta$ to $[0,\pi/4]$.

To find the global minimizer of $Q$ over $[0,\pi/4]$, we compute its derivative:

Q^{\prime}(\theta)=\frac{1}{2}\rho(x_{1}^{2}-x_{2}^{2})\sin(2\theta)+(2-\rho x_{1}x_{2})\cos(2\theta).

We consider two cases. Case 1: If $2-\rho x_{1}x_{2}\geq 0$, then both terms of $Q^{\prime}$ are nonnegative on $[0,\pi/4]$, so $Q^{\prime}(\theta)\geq 0$ there and $Q$ attains its global minimum at $\theta=0$; that is, $\bm{w}^{\star}=\bm{e}_{1}$. Case 2: If $2-\rho x_{1}x_{2}<0$, then on $[0,\pi/4)$ the equation $Q^{\prime}(\theta)=0$ is equivalent to $\tan(2\theta)=\frac{-2(2-\rho x_{1}x_{2})}{\rho(x_{1}^{2}-x_{2}^{2})}>0$, so $Q^{\prime}$ has exactly one root $\theta^{\star}=\frac{1}{2}\arctan\left(\frac{-2(2-\rho x_{1}x_{2})}{\rho(x_{1}^{2}-x_{2}^{2})}\right)$ in $[0,\pi/4]$. Since $Q^{\prime}(0)=2-\rho x_{1}x_{2}<0$ and $Q^{\prime}(\pi/4)=\frac{1}{2}\rho(x_{1}^{2}-x_{2}^{2})>0$, $Q$ attains its global minimum at $\theta^{\star}$. Consequently, $\bm{w}^{\star}=\begin{bmatrix}\cos\theta^{\star}&\sin\theta^{\star}\end{bmatrix}^{\top}$ is the optimal solution to problem (3). This completes the $\bm{w}$-step of the WRD procedure for $\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})$. The $r$-step follows immediately with $r^{\star}=\langle\bm{x},\bm{w}^{\star}\rangle$.

Finally, for the $d$-step of the WRD procedure, we only need the sign of $G(\bm{w}^{\star})$. In Case 1, $G(\bm{w}^{\star})=G(\bm{e}_{1})=\frac{1}{2}(2-\rho x_{1}^{2})$, which is positive if $\rho x_{1}^{2}<2$, zero if $\rho x_{1}^{2}=2$, and negative otherwise. In Case 2, since $x_{1}>x_{2}$, we have $G(\bm{w}^{\star})<G(\bm{e}_{1})=\frac{1}{2}(2-\rho x_{1}^{2})<\frac{1}{2}(2-\rho x_{1}x_{2})<0$. The sign of $G(\bm{w}^{\star})$ therefore yields the four cases of $\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})$ stated in the theorem. ∎
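A direct transcription of Theorem 3.2, together with a brute-force check of the $\bm{w}$-step over a fine grid of $\theta\in[0,\pi/2]$, can be sketched as follows. The function names are hypothetical, and the grid comparison is only a numerical sanity check, not part of the method.

```python
import math

def prox_h2_r2(x1, x2, rho):
    """Theorem 3.2 for x = (x1, x2) with x1 > x2 >= 0; returns (w_star, list of prox points)."""
    if rho * x1 * x2 > 2.0:
        # Case 2: the unique critical point of Q on [0, pi/4]
        theta = 0.5 * math.atan(-2.0 * (2.0 - rho * x1 * x2) / (rho * (x1 * x1 - x2 * x2)))
        w = (math.cos(theta), math.sin(theta))
        r = x1 * w[0] + x2 * w[1]                       # r-step
        return w, [(r * w[0], r * w[1])]                # G(w*) < 0, so 0 is not a minimizer
    # Case 1: Q' >= 0 on [0, pi/4], so w* = e1 and G(e1) = (2 - rho*x1^2)/2
    w = (1.0, 0.0)
    t = rho * x1 * x1
    if math.isclose(t, 2.0):
        return w, [(0.0, 0.0), (x1, 0.0)]
    return w, ([(x1, 0.0)] if t > 2.0 else [(0.0, 0.0)])

def G(w, x1, x2, rho):
    """G(w) = 0.5 * w^T (2 e e^T - rho x x^T) w on R^2."""
    s = w[0] + w[1]
    p = x1 * w[0] + x2 * w[1]
    return 0.5 * (2.0 * s * s - rho * p * p)

x1, x2, rho = 2.0, 1.0, 1.5                              # rho*x1*x2 = 3 > 2: Case 2
w_star, prox = prox_h2_r2(x1, x2, rho)
grid = (k * (math.pi / 2) / 20000 for k in range(20001))
grid_min = min(G((math.cos(t), math.sin(t)), x1, x2, rho) for t in grid)
print(G(w_star, x1, x2, rho), grid_min, prox)
```

The value $G(\bm{w}^{\star})$ should not exceed the grid minimum and should differ from it only by the grid resolution.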

To close this subsection, we visualize the proximity operator of $h_{2}$ with index $1/\rho$ in $\mathbb{R}^{2}$. Since $h_{2}$ can be regarded as an approximation of the $\ell_{0}$ norm, we also plot the proximity operator of the $\ell_{0}$ norm with index $1/\rho$ for comparison. As stipulated by Lemma 2.1, we present only the behavior on $\mathbb{R}^{2}_{\downarrow}$.

Figure 3.1(a) illustrates the proximity operator of the $\ell_{0}$ norm. Following the guidance from Example 2.4, the set $\mathbb{R}^{2}_{\downarrow}$ is divided into three distinct regions I, II, and III as depicted in Figure 3.1(a) and defined as follows:

\begin{aligned}\mbox{Region I}&=\{(x_{1},x_{2}):0\leq x_{2}\leq x_{1}\leq\sqrt{{2}/{\rho}}\},\\ \mbox{Region II}&=\{(x_{1},x_{2}):0\leq x_{2}\leq\sqrt{{2}/{\rho}}<x_{1}\},\\ \mbox{Region III}&=\{(x_{1},x_{2}):\sqrt{{2}/{\rho}}<x_{2}\leq x_{1}\}.\end{aligned}

On Region I, $\mathrm{prox}_{\frac{1}{\rho}\|\cdot\|_{0}}$ at the corner $\sqrt{{2}/{\rho}}\bm{e}$ is $\{\mathbf{0},\sqrt{{2}/{\rho}}\bm{e},\sqrt{{2}/{\rho}}\bm{e}_{1},\sqrt{{2}/{\rho}}\bm{e}_{2}\}$; at each other point on the line $x_{1}=\sqrt{{2}/{\rho}}$, it is $\{\mathbf{0},\sqrt{{2}/{\rho}}\bm{e}_{1}\}$; and at every other point of Region I, it is $\mathbf{0}$. On Region II, $\mathrm{prox}_{\frac{1}{\rho}\|\cdot\|_{0}}$ at a point $(x_{1},\sqrt{{2}/{\rho}})$ is $\{(x_{1},\sqrt{{2}/{\rho}}),x_{1}\bm{e}_{1}\}$, and at every other point $(x_{1},x_{2})$ it is $x_{1}\bm{e}_{1}$. On Region III, $\mathrm{prox}_{\frac{1}{\rho}\|\cdot\|_{0}}$ maps each point to itself.

Figure 3.1(b) shows the proximity operator of $h_{2}$ on the line $x_{1}=x_{2}$. By Theorem 3.1, $\mathrm{prox}_{\frac{1}{\rho}h_{2}}$ at a point $\alpha\bm{e}$ is $\mathbf{0}$ if $\alpha<\sqrt{2/\rho}$ (blue dash-dot line); $\{\mathbf{0}\}\cup\{\alpha\|\bm{w}\|_{1}\bm{w}:\bm{w}\in\mathbb{S}_{+}^{1}\}$ if $\alpha=\sqrt{2/\rho}$ (marked by the square); and $\alpha\bm{e}$ itself if $\alpha>\sqrt{2/\rho}$ (magenta dotted line). Compared with the proximity operator of the $\ell_{0}$ norm, the only difference on this line occurs at the point $\sqrt{2/\rho}\bm{e}$.

Figure 3.1(c) shows the proximity operator of $h_{2}$ on $\mathbb{R}^{2}_{\downarrow}$ excluding the line $x_{1}=x_{2}$. By Theorem 3.2, this set is partitioned into three regions I, II, and III, as shown in Figure 3.1(c) and defined as follows:

\begin{aligned}\mbox{Region I}&=\{(x_{1},x_{2}):0\leq x_{2}<x_{1}\leq\sqrt{{2}/{\rho}}\},\\ \mbox{Region II}&=\{(x_{1},x_{2}):0\leq x_{2}\leq{2}/{(\rho x_{1})},x_{1}>\sqrt{{2}/{\rho}}\},\\ \mbox{Region III}&=\{(x_{1},x_{2}):{2}/{(\rho x_{1})}<x_{2}<x_{1},x_{1}>\sqrt{{2}/{\rho}}\}.\end{aligned}

On Region I, $\mathrm{prox}_{\frac{1}{\rho}h_{2}}$ is $\{\mathbf{0},\sqrt{{2}/{\rho}}\bm{e}_{1}\}$ at each point on the line $x_{1}=\sqrt{{2}/{\rho}}$ and $\mathbf{0}$ at every other point. On Region II, $\mathrm{prox}_{\frac{1}{\rho}h_{2}}$ at each point $(x_{1},x_{2})$ is $x_{1}\bm{e}_{1}$ (see the red line). On Region III, $\mathrm{prox}_{\frac{1}{\rho}h_{2}}$ at each point $\bm{x}$ is $\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}$, where $\bm{w}^{\star}$ is given in Theorem 3.2. Specifically, we show points on three lines with slopes 0.9 (green), 0.5 (cyan), and 0.3 (black); the corresponding values of $\mathrm{prox}_{\frac{1}{\rho}h_{2}}$ are drawn as dashed lines in matching colors.

Figure 3.1: The plots of the proximity operator in $\mathbb{R}^{2}_{\downarrow}$ for (a) the $\ell_{0}$ norm; (b) $h_{2}$ on the line with slope $1$; and (c) $h_{2}$ on $\mathbb{R}^{2}_{\downarrow}$ excluding the line with slope $1$.
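The region descriptions for Figure 3.1(a) and Figure 3.1(c) can be encoded as two small classifiers (hypothetical helper names). The sketch only makes the boundaries concrete. For example, with $\rho=2$ the point $(3,0.6)$ falls in Region II for the $\ell_{0}$ norm, where the second entry is set to zero, but in Region III for $h_{2}$, where both entries survive.

```python
import math

def l0_region(x1, x2, rho):
    """Region of Figure 3.1(a) containing (x1, x2), assuming 0 <= x2 <= x1."""
    tau = math.sqrt(2.0 / rho)                 # hard-thresholding level of prox_{||.||_0 / rho}
    if x1 <= tau:
        return "I"
    return "II" if x2 <= tau else "III"

def h2_region(x1, x2, rho):
    """Region of Figure 3.1(c) containing (x1, x2), assuming 0 <= x2 < x1."""
    if x1 <= math.sqrt(2.0 / rho):
        return "I"                             # the prox contains 0
    return "II" if rho * x1 * x2 <= 2.0 else "III"

rho = 2.0                                      # sqrt(2/rho) = 1
for point in [(0.8, 0.5), (1.5, 0.5), (1.5, 1.2), (3.0, 0.6)]:
    print(point, l0_region(*point, rho), h2_region(*point, rho))
```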

3.2 General case: the proximity operator of $h_{2}$ on $\mathbb{R}^{n}$

In the preceding subsection, we determined the proximity operator of $h_{2}$ on $\mathbb{R}^{2}$ via the WRD procedure. The key idea was to parameterize $\mathbb{S}^{1}_{+}$ by a single angle, which reduces the $\bm{w}$-step to a one-dimensional problem that is easy to solve. Although $\mathbb{S}^{n-1}_{+}$ for $n>2$ can be parameterized by $n-1$ angles, the resulting $\bm{w}$-step problem is difficult to analyze directly, so a different approach is needed.

Since the $\bm{w}$-step is the crucial part of the WRD procedure, this subsection focuses on it. The objective function $G$ of the $\bm{w}$-step is a quadratic form, so we recall two results on quadratic optimization over a sphere.

Lemma 3.3 (Theorem 1 in [20]).

Consider the following optimization problem

\min\left\{\frac{1}{2}\bm{w}^{\top}\mathsf{H}\bm{w}+\bm{b}^{\top}\bm{w}:\|\bm{w}\|_{2}=r\right\},

(8)

where $\mathsf{H}$ is an $n\times n$ symmetric matrix, $\bm{b}\in\mathbb{R}^{n}$, and $r>0$. A vector $\bm{w}^{\star}$ is a solution to this problem if and only if there is a real number $\lambda^{\star}$ such that (i) $\mathsf{H}+\lambda^{\star}\mathsf{\mathrm{I}}$ is positive semi-definite; (ii) $(\mathsf{H}+\lambda^{\star}\mathsf{\mathrm{I}})\bm{w}^{\star}=-\bm{b}$; and (iii) $\|\bm{w}^{\star}\|_{2}=r$. Such a $\lambda^{\star}$ is unique.

Lemma 3.4 ([21, 20]).

Consider the optimization problem (8). If $\bm{b}$ is orthogonal to some eigenvector associated with the smallest eigenvalue, then there is no local-nonglobal minimum for (8).

Both Lemma 3.3 and Lemma 3.4 concern quadratic optimization over the whole sphere, whereas the $\bm{w}$-step problem is restricted to $\mathbb{S}^{n-1}_{+}$, the part of the sphere lying in the nonnegative orthant.

To apply Lemma 3.3 to the $\bm{w}$-step, we first need the eigenstructure of the matrix $\mathsf{A}_{\rho,\bm{x}}$ defined in (7). This matrix is the sum of two rank-one matrices, so it has at most two nonzero eigenvalues. To describe them, we introduce the following notation:

\begin{aligned}\Delta&:=\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)^{2}-2\rho\|\bm{x}\|_{1}^{2},&&\mbox{(9)}\\ \underline{\alpha}&:=\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)-\sqrt{\Delta},&&\mbox{(10)}\\ \overline{\alpha}&:=\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)+\sqrt{\Delta},&&\mbox{(11)}\\ \underline{\lambda}&:=2n-\overline{\alpha},&&\mbox{(12)}\\ \overline{\lambda}&:=2n-\underline{\alpha},&&\mbox{(13)}\\ \underline{\bm{w}}&:=\bm{x}-\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}\bm{e},&&\mbox{(14)}\\ \overline{\bm{w}}&:=\bm{x}-\frac{\overline{\alpha}}{\rho\|\bm{x}\|_{1}}\bm{e}.&&\mbox{(15)}\end{aligned}

We make some observations about this notation. The inequality $\|\bm{x}\|_{1}\leq\sqrt{n}\|\bm{x}\|_{2}$ gives $\Delta-\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n\right)^{2}=2\rho\left(n\|\bm{x}\|_{2}^{2}-\|\bm{x}\|_{1}^{2}\right)\geq 0$, and the inequality is strict if $\bm{x}$ is not a multiple of $\bm{e}$. Moreover, $\Delta\leq\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)^{2}$, so both $\underline{\alpha}$ and $\overline{\alpha}$ (given in (10) and (11)) are nonnegative. For $\overline{\lambda}$ (given in (13)):

\overline{\lambda}=2n-\underline{\alpha}=-\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n\right)+\sqrt{\Delta}\geq-\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n\right)+\left|\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n\right|\geq 0,

where the first inequality is strict if $\bm{x}$ is not a multiple of $\bm{e}$. Similarly, for $\underline{\lambda}$ (given in (12)):

\underline{\lambda}=2n-\overline{\alpha}=-\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n\right)-\sqrt{\Delta}\leq-\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n\right)-\left|\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n\right|\leq 0,

where, again, the first inequality is strict if $\bm{x}$ is not a multiple of $\bm{e}$. Hence, if $\bm{x}$ is not a multiple of $\bm{e}$, then $\overline{\lambda}$ is positive and $\underline{\lambda}$ is negative.
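The quantities (9) to (13) are cheap to evaluate. The sketch below (the helper name is hypothetical) computes them and checks the observations above on random nonnegative vectors: $\underline{\alpha},\overline{\alpha}\geq 0$, and $\overline{\lambda}>0>\underline{\lambda}$ when $\bm{x}$ is not a multiple of $\bm{e}$. It also illustrates the by-product $\underline{\alpha}\,\overline{\alpha}=2\rho\|\bm{x}\|_{1}^{2}$ of (9) to (11).

```python
import math
import random

def eig_quantities(x, rho):
    """Return (Delta, alpha_lo, alpha_hi, lam_lo, lam_hi) from (9)-(13) for x >= 0, x != 0."""
    n = len(x)
    l1 = sum(x)                                    # ||x||_1 since x >= 0
    s = 0.5 * rho * sum(t * t for t in x) + n      # rho/2 ||x||_2^2 + n
    delta = s * s - 2.0 * rho * l1 * l1            # (9)
    root = math.sqrt(delta)
    alpha_lo, alpha_hi = s - root, s + root        # (10), (11)
    return delta, alpha_lo, alpha_hi, 2 * n - alpha_hi, 2 * n - alpha_lo   # (12), (13)

rng = random.Random(0)
for _ in range(3):
    x = [rng.uniform(0.0, 3.0) for _ in range(4)]  # almost surely not a multiple of e
    rho = rng.uniform(0.1, 5.0)
    delta, a_lo, a_hi, lam_lo, lam_hi = eig_quantities(x, rho)
    print(f"alpha_lo={a_lo:.4f}  lam_lo={lam_lo:+.4f}  lam_hi={lam_hi:+.4f}")

# x = alpha*e with rho*alpha^2 = 3 > 2: Proposition 3.5(i) predicts the eigenvalue (2 - 3)*3 = -3
print(eig_quantities([1.0, 1.0, 1.0], 3.0)[3:])   # (-3.0, 0.0)
```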

The subsequent result elucidates the eigenstructure of the matrix $\mathsf{A}_{\rho,\bm{x}}$ .

Proposition 3.5.

Let $\mathsf{A}_{\rho,\bm{x}}$ be given in (7) for $\rho>0$ and $\bm{x}\in\mathbb{R}_{+}^{n}$ . Let $\underline{\alpha}$ , $\overline{\alpha}$ , $\underline{\lambda}$ , $\overline{\lambda}$ , $\underline{\bm{w}}$ , and $\overline{\bm{w}}$ be given by (10), (11), (12), (13), (14), and (15), respectively. The following statements hold:

(i)

Assume $\bm{x}=\alpha\bm{e}$ for some $\alpha>0$. If $\rho\alpha^{2}=2$, then $\mathsf{A}_{\rho,\bm{x}}=\mathbf{0}$, so all its eigenvalues are zero; otherwise, $\mathsf{A}_{\rho,\bm{x}}$ has $(2-\rho\alpha^{2})n$ as its only nonzero eigenvalue, with corresponding eigenvector $\frac{1}{\sqrt{n}}\bm{e}$.
(ii)

Assume $\bm{x}\neq\alpha\bm{e}$ for any $\alpha$. Then the matrix $\mathsf{A}_{\rho,\bm{x}}$ has exactly one positive eigenvalue $\overline{\lambda}=2n-\underline{\alpha}$ and exactly one negative eigenvalue $\underline{\lambda}=2n-\overline{\alpha}$. The eigenvectors associated with $\overline{\lambda}$ and $\underline{\lambda}$ are $\overline{\bm{w}}=\bm{x}-\frac{\overline{\alpha}}{\rho\|\bm{x}\|_{1}}\bm{e}$ and $\underline{\bm{w}}=\bm{x}-\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}\bm{e}$, respectively.

Proof.

Item (i). In this case, $\mathsf{A}_{\rho,\bm{x}}=\left(2-\rho\alpha^{2}\right)\bm{e}\bm{e}^{\top}$ . Clearly, $\mathsf{A}_{\rho,\bm{x}}=\mathbf{0}$ if $\rho\alpha^{2}=2$ , so $\mathsf{A}_{\rho,\bm{x}}$ has only zero as its eigenvalues. Otherwise, $\mathsf{A}_{\rho,\bm{x}}$ has $\left(2-\rho\alpha^{2}\right)n$ as its only non-zero eigenvalue with the corresponding eigenvector $\frac{1}{\sqrt{n}}\bm{e}$ .

Item (ii). Since $\mathrm{rank}(\mathsf{A}_{\rho,\bm{x}})\leq\mathrm{rank}(2\bm{e}\bm{e}^{\top})+\mathrm{rank}({\rho}\bm{x}\bm{x}^{\top})=2$, the matrix $\mathsf{A}_{\rho,\bm{x}}$ has at most two nonzero eigenvalues, and the corresponding eigenvectors lie in the span of $\bm{x}$ and $\bm{e}$. We therefore seek eigenvectors of the form $\bm{x}-\lambda\bm{e}$, where $\lambda$ is a scalar parameter (not an eigenvalue). For any $\lambda$, a direct computation using $\bm{e}^{\top}\bm{x}=\|\bm{x}\|_{1}$ gives

\mathsf{A}_{\rho,\bm{x}}(\bm{x}-\lambda\bm{e})=-{\rho}(\|\bm{x}\|_{2}^{2}-\|\bm{x}\|_{1}\lambda)\bm{x}+2(\|\bm{x}\|_{1}-n\lambda)\bm{e}.

If $\bm{x}-\lambda\bm{e}$ is an eigenvector of $\mathsf{A}_{\rho,\bm{x}}$, then comparing the coefficients of $\bm{x}$ and $\bm{e}$ shows that

-{\rho}(\|\bm{x}\|_{2}^{2}-\|\bm{x}\|_{1}\lambda)=\frac{2(\|\bm{x}\|_{1}-n\lambda)}{-\lambda}

holds, and the common value $\frac{2(\|\bm{x}\|_{1}-n\lambda)}{-\lambda}$ is the associated eigenvalue. Simplifying this equation leads to the quadratic equation

\frac{\rho}{2}\|\bm{x}\|_{1}\lambda^{2}-\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)\lambda+\|\bm{x}\|_{1}=0.

The discriminant of this quadratic equation in $\lambda$ is $\Delta$ given by (9). Since $\|\bm{x}\|_{1}^{2}<n\|\bm{x}\|_{2}^{2}$, we have $\Delta>(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n)^{2}\geq 0$. Hence, the quadratic equation has two distinct real roots $\lambda_{1}=\frac{\overline{\alpha}}{\rho\|\bm{x}\|_{1}}$ and $\lambda_{2}=\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}$. Substituting $\lambda_{1}$ and $\lambda_{2}$ into $\frac{2(\|\bm{x}\|_{1}-n\lambda)}{-\lambda}$ and using $\underline{\alpha}\,\overline{\alpha}=2\rho\|\bm{x}\|_{1}^{2}$ yields the eigenvalues $\overline{\lambda}$ and $\underline{\lambda}$ of $\mathsf{A}_{\rho,\bm{x}}$, respectively. As observed before this proposition, $\overline{\lambda}>0$ and $\underline{\lambda}<0$. The eigenvectors corresponding to $\overline{\lambda}$ and $\underline{\lambda}$ are $\overline{\bm{w}}$ and $\underline{\bm{w}}$, respectively. ∎
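Proposition 3.5(ii) can be verified numerically without forming $\mathsf{A}_{\rho,\bm{x}}$, since $\mathsf{A}_{\rho,\bm{x}}\bm{v}=2(\bm{e}^{\top}\bm{v})\bm{e}-\rho(\bm{x}^{\top}\bm{v})\bm{x}$. The sketch below (hypothetical helper names) builds $\overline{\bm{w}}$ and $\underline{\bm{w}}$ from (14) and (15) and reports the residuals $\|\mathsf{A}_{\rho,\bm{x}}\bm{w}-\lambda\bm{w}\|_{\infty}$, which should be at rounding level.

```python
import math

def A_matvec(x, rho, v):
    """A v for A = 2 e e^T - rho x x^T, computed as 2 (e^T v) e - rho (x^T v) x."""
    ev = sum(v)
    xv = sum(a * b for a, b in zip(x, v))
    return [2.0 * ev - rho * xi * xv for xi in x]

def eigenpairs(x, rho):
    """((lam_hi, w_hi), (lam_lo, w_lo)) from Proposition 3.5(ii); x >= 0, not a multiple of e."""
    n, l1 = len(x), sum(x)
    s = 0.5 * rho * sum(t * t for t in x) + n
    root = math.sqrt(s * s - 2.0 * rho * l1 * l1)      # sqrt(Delta), see (9)
    a_lo, a_hi = s - root, s + root                     # (10), (11)
    w_hi = [t - a_hi / (rho * l1) for t in x]           # (15)
    w_lo = [t - a_lo / (rho * l1) for t in x]           # (14)
    return (2 * n - a_lo, w_hi), (2 * n - a_hi, w_lo)   # (13), (12)

x, rho = [3.0, 1.0, 0.5, 0.0], 0.7
for lam, w in eigenpairs(x, rho):
    residual = max(abs(p - lam * q) for p, q in zip(A_matvec(x, rho, w), w))
    print(f"eigenvalue {lam:+.6f}   residual {residual:.1e}")
```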

We remark that for $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$, the largest component of $\underline{\bm{w}}$ in (14), namely its first component $x_{1}-\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}$, is always nonnegative. Indeed, by (9) and (10), $\underline{\alpha}=\frac{2\rho\|\bm{x}\|_{1}^{2}}{\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)+\sqrt{\Delta}}$, and using $\sqrt{\Delta}\geq\left|\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n\right|$ we obtain

\begin{aligned}x_{1}-\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}&=x_{1}-\frac{2\|\bm{x}\|_{1}}{\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)+\sqrt{\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)^{2}-2\rho\|\bm{x}\|_{1}^{2}}}\\ &\geq x_{1}-\frac{2\|\bm{x}\|_{1}}{\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)+\left|\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n\right|}=\left\{\begin{array}[]{ll}x_{1}-\frac{2\|\bm{x}\|_{1}}{\rho\|\bm{x}\|_{2}^{2}},&\hbox{if $\rho\|\bm{x}\|_{2}^{2}\geq 2n$;}\\ x_{1}-\frac{\|\bm{x}\|_{1}}{n},&\hbox{if $\rho\|\bm{x}\|_{2}^{2}<2n$.}\end{array}\right.\end{aligned}

Both expressions are nonnegative: $x_{1}\geq\|\bm{x}\|_{1}/n$ because $x_{1}$ is the largest entry of $\bm{x}$, and if $\rho\|\bm{x}\|_{2}^{2}\geq 2n$, then $\frac{2\|\bm{x}\|_{1}}{\rho\|\bm{x}\|_{2}^{2}}\leq\frac{\|\bm{x}\|_{1}}{n}\leq x_{1}$. Since $\sqrt{\Delta}>\left|\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n\right|$ when $\bm{x}$ is not a multiple of $\bm{e}$, this derivation also shows that $x_{1}-\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}>0$ whenever $\bm{x}$ is not parallel to $\bm{e}$.
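The bound just derived can be spot-checked on random vectors in $\mathbb{R}^{n}_{\downarrow}$. The sketch below (the helper name is hypothetical) evaluates $\underline{\alpha}$ in the cancellation-free form $2\rho\|\bm{x}\|_{1}^{2}/\big((\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n)+\sqrt{\Delta}\big)$ and compares the first entry of $\underline{\bm{w}}$ with the piecewise lower bound.

```python
import math
import random

def first_entry_and_bound(x, rho):
    """First entry of w_lo in (14) and the lower bound derived above; x sorted, x >= 0, x != 0."""
    n, l1 = len(x), sum(x)
    half = 0.5 * rho * sum(t * t for t in x)        # rho/2 ||x||_2^2
    s = half + n
    alpha_lo = 2.0 * rho * l1 * l1 / (s + math.sqrt(s * s - 2.0 * rho * l1 * l1))
    first = x[0] - alpha_lo / (rho * l1)
    bound = x[0] - 2.0 * l1 / (s + abs(half - n))
    return first, bound

rng = random.Random(0)
worst = float("inf")
for _ in range(10000):
    n = rng.randint(2, 8)
    x = sorted((rng.uniform(0.0, 5.0) for _ in range(n)), reverse=True)
    first, bound = first_entry_and_bound(x, rng.uniform(0.01, 10.0))
    worst = min(worst, first - bound, bound)        # both should be >= 0
print("smallest of (first - bound) and bound over all trials:", worst)
```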

If the last component $x_{n}-\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}$ of $\underline{\bm{w}}$ in (14) is positive, we have the following result.

Theorem 3.6.

For $\rho>0$ and $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$ not being a multiple of $\bm{e}$ , if $x_{n}>\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}$ , then the vector $\bm{w}^{\star}:=\frac{\underline{\bm{w}}}{\|\underline{\bm{w}}\|_{2}}$ with $\underline{\bm{w}}=\bm{x}-\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}\bm{e}$ is the solution to the optimization problem (3). Furthermore, we have

\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})=\left\langle\bm{x},\bm{w}^{\star}% \right\rangle\bm{w}^{\star}.

Proof.

Since $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$ is not a multiple of $\bm{e}$, $x_{n}>\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}$, and $\underline{\alpha}$ is nonnegative, all entries of $\underline{\bm{w}}$ are positive; hence $\mathbf{0}\neq\underline{\bm{w}}\in\mathbb{R}_{\downarrow}^{n}$ and $\bm{w}^{\star}\in\mathbb{R}_{\downarrow}^{n}\cap\mathbb{S}^{n-1}_{+}$. Take $\mathsf{H}=\mathsf{A}_{\rho,\bm{x}}$, $\bm{b}=\mathbf{0}$, and $r=1$ in (8) of Lemma 3.3, and let $\lambda^{\star}=-\underline{\lambda}$. By item (ii) of Proposition 3.5, $\underline{\lambda}$ is the smallest eigenvalue of $\mathsf{A}_{\rho,\bm{x}}$, so $\mathsf{A}_{\rho,\bm{x}}-\underline{\lambda}\mathsf{\mathrm{I}}$ is positive semi-definite, and $(\mathsf{A}_{\rho,\bm{x}}-\underline{\lambda}\mathsf{\mathrm{I}})\underline{\bm{w}}=\mathbf{0}$. Lemma 3.3 therefore shows that $\bm{w}^{\star}$ minimizes $G$ over the whole sphere $\mathbb{S}^{n-1}$; since $\bm{w}^{\star}\in\mathbb{S}_{+}^{n-1}$, it is also a solution to problem (3).

To determine $\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})$, note that all entries of $\bm{w}^{\star}$ are positive and $\bm{x}\neq\mathbf{0}$ is nonnegative, hence $\langle\bm{x},\bm{w}^{\star}\rangle>0$. Furthermore, since $G(\bm{w}^{\star})=\frac{1}{2}\underline{\lambda}<0$ for $G$ given in (5), we conclude that $F(\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star})<F(\mathbf{0})$ for $F$ given in (1). This completes the proof of this theorem. ∎

We make two remarks on Theorem 3.6. First, under the conditions of this theorem, simplifying $\left\langle\bm{x},\bm{w}^{\star}\right\rangle\bm{w}^{\star}$ gives

\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})=\frac{\|\bm{x}\|_{2}^{2}-\frac{\underline{\alpha}}{\rho}}{\|\bm{x}\|_{2}^{2}-2\frac{\underline{\alpha}}{\rho}+\frac{n\underline{\alpha}^{2}}{\rho^{2}\|\bm{x}\|_{1}^{2}}}\left(\bm{x}-\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}\bm{e}\right).

Second, Theorem 3.6 is consistent with Theorem 3.2 on $\mathbb{R}^{2}_{\downarrow}$: if $x_{2}>\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}$, then $\rho x_{1}x_{2}>2$, and the vectors $\bm{w}^{\star}$ in Theorem 3.6 and Theorem 3.2 coincide. To verify this with simpler expressions, set

a:=\sqrt{\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+2\right)^{2}-2\rho\|\bm{x}\|_{1}^{2}}\quad\mbox{and}\quad b:=\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+2\right)-\rho x_{2}\|\bm{x}\|_{1}.

By (10) with $n=2$, the condition $x_{2}>\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}$ is equivalent to $a>b$. We claim that $a>|b|$. Otherwise, $b$ must be negative and $0<a\leq|b|$. Squaring this inequality and simplifying yields $(x_{1}-x_{2})(\rho x_{1}x_{2}-2)\leq 0$, that is, $\rho x_{1}x_{2}\leq 2$. In this situation, $b=\frac{\rho}{2}(x_{1}^{2}-x_{2}^{2})+2-\rho x_{1}x_{2}>0$, which contradicts the negativity of $b$. Hence, $a>|b|$, and squaring and simplifying as before yields $\rho x_{1}x_{2}>2$.

Further, define $\beta:=2\theta^{\star}=\arctan\left(\frac{-2(2-\rho x_{1}x_{2})}{\rho(x_{1}^{2}-x_{2}^{2})}\right)\in(0,\pi/2)$. With the help of the identity

\frac{\cos\theta^{\star}}{\sin\theta^{\star}}=\sqrt{1+\frac{1}{\tan^{2}\beta}}+\frac{1}{\tan\beta},

we can show, after some simplifications, that the ratios of the entries of $\bm{w}^{\star}$ in both Theorem 3.6 and Theorem 3.2 are the same:

\frac{\cos\theta^{\star}}{\sin\theta^{\star}}=\frac{\rho x_{1}\|\bm{x}\|_{1}-\underline{\alpha}}{\rho x_{2}\|\bm{x}\|_{1}-\underline{\alpha}},

which means that $\bm{w}^{\star}$ in both Theorem 3.6 and Theorem 3.2 are identical.
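Both remarks can be checked numerically on a point where the hypotheses hold. The sketch below (hypothetical function names) uses $\bm{x}=(2,1.5)$ and $\rho=2$, for which $\rho x_{1}x_{2}=6>2$ and $x_{2}>\underline{\alpha}/(\rho\|\bm{x}\|_{1})$. It compares $\bm{w}^{\star}$ from Theorem 3.6 with $\bm{w}^{\star}$ from Theorem 3.2, and the closed-form proximity operator from the first remark with $\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}$.

```python
import math

def w_star_thm36(x, rho):
    """w* = w_lo / ||w_lo||_2 from Theorem 3.6; requires the last entry of w_lo to be positive."""
    n, l1 = len(x), sum(x)
    s = 0.5 * rho * sum(t * t for t in x) + n
    alpha_lo = 2.0 * rho * l1 * l1 / (s + math.sqrt(s * s - 2.0 * rho * l1 * l1))
    w = [t - alpha_lo / (rho * l1) for t in x]      # (14)
    if w[-1] <= 0:
        raise ValueError("Theorem 3.6 does not apply")
    norm2 = math.sqrt(sum(t * t for t in w))
    return [t / norm2 for t in w]

def w_star_thm32(x1, x2, rho):
    """w* = (cos theta*, sin theta*) from Theorem 3.2 when rho*x1*x2 > 2."""
    theta = 0.5 * math.atan(-2.0 * (2.0 - rho * x1 * x2) / (rho * (x1 * x1 - x2 * x2)))
    return [math.cos(theta), math.sin(theta)]

def prox_closed_form(x, rho):
    """Closed-form prox from the first remark on Theorem 3.6."""
    n, l1, l2sq = len(x), sum(x), sum(t * t for t in x)
    s = 0.5 * rho * l2sq + n
    a = 2.0 * rho * l1 * l1 / (s + math.sqrt(s * s - 2.0 * rho * l1 * l1))
    coef = (l2sq - a / rho) / (l2sq - 2.0 * a / rho + n * a * a / (rho * rho * l1 * l1))
    return [coef * (t - a / (rho * l1)) for t in x]

x, rho = [2.0, 1.5], 2.0
print(w_star_thm36(x, rho))
print(w_star_thm32(*x, rho))
print(prox_closed_form(x, rho))
```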

The next result discusses the property of the solution from the $\bm{w}$ -step under the condition that the last component $x_{n}-\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}$ of $\underline{\bm{w}}$ in (14) is non-positive.

Theorem 3.7.

For $\rho>0$ and $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$ , let $\bm{w}^{\star}$ be the optimal solution to the optimization problem (3). If $x_{n}\leq\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}$ , then $\left(\bm{w}^{\star}\right)_{n}=0$ .

Proof.

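Suppose, to the contrary, that all components of $\bm{w}^{\star}$ are positive. Then $\bm{w}^{\star}$ lies in the relative interior of $\mathbb{S}_{+}^{n-1}$ within $\mathbb{S}^{n-1}$, so $\bm{w}^{\star}$ is a local minimizer of

\min\left\{\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}:\bm{w}\in\mathbb{S}^{n-1}\right\}.

(17)

The linear term $\bm{b}=\mathbf{0}$ in (17) is orthogonal to every vector, in particular to $\underline{\bm{w}}$, the eigenvector of $\mathsf{A}_{\rho,\bm{x}}$ associated with its negative eigenvalue $\underline{\lambda}$. By Lemma 3.4, problem (17) has no local-nonglobal minimizer, so $\bm{w}^{\star}$ is a global minimizer of (17). By Proposition 3.5(ii), $\underline{\lambda}$ is a simple eigenvalue, so $\bm{w}^{\star}=\pm\frac{\underline{\bm{w}}}{\|\underline{\bm{w}}\|_{2}}$; because the first component of $\underline{\bm{w}}$ is positive, $\bm{w}^{\star}=\frac{\underline{\bm{w}}}{\|\underline{\bm{w}}\|_{2}}$. Its last component, however, is nonpositive by the condition $x_{n}\leq\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}$, a contradiction. Hence $\bm{w}^{\star}$ has a zero component; since $\bm{w}^{\star}$ can be taken in $\mathbb{R}^{n}_{\downarrow}$ by Lemma 2.1 and Theorem 2.3 (as in the proof of Theorem 3.2), its last component vanishes. This completes our proof. ∎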

To obtain an efficient method for computing the proximity operator of $h_{2}$, let us examine the entries of the matrix $\mathsf{A}_{\rho,\bm{x}}$:

\mathsf{A}_{\rho,\bm{x}}=\begin{bmatrix}2-\rho x_{1}^{2}&2-\rho x_{1}x_{2}&\cdots&2-\rho x_{1}x_{n}\\ 2-\rho x_{2}x_{1}&2-\rho x_{2}^{2}&\cdots&2-\rho x_{2}x_{n}\\ \vdots&\vdots&\ddots&\vdots\\ 2-\rho x_{n}x_{1}&2-\rho x_{n}x_{2}&\cdots&2-\rho x_{n}^{2}\end{bmatrix}.

Since $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$, the entries of $\mathsf{A}_{\rho,\bm{x}}$ are nondecreasing along each row, each column, and each diagonal as the indices increase. Based on this structure, we define a function $\mu$ that maps every pair $(\rho,\bm{x})$ with $\rho>0$ and $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$ to a nonnegative integer as follows:

\mu(\rho,\bm{x}):=\left\{\begin{array}[]{ll}0,&\hbox{if $(\mathsf{A}_{\rho,\bm{x}})_{11}\geq 0$;}\\ k,&\hbox{if there exists $1\leq k<n$ such that $(\mathsf{A}_{\rho,\bm{x}})_{1k}<0$ and $(\mathsf{A}_{\rho,\bm{x}})_{1(k+1)}\geq 0$;}\\ n,&\hbox{if $(\mathsf{A}_{\rho,\bm{x}})_{1n}<0$.}\end{array}\right.

(18)

Thus $\mu(\rho,\bm{x})$ is the number of negative entries of the vector $2\bm{e}-\rho x_{1}\bm{x}$, which is the first column of $\mathsf{A}_{\rho,\bm{x}}$. Using $\mu(\rho,\bm{x})$, we distinguish the cases $\mu(\rho,\bm{x})=0$ and $\mu(\rho,\bm{x})\geq 1$ in the following theorem.
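Because the first column of $\mathsf{A}_{\rho,\bm{x}}$ is nondecreasing, $\mu(\rho,\bm{x})$ can be computed with a single forward scan that stops at the first nonnegative entry. A minimal sketch (the function name is hypothetical), applied to the vector $\bm{x}=(2.5,1.5,1,0.5)$ used in Example 3.9:

```python
def mu(rho, x):
    """mu(rho, x) from (18) for x sorted nonincreasingly with x >= 0."""
    k = 0
    # The first column 2e - rho*x1*x is nondecreasing, so its negative entries
    # form a leading block; count them and stop at the first nonnegative one.
    while k < len(x) and 2.0 - rho * x[0] * x[k] < 0:
        k += 1
    return k

x = [2.5, 1.5, 1.0, 0.5]
print([mu(r, x) for r in (0.1, 0.5, 1.0, 1.8, 2.5)])   # [0, 1, 3, 4, 4]
```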

Theorem 3.8.

Let $\rho>0$ and let $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$ . Set $k=\mu(\rho,\bm{x})$ . Then the following statements hold:

(i)

If $k=0$, then $\bm{e}_{1}$ is a global minimizer of the optimization problem (3);

(ii)

If $1\leq k\leq n$ , then the vector

\begin{bmatrix}\tilde{\bm{w}}^{\star}\\ \mathbf{0}_{(n-k)\times 1}\end{bmatrix}

is a global minimizer of the optimization problem (3), where $\tilde{\bm{w}}^{\star}$ is a minimizer of the problem

\min_{\bm{w}\in\mathbb{S}^{k-1}_{+}}\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}_{[k]}}\bm{w}.

Here $\mathsf{A}_{\rho,\bm{x}_{[k]}}$, defined by (7) with $\bm{x}_{[k]}$ the vector of the first $k$ entries of $\bm{x}$, is the $k\times k$ leading principal submatrix of $\mathsf{A}_{\rho,\bm{x}}$, obtained by removing its last $n-k$ rows and columns.

Proof.

(i) For $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$, the fact $(\mathsf{A}_{\rho,\bm{x}})_{11}\geq 0$ and the monotonicity of the entries imply $(\mathsf{A}_{\rho,\bm{x}})_{ij}\geq(\mathsf{A}_{\rho,\bm{x}})_{11}\geq 0$ for all $i,j\in[n]$. Therefore, for all $\bm{w}\in\mathbb{S}^{n-1}_{+}$, we have

\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}\geq\frac{1}{2}(\mathsf{A}_{\rho,\bm{x}})_{11}\sum_{i,j=1}^{n}w_{i}w_{j}=\frac{1}{2}(\mathsf{A}_{\rho,\bm{x}})_{11}\|\bm{w}\|_{1}^{2}\geq\frac{1}{2}(\mathsf{A}_{\rho,\bm{x}})_{11}\|\bm{w}\|_{2}^{2}=\frac{1}{2}(\mathsf{A}_{\rho,\bm{x}})_{11}.

Both inequalities become equalities for $\bm{w}=\bm{e}_{1}$, so $\bm{e}_{1}$ is a global minimizer.

(ii) In this case, we partition the matrix $\mathsf{A}_{\rho,\bm{x}}$ into a $2\times 2$ block matrix as follows:

\mathsf{A}_{\rho,\bm{x}}=\begin{bmatrix}\mathsf{A}_{11}&\mathsf{A}_{12}\\ \mathsf{A}_{21}&\mathsf{A}_{22}\end{bmatrix},

where $\mathsf{A}_{11}$, $\mathsf{A}_{12}$, $\mathsf{A}_{21}$, and $\mathsf{A}_{22}$ are of sizes $k\times k$, $k\times(n-k)$, $(n-k)\times k$, and $(n-k)\times(n-k)$, respectively (the last three blocks are empty when $k=n$). In fact, $\mathsf{A}_{11}=\mathsf{A}_{\rho,\bm{x}_{[k]}}$. Since $(\mathsf{A}_{\rho,\bm{x}})_{1(k+1)}\geq 0$ and the entries are nondecreasing along rows and columns, every entry of $\mathsf{A}_{12}$, $\mathsf{A}_{21}$, and $\mathsf{A}_{22}$ is nonnegative. For any $\bm{w}\in\mathbb{S}^{n-1}_{+}$, write

\bm{w}=\begin{bmatrix}\bm{w}_{1}\\ \bm{w}_{2}\end{bmatrix}

with $\bm{w}_{1}\in\mathbb{R}^{k}_{+}$ and $\bm{w}_{2}\in\mathbb{R}^{n-k}_{+}$. Since $\bm{w}$ is nonnegative, we have

\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}=\bm{w}_{1}^{\top}\mathsf{A}_{11}\bm{w}_{1}+\bm{w}_{1}^{\top}\mathsf{A}_{12}\bm{w}_{2}+\bm{w}_{2}^{\top}\mathsf{A}_{21}\bm{w}_{1}+\bm{w}_{2}^{\top}\mathsf{A}_{22}\bm{w}_{2}\geq\bm{w}_{1}^{\top}\mathsf{A}_{11}\bm{w}_{1}.

Let $m:=\min_{\tilde{\bm{w}}\in\mathbb{S}^{k-1}_{+}}\frac{1}{2}\tilde{\bm{w}}^{\top}\mathsf{A}_{11}\tilde{\bm{w}}$. Since $(\mathsf{A}_{11})_{11}=2-\rho x_{1}^{2}\leq 2-\rho x_{1}x_{k}<0$, we have $m<0$. If $\bm{w}_{1}\neq\mathbf{0}$, then $\frac{1}{2}\bm{w}_{1}^{\top}\mathsf{A}_{11}\bm{w}_{1}\geq\|\bm{w}_{1}\|_{2}^{2}\,m\geq m$ because $\|\bm{w}_{1}\|_{2}\leq 1$ and $m<0$; if $\bm{w}_{1}=\mathbf{0}$, then $\frac{1}{2}\bm{w}_{1}^{\top}\mathsf{A}_{11}\bm{w}_{1}=0>m$. Thus,

\min_{\bm{w}\in\mathbb{S}^{n-1}_{+}}\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}\geq\min_{\bm{w}\in\mathbb{S}^{n-1}_{+}}\frac{1}{2}\bm{w}_{1}^{\top}\mathsf{A}_{11}\bm{w}_{1}\geq\min_{\tilde{\bm{w}}\in\mathbb{S}^{k-1}_{+}}\frac{1}{2}\tilde{\bm{w}}^{\top}\mathsf{A}_{11}\tilde{\bm{w}}.

Conversely, for every $\tilde{\bm{w}}\in\mathbb{S}^{k-1}_{+}$, the padded vector $\bm{v}:=\begin{bmatrix}\tilde{\bm{w}}\\ \mathbf{0}\end{bmatrix}$ belongs to $\mathbb{S}^{n-1}_{+}$, and

\frac{1}{2}\tilde{\bm{w}}^{\top}\mathsf{A}_{11}\tilde{\bm{w}}=\frac{1}{2}\bm{v}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{v}\geq\min_{\bm{w}\in\mathbb{S}^{n-1}_{+}}\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}.

We conclude that

\min_{\bm{w}\in\mathbb{S}^{n-1}_{+}}\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}=\min_{\tilde{\bm{w}}\in\mathbb{S}^{k-1}_{+}}\frac{1}{2}\tilde{\bm{w}}^{\top}\mathsf{A}_{11}\tilde{\bm{w}},

and the minimum on the left is attained at the padded vector $\begin{bmatrix}\tilde{\bm{w}}^{\star}\\ \mathbf{0}\end{bmatrix}$. This completes the proof. ∎

We remark that the entries of $\tilde{\bm{w}}^{\star}$ in Theorem 3.8 need not all be positive; some may vanish, as the following example shows.

Example 3.9.

Let

\bm{x}=\begin{bmatrix}2.5&1.5&1&0.5\end{bmatrix}^{\top}.

For this vector and two values of $\rho$, we present the matrix $\mathsf{A}_{\rho,\bm{x}}$, its eigenvector $\bm{v}$ associated with the negative eigenvalue, and the minimizer $\bm{w}^{\star}$ of the problem $\min_{\bm{w}\in\mathbb{S}^{3}_{+}}\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}$.

For $\rho_{1}=2.5$ , we have $\mathsf{A}_{\rho_{1},\bm{x}}$ , $\bm{v}_{1}$ , and $\bm{w}_{1}^{\star}$ as follows:

\mathsf{A}_{\rho_{1},\bm{x}}=\frac{1}{8}\begin{bmatrix}-109&-59&-34&-9\\ -59&-29&-14&1\\ -34&-14&-4&6\\ -9&1&6&11\end{bmatrix},\quad\bm{v}_{1}=\begin{bmatrix}0.8598\\ 0.4481\\ 0.2422\\ 0.0363\end{bmatrix},\quad\mbox{and}\quad\bm{w}_{1}^{\star}=\begin{bmatrix}0.8598\\ 0.4481\\ 0.2422\\ 0.0363\end{bmatrix}.

For $\rho_{2}=1.8$ , we have $\mathsf{A}_{\rho_{2},\bm{x}}$ , $\bm{v}_{2}$ , and $\bm{w}_{2}^{\star}$ as follows:

\mathsf{A}_{\rho_{2},\bm{x}}=\begin{bmatrix}-9.25&-4.75&-2.50&-0.25\\ -4.75&-2.05&-0.70&0.65\\ -2.50&-0.70&0.20&1.10\\ -0.25&0.65&1.10&1.55\end{bmatrix},\quad\bm{v}_{2}=\begin{bmatrix}0.8795\\ 0.4294\\ 0.2043\\ -0.0207\end{bmatrix},\quad\mbox{and}\quad\bm{w}_{2}^{\star}=\begin{bmatrix}0.8804\\ 0.4286\\ 0.2027\\ 0\end{bmatrix}.

Notice that $\rho_{1}=2.5$ and $\rho_{2}=1.8$ both satisfy $2-\rho x_{1}x_{4}<0$, that is, $\mu(\rho_{1},\bm{x})=\mu(\rho_{2},\bm{x})=4$. Nevertheless, all components of $\bm{w}_{1}^{\star}$ are positive, whereas the last component of $\bm{w}_{2}^{\star}$ is zero. Thus $\mu(\rho,\bm{x})$ alone does not determine whether all components of $\bm{w}^{\star}$ are positive.
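The vectors in Example 3.9 can be reproduced from the closed-form eigenvector (14), without an eigensolver. In the sketch below (the helper name is hypothetical), $\bm{v}_{1}$ and $\bm{v}_{2}$ are the normalized vectors $\underline{\bm{w}}/\|\underline{\bm{w}}\|_{2}$ for $\rho_{1}$ and $\rho_{2}$. Following Theorem 3.7 and then Theorem 3.6, $\bm{w}_{2}^{\star}$ is obtained by applying the same formula to the leading three entries of $\bm{x}$ and padding with a zero. The printed values should agree with the example to four decimals.

```python
import math

def w_lo_unit(x, rho):
    """w_lo/||w_lo||_2 with w_lo from (14); x >= 0 sorted, not a multiple of e."""
    n, l1 = len(x), sum(x)
    s = 0.5 * rho * sum(t * t for t in x) + n
    # alpha_lo from (10), written in a cancellation-free form
    alpha_lo = 2.0 * rho * l1 * l1 / (s + math.sqrt(s * s - 2.0 * rho * l1 * l1))
    w = [t - alpha_lo / (rho * l1) for t in x]
    norm2 = math.sqrt(sum(t * t for t in w))
    return [t / norm2 for t in w]

x = [2.5, 1.5, 1.0, 0.5]
v1 = w_lo_unit(x, 2.5)                  # last entry > 0: Theorem 3.6 gives w1* = v1
v2 = w_lo_unit(x, 1.8)                  # last entry < 0: Theorem 3.7 gives (w2*)_4 = 0
w2 = w_lo_unit(x[:3], 1.8) + [0.0]      # Theorem 3.6 on the leading 3 entries, then pad
for name, vec in (("v1", v1), ("v2", v2), ("w2*", w2)):
    print(name, [round(t, 4) for t in vec])
```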

Theorem 3.8 in the case $\mu(\rho,\bm{x})=0$ shows that $h_{2}$ promotes sparsity, as stated in the following result.

Theorem 3.10.

For $\rho>0$ , the following inclusion holds for all $\bm{x}$ in the set $\{\bm{x}\in\mathbb{R}^{n}:\|\bm{x}\|_{\infty}\leq\sqrt{{2}/{\rho}}\}$ :

\mathbf{0}\in\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x}).

Proof.

By Lemma 2.1, it suffices to consider points $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$ with $\|\bm{x}\|_{\infty}=x_{1}\leq\sqrt{{2}/{\rho}}$. We examine two scenarios. If $\bm{x}=\alpha\bm{e}$ with $0<\alpha\leq\sqrt{{2}/{\rho}}$, the result follows from Theorem 3.1. Otherwise, $(\mathsf{A}_{\rho,\bm{x}})_{11}=2-\rho x_{1}^{2}\geq 0$, so $\mu(\rho,\bm{x})=0$, and by item (i) of Theorem 3.8, $\bm{e}_{1}$ solves problem (3) with $G(\bm{e}_{1})=\frac{1}{2}(2-\rho x_{1}^{2})\geq 0$. Hence, the $d$-step admits $\mathbf{0}$ as a minimizer, and the result holds as well. ∎

This theorem underscores the sparsity-promoting nature of $h_{2}$: every point of the $\ell_{\infty}$ ball of radius $\sqrt{{2}/{\rho}}$ can be mapped to $\mathbf{0}$.
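Theorem 3.10 can also be checked by sampling. For $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$, $\mathbf{0}$ is a minimizer whenever $G(\bm{w})\geq 0$ on $\mathbb{S}^{n-1}_{+}$; the sketch takes this sign criterion for the $d$-step as given. Since $\langle\bm{x},\bm{w}\rangle\leq\|\bm{x}\|_{\infty}\|\bm{w}\|_{1}$ for $\bm{w}\geq\mathbf{0}$, one expects $G(\bm{w})\geq(1-\frac{\rho}{2}\|\bm{x}\|_{\infty}^{2})\|\bm{w}\|_{1}^{2}\geq 0$. The sketch below evaluates $G$ on random pairs $(\bm{x},\bm{w})$ with $\|\bm{x}\|_{\infty}\leq\sqrt{2/\rho}$. The value $\rho=1.3$ is arbitrary.

```python
import math
import random

def G(w, x, rho):
    """G(w) = 0.5 * w^T (2 e e^T - rho x x^T) w for w >= 0."""
    l1 = sum(w)
    xw = sum(a * b for a, b in zip(x, w))
    return 0.5 * (2.0 * l1 * l1 - rho * xw * xw)

rng = random.Random(0)
rho = 1.3
radius = math.sqrt(2.0 / rho)
smallest = float("inf")
for _ in range(20000):
    n = rng.randint(1, 6)
    x = sorted((rng.uniform(0.0, radius) for _ in range(n)), reverse=True)
    w = [rng.random() for _ in range(n)]
    norm2 = math.sqrt(sum(t * t for t in w))
    if norm2 == 0.0:
        continue
    w = [t / norm2 for t in w]                       # a point of S^{n-1}_+
    smallest = min(smallest, G(w, x, rho))
print("smallest sampled G(w):", smallest)           # nonnegative, as Theorem 3.10 predicts
```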

Given $\rho>0$ and $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$, Theorem 3.8 gives a clear guideline for computing an optimal solution $\bm{w}$ of problem (3) and, eventually, $\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})$. If there exists an integer $k\in[1,n-1]$ such that $2-\rho x_{1}x_{k}<0$ and $2-\rho x_{1}x_{k+1}\geq 0$, then there is an optimal solution with $w_{k+1}=\cdots=w_{n}=0$. We may therefore truncate $\bm{x}$ by removing its last $n-k$ entries, which reduces the $\bm{w}$-step to a $k$-dimensional problem.

We now present our algorithm for computing $\operatorname*{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})$ for arbitrary $\bm{x}\in\mathbb{R}^{n}$ based on the WRD procedure; see Algorithm 1.

Algorithm 1 Computing the Proximal Operator of $h_{2}$

1: Input: vector $\bm{x}\in\mathbb{R}^{n}$, parameter $\rho>0$

2: Output: the proximal operator $\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})$

3: procedure (WRD procedure)

4: Sort and convert $\bm{x}$ into $\mathbb{R}^{n}_{\downarrow}$ via a signed permutation matrix $\mathsf{P}$

5: Compute $k=\mu(\rho,\bm{x})$ by (18)

6: if $k=0$ then

7: $\bm{w}=\bm{e}_{1}$ (see item (i) of Theorem 3.8)

8: else ($\bm{w}$-step)

9: for $j=k,k-1,\ldots,1$ do

10: Form a vector (still denoted by $\bm{x}$) from the first $j$ entries of $\bm{x}$

11: if $\bm{x}=\alpha\bm{e}$ for some $\alpha>0$ then

12: $\bm{u}=\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})$ by Theorem 3.1

13: else if $j=2$ then

14: return $\bm{w}$ by Theorem 3.2

15: else

16: if the last entry of $\underline{\bm{w}}$, given by (14), is greater than $0$ then

17: return $\bm{w}\leftarrow\frac{\underline{\bm{w}}}{\|\underline{\bm{w}}\|_{2}}$ by Theorem 3.6

18: end if

19: end if

20: end for

21: end if

22: Pad $\bm{w}$ with a zero block so that the resulting vector, still denoted by $\bm{w}$, lies in $\mathbb{S}^{n-1}_{+}$

23: Form $\bm{u}\leftarrow\langle\bm{x},\bm{w}\rangle\bm{w}$ ($r$-step)

24: Set $\bm{u}\leftarrow\begin{cases}\mathbf{0},&\text{if }F(\mathbf{0})\leq F(\bm{u});\\ \bm{u},&\text{otherwise}\end{cases}$ ($d$-step)

25: $\bm{u}\leftarrow\mathsf{P}^{-1}\bm{u}\in\mathrm{prox}_{\frac{1}{\rho}h_{2}}(\bm{x})$

26: end procedure
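Algorithm 1 depends on formulas from Section 3 ((14), (18), and Theorems 3.1, 3.2, 3.6) that are not reproduced here, so the following Python sketch does not implement it verbatim. It only illustrates the surrounding WRD skeleton (signed sorting, $r$-step, $d$-step, and undoing $\mathsf{P}$), with a brute-force grid search over $\mathbb{S}^{1}_{+}$ substituted for the $\bm{w}$-step when $n=2$. It takes $h_{2}=(\ell_{1}/\ell_{2})^{2}$ with the convention $h_{2}(\mathbf{0})=0$, and assumes $F(\bm{u})=h(\bm{u})+\frac{\rho}{2}\|\bm{u}-\bm{x}\|_{2}^{2}$, which is consistent with the identity $F(\langle\bm{x},\bm{w}\rangle\bm{w})-F(\mathbf{0})=G(\bm{w})$ used throughout. All function names are hypothetical.

```python
import math


def h2(u):
    """(l1/l2)^2 ratio, with the convention h2(0) = 0 (assumed)."""
    s2 = sum(t * t for t in u)
    return 0.0 if s2 == 0 else sum(abs(t) for t in u) ** 2 / s2


def grid_w_step_2d(xs, rho, h, num=2001):
    """Brute-force stand-in for the w-step when n = 2 (not Algorithm 1's w-step).

    Minimizes G(w) = h(w) - (rho/2) <x, w>^2 over w = (cos t, sin t), t in [0, pi/2].
    """
    best_w, best_g = None, math.inf
    for i in range(num):
        t = 0.5 * math.pi * i / (num - 1)
        w = [math.cos(t), math.sin(t)]
        g = h(w) - 0.5 * rho * (xs[0] * w[0] + xs[1] * w[1]) ** 2
        if g < best_g:
            best_w, best_g = w, g
    return best_w


def wrd_prox(x, rho, h, w_step):
    """WRD skeleton: signed sort, w-step, r-step, d-step, undo the signed permutation."""
    n = len(x)
    signs = [-1.0 if t < 0 else 1.0 for t in x]
    order = sorted(range(n), key=lambda i: -abs(x[i]))  # nonincreasing |x_i|
    xs = [abs(x[i]) for i in order]                       # image of x in R^n_down
    w = w_step(xs, rho, h)                                # w-step
    r = sum(a * b for a, b in zip(xs, w))                 # r-step: r = <x, w>
    u = [r * t for t in w]

    def F(v):
        # assumed form of F in (1): h(v) + (rho/2) ||v - x||^2
        return h(v) + 0.5 * rho * sum((a - b) ** 2 for a, b in zip(v, xs))

    zero = [0.0] * n
    if F(zero) <= F(u):                                   # d-step (ties go to 0)
        u = zero
    out = [0.0] * n
    for pos, i in enumerate(order):                       # apply P^{-1}
        out[i] = signs[i] * u[pos]
    return out
```

For instance, `wrd_prox([0.5, -0.3], 1.0, h2, grid_w_step_2d)` returns the zero vector, in line with Theorem 3.10, while `wrd_prox([-3.0, 0.0], 1.0, h2, grid_w_step_2d)` returns `[-3.0, 0.0]`.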

4 The Proximal Operator of $h_{1}$

In this section, we detail the computation of the proximal operator for the function $h_{1}$ via the WRD procedure.

We begin with the optimization problem (3) associated with the $\bm{w}$-step of the WRD procedure. For given $\rho>0$ and $\bm{x}\in\mathbb{R}_{+}^{n}$, define

\mathsf{A}_{\rho,\bm{x}}=-\rho\cdot\bm{x}\bm{x}^{\top}.

(19)

The corresponding function $G$ in (5) for $h_{1}$ becomes the quadratic form

G(\bm{w})=\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}+\bm{e}^{\top}\bm{w}.

By Lemma 2.1, it suffices to discuss the proximity operator of $h_{1}$ on $\mathbb{R}_{\downarrow}^{n}$. The discussion unfolds in the following three subsections.

In the first subsection, we show that the method for $h_{2}$ delineated in Section 3 cannot be applied directly to $h_{1}$, even though the two functions admit analogous reformulations. We also provide explicit expressions for the proximity operator of $h_{1}$ at specific points and show that $h_{1}$ promotes sparsity.

The second subsection examines the proximity operator of $h_{1}$ on $\mathbb{R}^{2}$ in depth. The method developed there, however, does not extend easily to higher dimensions.

In the third subsection, we introduce a strategy to transform the optimization problem in the $\bm{w}$-step of the WRD procedure: the minimization of a concave objective function over a nonconvex set is converted into the minimization of the same objective function over a closed and bounded convex set. The latter problem can be solved efficiently by the nonconvex gradient projection algorithm (see [8]).

4.1 The approach for $h_{2}$ does not work for $h_{1}$

At first sight, it may seem feasible to apply the method for $h_{2}$ described in Section 3 directly to $h_{1}$, given their similar reformulations. However, this approach does not transfer to $h_{1}$, as the following consequence of Lemma 3.3 shows.

Proposition 4.1.

For $\bm{x}\in\mathbb{R}_{+}^{n}$ and $\rho>0$ , we consider a quadratic optimization problem on the unit sphere as follows

\min\left\{\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}+\bm{e}^{\top}\bm{w}:\bm{w}\in\mathbb{S}^{n-1}\right\}.

(20)

A vector $\bm{w}^{\star}$ is a solution to (20) if and only if there is a unique $\lambda^{\star}>\rho\|\bm{x}\|_{2}^{2}$ such that

(\mathsf{A}_{\rho,\bm{x}}+\lambda^{\star}\mathsf{\mathrm{I}})\bm{w}^{\star}=-\bm{e}

with $\bm{w}^{\star}$ being a unit vector.

Proof.

Problem (20) is a special case of problem (8) by identifying $\mathsf{A}_{\rho,\bm{x}}$ , $\bm{e}$ , and $1$ as $\mathsf{H}$ , $\bm{b}$ , and $r$ , respectively.

The matrix $\mathsf{A}_{\rho,\bm{x}}=-\rho\bm{x}\bm{x}^{\top}$ has rank one; its only nonzero eigenvalue is $-\rho\|\bm{x}\|_{2}^{2}$, with associated unit eigenvector $\frac{\bm{x}}{\|\bm{x}\|_{2}}$. Hence, for any $\lambda\geq\rho\|\bm{x}\|_{2}^{2}$, the matrix $\mathsf{A}_{\rho,\bm{x}}+\lambda\mathsf{\mathrm{I}}$ is positive semidefinite.

“ $\Rightarrow$ ” If $\bm{w}^{\star}$ is an optimal solution to problem (20), then by Lemma 3.3 there exists a unique $\lambda^{\star}\geq\rho\|\bm{x}\|_{2}^{2}$ such that $(\mathsf{A}_{\rho,\bm{x}}+\lambda^{\star}\mathsf{\mathrm{I}})\bm{w}^{\star}=-\bm{e}$ with $\bm{w}^{\star}$ being a unit vector. We claim that $\lambda^{\star}>\rho\|\bm{x}\|_{2}^{2}$. If not, then $\lambda^{\star}=\rho\|\bm{x}\|_{2}^{2}$; let $\mathsf{U}$ be an orthogonal matrix whose first column is $\frac{\bm{x}}{\|\bm{x}\|_{2}}$. Then the equality $(\mathsf{A}_{\rho,\bm{x}}+\lambda^{\star}\mathsf{\mathrm{I}})\bm{w}^{\star}=-\bm{e}$ leads to

\mathsf{U}\begin{bmatrix}0&&&\\ &\rho\|\bm{x}\|_{2}^{2}&&\\ &&\ddots&\\ &&&\rho\|\bm{x}\|_{2}^{2}\end{bmatrix}\mathsf{U}^{\top}\bm{w}^{\star}=-\bm{e}\quad\mbox{or}\quad\begin{bmatrix}0&&&\\ &\rho\|\bm{x}\|_{2}^{2}&&\\ &&\ddots&\\ &&&\rho\|\bm{x}\|_{2}^{2}\end{bmatrix}\mathsf{U}^{\top}\bm{w}^{\star}=-\begin{bmatrix}\frac{\|\bm{x}\|_{1}}{\|\bm{x}\|_{2}}\\ \star\\ \vdots\\ \star\end{bmatrix},

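which is inconsistent: the first entry of the left-hand side of the second equation is $0$, whereas that of the right-hand side is $-\frac{\|\bm{x}\|_{1}}{\|\bm{x}\|_{2}}\neq 0$. Hence, $\lambda^{\star}$ is strictly greater than $\rho\|\bm{x}\|_{2}^{2}$.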

“ $\Leftarrow$ ” We show that there exists a $\lambda>\rho\|\bm{x}\|_{2}^{2}$ such that $\|(\mathsf{A}_{\rho,\bm{x}}+\lambda\mathsf{\mathrm{I}})^{-1}\bm{e}\|_{2}=1$. The eigenvalues of $\mathsf{A}_{\rho,\bm{x}}+\lambda\mathsf{\mathrm{I}}$ are $\lambda-\rho\|\bm{x}\|_{2}^{2}$ and $\lambda$, so for $\lambda\notin\{0,\rho\|\bm{x}\|_{2}^{2}\}$ this matrix is invertible, and by the Sherman-Morrison formula its inverse is

(\mathsf{A}_{\rho,\bm{x}}+\lambda\mathsf{\mathrm{I}})^{-1}=\frac{1}{\lambda}\left(\mathsf{\mathrm{I}}+\frac{\rho}{\lambda-\rho\|\bm{x}\|_{2}^{2}}\bm{x}\bm{x}^{\top}\right).

For $\lambda>\rho\|\bm{x}\|_{2}^{2}$, combining $\|(\mathsf{A}_{\rho,\bm{x}}+\lambda\mathsf{\mathrm{I}})^{-1}\bm{e}\|_{2}=1$ with the above formula and using $\bm{x}^{\top}\bm{e}=\|\bm{x}\|_{1}$ (as $\bm{x}\in\mathbb{R}^{n}_{+}$), we obtain

\left\|(\lambda-\rho\|\bm{x}\|_{2}^{2})\bm{e}+\rho\|\bm{x}\|_{1}\bm{x}\right\|_{2}=\lambda(\lambda-\rho\|\bm{x}\|_{2}^{2}).

(21)

To study the roots of the above equation, we consider two cases: (i) $\bm{x}=\alpha\bm{e}$ for some $\alpha>0$, and (ii) $\bm{x}\neq\alpha\bm{e}$ for any $\alpha>0$.

If $\bm{x}=\alpha\bm{e}$ for some $\alpha>0$, then $\|\bm{x}\|_{1}=\alpha n$ and $\|\bm{x}\|_{2}=\alpha\sqrt{n}$, and (21) reduces to $\lambda\sqrt{n}=\lambda(\lambda-\rho\alpha^{2}n)$. This equation has two real roots, $0$ and $\sqrt{n}+\rho\alpha^{2}n$, and the only one larger than $\rho\|\bm{x}\|_{2}^{2}$ is $\lambda^{\star}=\sqrt{n}+\rho\alpha^{2}n>\rho\alpha^{2}n=\rho\|\bm{x}\|_{2}^{2}$. By Lemma 3.3,

\bm{w}^{\star}=-\frac{1}{\sqrt{n}}\bm{e}

(22)

is the optimal solution to problem (20).

The rest of the proof considers the case $\bm{x}\neq\alpha\bm{e}$ for any $\alpha>0$. Squaring both sides of (21) and simplifying leads to the quartic equation

Q(q)=0,

where $q=\lambda-\rho\|\bm{x}\|_{2}^{2}$ and

Q(q)=q^{4}+2\rho\|\bm{x}\|_{2}^{2}q^{3}+(\rho^{2}\|\bm{x}\|_{2}^{4}-n)q^{2}-2\rho\|\bm{x}\|_{1}^{2}q-\rho^{2}\|\bm{x}\|_{1}^{2}\|\bm{x}\|_{2}^{2}.

Since $Q(0)=-\rho^{2}\|\bm{x}\|_{1}^{2}\|\bm{x}\|_{2}^{2}<0$ and $Q(q)>0$ for sufficiently large $q$, $Q$ has at least one root on $(0,\infty)$. Regardless of the sign of $\rho^{2}\|\bm{x}\|_{2}^{4}-n$, the coefficient sequence of $Q$ has exactly one sign change. Therefore, by Descartes’ Rule of Signs [22], $Q$ has exactly one positive root, say $q^{\star}$. Because both sides of (21) are positive for $\lambda>\rho\|\bm{x}\|_{2}^{2}$, squaring introduces no spurious positive root. Hence, with $\lambda^{\star}=q^{\star}+\rho\|\bm{x}\|_{2}^{2}$,

\bm{w}^{\star}=(\mathsf{A}_{\rho,\bm{x}}+\lambda^{\star}\mathsf{\mathrm{I}})^{-1}(-\bm{e})=-\frac{1}{\lambda^{\star}}\left(\bm{e}+\frac{\rho\|\bm{x}\|_{1}}{\lambda^{\star}-\rho\|\bm{x}\|_{2}^{2}}\bm{x}\right)

(23)

is the optimal solution to problem (20) by Lemma 3.3 again. ∎
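The constructive part of this proof is easy to check numerically. The Python sketch below (function name hypothetical) brackets and bisects the positive root $q^{\star}$ of $Q$, which is justified because $Q(0)<0$ and, by Descartes’ Rule of Signs, $Q$ has exactly one positive root; it then forms $\lambda^{\star}=q^{\star}+\rho\|\bm{x}\|_{2}^{2}$ and $\bm{w}^{\star}$ from (23). A closed-form quartic solver could be used instead; bisection is chosen here only for simplicity.

```python
import math


def sphere_minimizer(x, rho, iters=200):
    """Solve problem (20) following Proposition 4.1.

    Assumes x >= 0 and x is not a multiple of e (case (ii) of the proof).
    Returns (lambda_star, w_star), with w_star given by formula (23).
    """
    n = len(x)
    s = sum(t * t for t in x)      # ||x||_2^2
    l1 = sum(x)                    # ||x||_1, since x >= 0

    def Q(q):
        return (q**4 + 2 * rho * s * q**3 + (rho**2 * s**2 - n) * q**2
                - 2 * rho * l1**2 * q - rho**2 * l1**2 * s)

    lo, hi = 0.0, 1.0
    while Q(hi) <= 0:              # Q(0) < 0 and Q(q) -> +inf: bracket the root
        hi *= 2.0
    for _ in range(iters):         # bisection on the unique positive root
        mid = 0.5 * (lo + hi)
        if Q(mid) < 0:
            lo = mid
        else:
            hi = mid
    q = 0.5 * (lo + hi)
    lam = q + rho * s              # lambda* = q* + rho ||x||_2^2
    w = [-(1.0 + rho * l1 / q * t) / lam for t in x]   # formula (23)
    return lam, w
```

The returned $\bm{w}^{\star}$ has unit norm and only negative entries, which is exactly why it cannot solve problem (3).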

It is evident from the preceding proof that all entries of the optimal solution $\bm{w}^{\star}$ given in (22) and (23) are negative. Since the feasible set of problem (3) is $\mathbb{S}^{n-1}_{+}$, this vector cannot be a solution to problem (3). Therefore, the methodology employed for $h_{2}$ is not applicable to $h_{1}$, and a distinct approach is needed.

Next, we give the proximity operator of $h_{1}$ at vectors $\bm{x}$ whose entries are all equal and positive.

Theorem 4.2.

Let $\rho>0$ and $\bm{x}=\alpha\bm{e}\in\mathbb{R}^{n}$ for some $\alpha>0$. Then

\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})=\begin{cases}\{\bm{0}\},&\text{if }\alpha<\sqrt{\frac{2}{\rho\sqrt{n}}};\\ \{\bm{0},\bm{x}\},&\text{if }\alpha=\sqrt{\frac{2}{\rho\sqrt{n}}};\\ \{\bm{x}\},&\text{if }\alpha>\sqrt{\frac{2}{\rho\sqrt{n}}}.\end{cases}

Proof.

In this situation, (19) gives $\mathsf{A}_{\rho,\bm{x}}=-\rho\alpha^{2}\bm{e}\bm{e}^{\top}$. Since $\bm{e}^{\top}\bm{w}=\|\bm{w}\|_{1}$ for $\bm{w}\in\mathbb{S}^{n-1}_{+}$, the objective function of problem (3) is

G(\bm{w})=\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}+\bm{e}^{\top}\bm{w}=-\frac{1}{2}\rho\alpha^{2}\|\bm{w}\|_{1}^{2}+\|\bm{w}\|_{1}=-\frac{1}{2}\rho\alpha^{2}\left(\|\bm{w}\|_{1}-\frac{1}{\rho\alpha^{2}}\right)^{2}+\frac{1}{2\rho\alpha^{2}},

where $\bm{w}\in\mathbb{S}^{n-1}_{+}$. Since $\|\bm{w}\|_{1}\in[1,\sqrt{n}]$ for all $\bm{w}\in\mathbb{S}^{n-1}_{+}$, with $\|\bm{w}\|_{1}=1$ exactly when $\bm{w}=\bm{e}_{i}$ for some $i$ and $\|\bm{w}\|_{1}=\sqrt{n}$ exactly when $\bm{w}=\frac{1}{\sqrt{n}}\bm{e}$, the above concave quadratic in $\|\bm{w}\|_{1}$ attains its global minimum at $\|\bm{w}\|_{1}=1$ or $\|\bm{w}\|_{1}=\sqrt{n}$, whichever is farther from $\frac{1}{\rho\alpha^{2}}$. Hence, the $\ell_{1}$ norm of the optimal solution $\bm{w}^{\star}$ to problem (3) is $\sqrt{n}$ if $\frac{1}{\rho\alpha^{2}}<\frac{1}{2}(1+\sqrt{n})$; $1$ or $\sqrt{n}$ if $\frac{1}{\rho\alpha^{2}}=\frac{1}{2}(1+\sqrt{n})$; and $1$ if $\frac{1}{\rho\alpha^{2}}>\frac{1}{2}(1+\sqrt{n})$. As a result, the $\bm{w}$-step of the WRD procedure yields the optimal solution $\bm{w}^{\star}$ to problem (3) as follows:

\bm{w}^{\star}\in\begin{cases}\{\frac{1}{\sqrt{n}}\bm{e}\},&\text{if }\frac{1}{\rho\alpha^{2}}<\frac{1}{2}(1+\sqrt{n});\\ \{\frac{1}{\sqrt{n}}\bm{e}\}\cup\{\bm{e}_{i}:i=1,\ldots,n\},&\text{if }\frac{1}{\rho\alpha^{2}}=\frac{1}{2}(1+\sqrt{n});\\ \{\bm{e}_{i}:i=1,\ldots,n\},&\text{if }\frac{1}{\rho\alpha^{2}}>\frac{1}{2}(1+\sqrt{n}).\end{cases}

The $r$ -step of the WRD procedure simply follows with $r^{\star}=\langle\bm{x},\bm{w}^{\star}\rangle$ . At the $d$ -step of the WRD procedure, we compare $F(r^{\star}\bm{w}^{\star})$ and $F(\bm{0})$ with $F$ defined in (1). Note that

F(r^{\star}\bm{w}^{\star})-F(\bm{0})=G(\bm{w}^{\star})=\begin{cases}-\frac{1}{2}\rho\alpha^{2}n+\sqrt{n},&\text{if }\frac{1}{\rho\alpha^{2}}<\frac{1}{2}(1+\sqrt{n});\\ \frac{\sqrt{n}}{1+\sqrt{n}},&\text{if }\frac{1}{\rho\alpha^{2}}=\frac{1}{2}(1+\sqrt{n});\\ -\frac{1}{2}\rho\alpha^{2}+1,&\text{if }\frac{1}{\rho\alpha^{2}}>\frac{1}{2}(1+\sqrt{n}).\end{cases}

Under the condition $\frac{1}{\rho\alpha^{2}}<\frac{1}{2}(1+\sqrt{n})$, we have $r^{\star}\bm{w}^{\star}=\langle\alpha\bm{e},\frac{1}{\sqrt{n}}\bm{e}\rangle\frac{1}{\sqrt{n}}\bm{e}=\bm{x}$, and the quantity $F(r^{\star}\bm{w}^{\star})-F(\bm{0})=-\frac{1}{2}\rho\alpha^{2}n+\sqrt{n}$ is positive if $\frac{1}{\rho\alpha^{2}}>\frac{\sqrt{n}}{2}$, zero if $\frac{1}{\rho\alpha^{2}}=\frac{\sqrt{n}}{2}$, and negative if $\frac{1}{\rho\alpha^{2}}<\frac{\sqrt{n}}{2}$. Under the condition $\frac{1}{\rho\alpha^{2}}=\frac{1}{2}(1+\sqrt{n})$, $F(r^{\star}\bm{w}^{\star})-F(\bm{0})=\frac{\sqrt{n}}{1+\sqrt{n}}>0$. Under the condition $\frac{1}{\rho\alpha^{2}}>\frac{1}{2}(1+\sqrt{n})$, i.e., $-\frac{1}{2}\rho\alpha^{2}>\frac{-1}{1+\sqrt{n}}$, we have $F(r^{\star}\bm{w}^{\star})-F(\bm{0})=-\frac{1}{2}\rho\alpha^{2}+1>\frac{\sqrt{n}}{1+\sqrt{n}}>0$. Since $\frac{1}{\rho\alpha^{2}}=\frac{\sqrt{n}}{2}$ if and only if $\alpha=\sqrt{\frac{2}{\rho\sqrt{n}}}$, the result of this theorem follows from (2). ∎
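Theorem 4.2 can be sanity-checked by brute force in $\mathbb{R}^{2}$. The sketch below assumes $F(\bm{u})=h_{1}(\bm{u})+\frac{\rho}{2}\|\bm{u}-\bm{x}\|_{2}^{2}$ with $h_{1}=\ell_{1}/\ell_{2}$ and $h_{1}(\mathbf{0})=0$ (consistent with $F(r^{\star}\bm{w}^{\star})-F(\mathbf{0})=G(\bm{w}^{\star})$ above), and restricts the search to the nonnegative quadrant, in the spirit of Lemma 2.1; the function names and grid size are hypothetical. With $\rho=1$ the threshold is $\sqrt{2/\sqrt{2}}\approx 1.189$, so $\bm{x}=(2,2)$ should be returned essentially unchanged and $\bm{x}=(1,1)$ should be sent to the origin.

```python
import math


def h1(u):
    """l1/l2 ratio in R^2, with the convention h1(0) = 0 (assumed)."""
    nrm = math.hypot(*u)
    return 0.0 if nrm == 0 else (abs(u[0]) + abs(u[1])) / nrm


def brute_force_prox_2d(x, rho, num=201):
    """Grid search for argmin_u h1(u) + (rho/2)||u - x||^2 over [0, 2*max(x)]^2.

    Restricting to the nonnegative quadrant assumes x >= 0.
    """
    top = 2.0 * max(x)
    grid = [top * i / (num - 1) for i in range(num)]
    best_u, best_val = None, math.inf
    for a in grid:
        for b in grid:
            val = h1((a, b)) + 0.5 * rho * ((a - x[0]) ** 2 + (b - x[1]) ** 2)
            if val < best_val:
                best_u, best_val = (a, b), val
    return best_u


def threshold(rho, n):
    """Critical value sqrt(2 / (rho * sqrt(n))) from Theorem 4.2."""
    return math.sqrt(2.0 / (rho * math.sqrt(n)))
```

A grid search is only a coarse check; it cannot resolve the tie at $\alpha=\sqrt{2/(\rho\sqrt{n})}$, where both $\mathbf{0}$ and $\bm{x}$ are minimizers.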

The next result shows that $h_{1}$ is indeed a sparsity-promoting function: its proximity operator sends all points in a neighborhood of the origin to the origin (see [17]).

Theorem 4.3.

For $\rho>0$ , the following inclusion

\mathbf{0}\in\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})

holds for $\bm{x}\in\mathbb{R}^{n}_{+}$ with $\|\bm{x}\|_{2}\leq\sqrt{\frac{2}{\rho}}$ .

Proof.

Let $G$ be the objective function of problem (3) associated with $h_{1}$ . For $\bm{w}\in\mathbb{S}^{n-1}_{+}$ , we have

G(\bm{w})=-\frac{\rho}{2}\langle\bm{x},\bm{w}\rangle^{2}+\bm{e}^{\top}\bm{w}\geq-\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+1\geq 0

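for $\bm{x}\in\mathbb{R}^{n}_{+}$ with $\|\bm{x}\|_{2}\leq\sqrt{\frac{2}{\rho}}$, where the first inequality uses the Cauchy-Schwarz inequality $\langle\bm{x},\bm{w}\rangle^{2}\leq\|\bm{x}\|_{2}^{2}$ and $\bm{e}^{\top}\bm{w}=\|\bm{w}\|_{1}\geq\|\bm{w}\|_{2}=1$. Consequently, $F(\langle\bm{x},\bm{w}\rangle\bm{w})-F(\mathbf{0})=G(\bm{w})\geq 0$ for all $\bm{w}\in\mathbb{S}^{n-1}_{+}$, where $F$ is defined in (1). Hence, $\mathbf{0}\in\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$. ∎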

4.2 Special case: the proximity operator of $h_{1}$ on $\mathbb{R}^{2}$

The following result identifies a region of $\mathbb{R}_{\downarrow}^{2}$ on which the proximity operator of $h_{1}$ does not contain the origin.

Proposition 4.4.

For $\rho>0$ , define two sets in $\mathbb{R}_{\downarrow}^{2}$ as follows:

S_{1}=\left\{\bm{x}\in\mathbb{R}_{\downarrow}^{2}:x_{1}>\sqrt{\frac{2}{\rho}}\right\},
S_{2}=\left\{\bm{x}\in\mathbb{R}_{\downarrow}^{2}:x_{2}=\kappa x_{1},\ x_{1}>\sqrt{\frac{2(1+\kappa)}{\rho(1+\kappa^{2})^{3/2}}},\ \kappa\in[0,1]\right\}.

Then, the origin is not in $\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$ for every point $\bm{x}\in S_{1}\cup S_{2}$ .

Proof.

For each point $\bm{x}\in S_{1}\cup S_{2}$, to prove that the origin is not in $\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$, it suffices to find a point $\bm{z}\in\mathbb{R}_{\downarrow}^{2}$ such that $F(\bm{z})-F(\mathbf{0})<0$, where $F$ is defined in (1).

First, we choose $\bm{z}=x_{1}\bm{e}_{1}\in\mathbb{R}_{\downarrow}^{2}$ . Then, $F(\bm{z})-F(\mathbf{0})=-\frac{\rho}{2}x_{1}^{2}+1<0$ which holds for $\bm{x}\in S_{1}$ .

Next, we choose $\bm{z}=\bm{x}$ . Then, with $\kappa=\frac{x_{2}}{x_{1}}$ ,

F(\bm{z})-F(\mathbf{0})=(1+\kappa^{2})\left(-\frac{1}{2}\rho x_{1}^{2}+\frac{1+\kappa}{(1+\kappa^{2})^{3/2}}\right)<0,

for all points $\bm{x}\in S_{2}$ . This completes the proof of this proposition. ∎

We comment on this proposition. Consider the two curves parameterized by $\kappa\in[0,1]$:

\mathcal{C}_{1}:[0,1]\ni\kappa\mapsto\sqrt{\frac{2}{\rho}}(1,\kappa)\quad\mbox{and}\quad\mathcal{C}_{2}:[0,1]\ni\kappa\mapsto\sqrt{\frac{2(1+\kappa)}{\rho(1+\kappa^{2})^{3/2}}}(1,\kappa).

We have $\mathcal{C}_{1}(0)=\mathcal{C}_{2}(0)=\sqrt{\frac{2}{\rho}}(1,0)$, $\mathcal{C}_{1}(1)=\sqrt{\frac{2}{\rho}}(1,1)$, and $\mathcal{C}_{2}(1)=\sqrt{\frac{\sqrt{2}}{\rho}}(1,1)$. For $\kappa\in(0,1]$, the two curves intersect where $(1+\kappa^{2})^{3}=(1+\kappa)^{2}$, that is, where $\kappa$ is a root of the polynomial equation $\kappa^{5}+3\kappa^{3}+2\kappa-2=0$. This root is $\kappa\approx 0.6124$. The red shaded region in Figure 4.2 is the set $S_{1}\cup S_{2}$. The blue shaded region in Figure 4.2 represents the set where every point is mapped to the origin by $\mathrm{prox}_{\frac{1}{\rho}h_{1}}$, as stipulated by Theorem 4.3. We explore the blank region between the blue and red shaded areas in the subsequent analysis.
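The intersection parameter can be computed by bisection. Equating the first coordinates of $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ gives $(1+\kappa^{2})^{3}=(1+\kappa)^{2}$, i.e., $\kappa(\kappa^{5}+3\kappa^{3}+2\kappa-2)=0$; the quintic factor is increasing on $[0,1]$ and changes sign there, so it has a unique root in this interval. A minimal Python sketch (the function name is hypothetical):

```python
def intersection_kappa(tol=1e-12):
    """Root in [0, 1] of p(k) = k^5 + 3k^3 + 2k - 2 (where C1 and C2 meet)."""
    def p(k):
        return k**5 + 3 * k**3 + 2 * k - 2

    lo, hi = 0.0, 1.0              # p(0) = -2 < 0 < 4 = p(1), p increasing
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

At the returned $\kappa$, the ratio $(1+\kappa)/(1+\kappa^{2})^{3/2}$ equals $1$ up to rounding, so the two curves indeed meet there.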

In the following analysis, we exclude vectors $\bm{x}$ with equal entries, which are covered by Theorem 4.2. For the remaining $\bm{x}\in\mathbb{R}_{\downarrow}^{2}$, we distinguish two cases: $\bm{x}$ has one zero entry, or it has none. We begin with the case where $\bm{x}$ has one zero entry, treated in the following proposition.

Proposition 4.5.

Let $\rho>0$ and $\bm{x}=\alpha\bm{e}_{1}$ with $\alpha>0$. Then

\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})=\begin{cases}\{\mathbf{0}\},&\text{if }\alpha<\sqrt{\frac{2}{\rho}};\\ \{\mathbf{0},\bm{x}\},&\text{if }\alpha=\sqrt{\frac{2}{\rho}};\\ \{\bm{x}\},&\text{if }\alpha>\sqrt{\frac{2}{\rho}}.\end{cases}

Proof.

The objective function $G$ of problem (3) associated with $h_{1}$ is, for the given $\bm{x}$,

G(\bm{w})=-\frac{1}{2}\rho\alpha^{2}w_{1}^{2}+w_{1}+w_{2}=-\frac{1}{2}\rho\alpha^{2}w_{1}^{2}+w_{1}+\sqrt{1-w_{1}^{2}},

where $w_{1}\in[0,1]$. A direct calculation shows that both functions $-\frac{1}{2}\rho\alpha^{2}w_{1}^{2}$ and $w_{1}+\sqrt{1-w_{1}^{2}}$ are concave with respect to $w_{1}$, so $G$ attains its minimum over $[0,1]$ at an endpoint, that is, at $\bm{e}_{1}$ or $\bm{e}_{2}$. Since $G(\bm{e}_{1})=-\frac{1}{2}\rho\alpha^{2}+1<1=G(\bm{e}_{2})$, $G$ achieves its global minimum at $\bm{w}^{\star}=\bm{e}_{1}$.

The $r$-step of the WRD procedure gives $r^{\star}=\langle\bm{x},\bm{w}^{\star}\rangle=\alpha$. At the $d$-step, we compare $F(r^{\star}\bm{w}^{\star})$ and $F(\bm{0})$ via their difference $F(r^{\star}\bm{w}^{\star})-F(\bm{0})=G(\bm{w}^{\star})=-\frac{1}{2}\rho\alpha^{2}+1$. The conclusion of this proposition follows immediately from this difference. ∎

We observe that Proposition 4.5 corroborates the findings of Proposition 4.4 for points lying on the $x_{1}$-axis. Further, $(\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x}))_{1}=\mathrm{prox}_{\frac{1}{\rho}|\cdot|_{0}}(x_{1})$ for $\bm{x}=\alpha\bm{e}_{1}$.

For $\bm{x}\in\mathbb{R}_{\downarrow}^{2}$ with $x_{1}\neq 0$ , let $G$ be the objective function of problem (3) associated with $h_{1}$ . We define $Q:[0,\frac{\pi}{4}]\rightarrow\mathbb{R}$ as

Q(\theta):=G(\bm{w}(\theta))\quad\mbox{with}\quad\bm{w}(\theta)=\begin{bmatrix}\cos(\theta)\\ \sin(\theta)\end{bmatrix}.

A direct computation yields

Q(\theta)=-\frac{1}{2}\rho\|\bm{x}\|_{2}^{2}\cos^{2}\left(\theta-\frac{\alpha}{2}\right)+\sqrt{2}\sin\left(\theta+\frac{\pi}{4}\right),

(24)

where, with $\kappa=\frac{x_{2}}{x_{1}}\in[0,1]$, the constant $\alpha=2\arctan\kappa$ is given explicitly by

\alpha=\left\{\begin{array}[]{ll}\arctan\left(\frac{2\kappa}{1-\kappa^{2}}\right)\in\left[0,\frac{\pi}{2}\right),&\hbox{if $x_{1}>x_{2}$;}\\ \frac{\pi}{2},&\hbox{if $x_{1}=x_{2}$.}\end{array}\right.

(25)

Then, solving problem (3) involves minimizing $Q$ over the interval $[0,\frac{\pi}{4}]$ . The minimal value of $Q$ on this interval can be attained at $0$ , $\pi/4$ , or the critical points of $Q$ . To determine these critical points, we examine the properties of $Q^{\prime}$ , which is

Q^{\prime}(\theta)=\frac{1}{2}\rho\|\bm{x}\|_{2}^{2}\sin(2\theta-\alpha)+\sqrt{2}\cos\left(\theta+\frac{\pi}{4}\right).

We observe the following. First, the function $\sqrt{2}\cos(\theta+\frac{\pi}{4})$ decreases monotonically from $1$ to $0$ as $\theta$ varies from $0$ to $\frac{\pi}{4}$. Second, the function $\frac{1}{2}\rho\|\bm{x}\|_{2}^{2}\sin(2\theta-\alpha)$ increases monotonically from $\frac{1}{2}\rho\|\bm{x}\|_{2}^{2}\sin(-\alpha)=-\rho x_{1}x_{2}$ to $0$ as $\theta$ ranges from $0$ to $\frac{\alpha}{2}$, and from $0$ to $\frac{1}{2}\rho\|\bm{x}\|_{2}^{2}\cos(\alpha)=\frac{1}{2}\rho(x_{1}^{2}-x_{2}^{2})$ as $\theta$ goes from $\frac{\alpha}{2}$ to $\frac{\pi}{4}$. Thus, $Q^{\prime}$ is positive on $[\frac{\alpha}{2},\frac{\pi}{4})$, and consequently $Q$ is increasing on $[\frac{\alpha}{2},\frac{\pi}{4}]$. Therefore, the minimum of $Q$ is attained in the interval $[0,\frac{\alpha}{2}]$, to which we confine our analysis.
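Formula (24) rests on the observation that $\alpha=2\arctan\kappa$, so that $\langle\bm{x},\bm{w}(\theta)\rangle=x_{1}\cos\theta+x_{2}\sin\theta=\|\bm{x}\|_{2}\cos(\theta-\frac{\alpha}{2})$, while $\cos\theta+\sin\theta=\sqrt{2}\sin(\theta+\frac{\pi}{4})$. The following Python sketch, with hypothetical function names, compares $G(\bm{w}(\theta))$ with the right-hand side of (24) numerically.

```python
import math


def G_direct(theta, x1, x2, rho):
    """G(w) = -rho/2 <x, w>^2 + w1 + w2 at w = (cos theta, sin theta)."""
    w1, w2 = math.cos(theta), math.sin(theta)
    return -0.5 * rho * (x1 * w1 + x2 * w2) ** 2 + w1 + w2


def Q_formula(theta, x1, x2, rho):
    """Right-hand side of (24), with alpha computed from (25)."""
    s = x1 * x1 + x2 * x2
    kappa = x2 / x1
    if x1 == x2:
        alpha = math.pi / 2
    else:
        alpha = math.atan(2 * kappa / (1 - kappa * kappa))
    return (-0.5 * rho * s * math.cos(theta - alpha / 2) ** 2
            + math.sqrt(2) * math.sin(theta + math.pi / 4))
```

Both functions agree to rounding error on $[0,\frac{\pi}{4}]$ for any $\bm{x}\in\mathbb{R}^{2}_{\downarrow}$ with $x_{1}>0$.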

Remarkably, we can establish that $Q^{\prime}$ has at most two zeros in the interval $[0,\frac{\alpha}{2}]$ . This can be demonstrated by factorizing $Q^{\prime}$ as a product of a positive function with a convex function:

Q^{\prime}(\theta)=\frac{1}{2}\rho\|\bm{x}\|_{2}^{2}\cos\left(\theta+\frac{\pi}{4}\right)L(\theta),

where $L:[0,\frac{\alpha}{2}]\rightarrow\mathbb{R}$ is defined as:

L(\theta)=\frac{\sin(2\theta-\alpha)}{\cos(\theta+\frac{\pi}{4})}+\frac{2\sqrt{2}}{\rho\|\bm{x}\|_{2}^{2}}.

(26)

We proceed to demonstrate that $L$ is convex on the interval $[0,\frac{\alpha}{2}]$ .

Lemma 4.6.

For $\rho>0$ and a nonzero vector $\bm{x}\in\mathbb{R}_{\downarrow}^{2}$ with $\kappa=\frac{x_{2}}{x_{1}}\in[0,1)$ , the following statements for the function $L$ given by (26) hold:

(i)

$L$ is convex on the interval $[0,\frac{\alpha}{2}]$ , where $\alpha$ is given in (25).
(ii)

$L(0)$ is positive, zero, or negative if $\rho x_{1}x_{2}-1$ is negative, zero, or positive, respectively. $L^{\prime}(0)$ is nonnegative if $\kappa\leq\frac{\sqrt{5}-1}{2}$ and negative if $\kappa>\frac{\sqrt{5}-1}{2}$ .
(iii)

$L$ has at most two roots on the interval $[0,\frac{\alpha}{2}]$ .

Proof.

Item (i). Notice that

L^{\prime}(\theta)=\frac{2\cos(2\theta-\alpha)\cos(\theta+\frac{\pi}{4})+\sin(2\theta-\alpha)\sin(\theta+\frac{\pi}{4})}{\cos^{2}(\theta+\frac{\pi}{4})},
L^{\prime\prime}(\theta)=\frac{\frac{1}{2}\sin(2\theta-\alpha)(\sin(2\theta)-1)+2\cos\alpha}{\cos^{3}(\theta+\frac{\pi}{4})}.

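On $[0,\frac{\alpha}{2}]$ we have $\sin(2\theta-\alpha)\leq 0$ and $\sin(2\theta)-1\leq 0$, while $\cos\alpha>0$ because $\alpha<\frac{\pi}{2}$ for $\kappa<1$, and $\cos(\theta+\frac{\pi}{4})>0$ because $\theta+\frac{\pi}{4}<\frac{\pi}{2}$. Hence both the numerator and the denominator of $L^{\prime\prime}$ are positive, so $L^{\prime\prime}(\theta)>0$ for all $\theta\in[0,\frac{\alpha}{2}]$, and $L$ is strictly convex on this interval.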

Item (ii). We notice that

L(0)=\frac{2\sqrt{2}}{\rho\|\bm{x}\|_{2}^{2}}(1-\rho x_{1}x_{2}),\quad L^{\prime}(0)=\frac{2\sqrt{2}}{\|\bm{x}\|_{2}^{2}}(x_{1}^{2}-x_{2}^{2}-x_{1}x_{2}).

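Hence, the statements in item (ii) hold, since $x_{1}^{2}-x_{2}^{2}-x_{1}x_{2}=x_{1}^{2}(1-\kappa-\kappa^{2})$ is nonnegative if and only if $\kappa\leq\frac{\sqrt{5}-1}{2}$.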

Item (iii). We have

L\left(\frac{\alpha}{2}\right)=\frac{2\sqrt{2}}{\rho\|\bm{x}\|_{2}^{2}}>0,\quad L^{\prime}\left(\frac{\alpha}{2}\right)=\frac{2}{\cos(\frac{\alpha}{2}+\frac{\pi}{4})}>0.

Since $L$ is strictly convex by item (i), it has at most two zeros on the interval $[0,\frac{\alpha}{2}]$. Moreover, because $L(\frac{\alpha}{2})>0$, if $L(0)<0$ then $L$ has exactly one zero on this interval. ∎
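The closed forms in the proof of Lemma 4.6 can be checked against finite differences. In the sketch below (hypothetical names, step size chosen ad hoc), note that $L^{\prime}(0)=\frac{2\sqrt{2}}{\|\bm{x}\|_{2}^{2}}(x_{1}^{2}-x_{2}^{2}-x_{1}x_{2})$ carries no factor $\rho$, because the constant term of $L$ in (26) vanishes upon differentiation; only its sign matters for item (ii).

```python
import math


def make_L(x1, x2, rho):
    """Return L from (26) and alpha = 2*arctan(x2/x1), which equals (25) for x2 < x1."""
    s = x1 * x1 + x2 * x2
    alpha = 2.0 * math.atan(x2 / x1)
    c = 2.0 * math.sqrt(2.0) / (rho * s)

    def L(t):
        return math.sin(2 * t - alpha) / math.cos(t + math.pi / 4) + c

    return L, alpha, s


def L2_closed_form(t, alpha):
    """Closed form of L'' from the proof of Lemma 4.6."""
    num = 0.5 * math.sin(2 * t - alpha) * (math.sin(2 * t) - 1) + 2 * math.cos(alpha)
    return num / math.cos(t + math.pi / 4) ** 3
```

With $\bm{x}=(3,2)$ and $\rho=0.8$, the central difference of $L$ at $0$ matches $\frac{2\sqrt{2}}{13}(9-4-6)=-\frac{2\sqrt{2}}{13}$, independently of $\rho$, and the closed form of $L^{\prime\prime}$ is positive on $[0,\frac{\alpha}{2}]$.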

With these preliminaries, we now present the solution to problem (3) associated with $h_{1}$ in the following proposition, which provides the outcome of the $\bm{w}$-step of the WRD procedure for the proximity operator of $h_{1}$.

Proposition 4.7.

For $\rho>0$ and a nonzero vector $\bm{x}\in\mathbb{R}_{\downarrow}^{2}$ with $\kappa=\frac{x_{2}}{x_{1}}\in[0,1)$ , let the function $Q$ be given by (24), and let the function $L$ be given by (26). Define $\alpha$ as in (25). Then, the optimal solution $\bm{w}^{\star}$ to problem (3) is represented as:

\bm{w}^{\star}=\begin{bmatrix}\cos(\theta^{\star})\\ \sin(\theta^{\star})\end{bmatrix},

where $\theta^{\star}$ is determined as follows:

(i)

Case $\rho x_{1}x_{2}<1$ . We choose

\theta^{\star}=\left\{\begin{array}[]{ll}0,&\hbox{if $\kappa\leq\frac{\sqrt{5}-1}{2}$;}\\ 0,&\hbox{if $\frac{\sqrt{5}-1}{2}<\kappa<1$, $L(\theta_{0})\geq 0$ with $L^{\prime}(\theta_{0})=0$;}\\ \arg\min\{Q(\theta):\theta\in\{0,\theta_{1}\}\},&\hbox{if $\frac{\sqrt{5}-1}{2}<\kappa<1$, $L(\theta_{0})<0$ with $L^{\prime}(\theta_{0})=0$.}\end{array}\right.

(27)

Here $\theta_{1}$ is the root of $L$ on the interval $(\theta_{0},\frac{\alpha}{2})$ .

(ii)

Case $\rho x_{1}x_{2}=1$. If $\kappa\leq\frac{\sqrt{5}-1}{2}$, we choose $\theta^{\star}=0$; otherwise, $\theta^{\star}$ is chosen to be the root of $L$ on $(0,\frac{\alpha}{2})$.
(iii)

Case $\rho x_{1}x_{2}>1$ . $\theta^{\star}$ is chosen to be the only root of $L$ on the interval $[0,\frac{\alpha}{2}]$ .

Proof.

Case $\rho x_{1}x_{2}<1$. By Lemma 4.6, $L(0)>0$. If $L^{\prime}(0)\geq 0$, then, $L$ being convex, $L$ is nondecreasing and hence positive on $[0,\frac{\alpha}{2}]$, and so is $Q^{\prime}$; thus $Q^{\prime}$ has no root and we choose $\theta^{\star}=0$. If $L^{\prime}(0)<0$, then, since $L^{\prime}\left(\frac{\alpha}{2}\right)>0$ and $L^{\prime}$ is strictly increasing, there exists exactly one point $\theta_{0}\in(0,\frac{\alpha}{2})$ such that $L^{\prime}(\theta_{0})=0$, and $L$ attains its minimum over $[0,\frac{\alpha}{2}]$ at $\theta_{0}$. If $L(\theta_{0})\geq 0$, then $Q^{\prime}$ does not change sign and we choose $\theta^{\star}=0$. If $L(\theta_{0})<0$, then $L$ has exactly one root on $(0,\theta_{0})$, which is a local maximizer of $Q$, and a unique root, say $\theta_{1}$, on the interval $(\theta_{0},\frac{\alpha}{2})$, which is a local minimizer of $Q$. In this situation, we choose $\theta^{\star}=\arg\min\{Q(\theta):\theta\in\{0,\theta_{1}\}\}$. All situations are summarized in (27).

Case $\rho x_{1}x_{2}=1$. By Lemma 4.6, $L(0)=0$. If $L^{\prime}(0)\geq 0$, we choose $\theta^{\star}=0$. If $L^{\prime}(0)<0$, let $\theta_{1}$ be the only root of $L$ on the open interval $(0,\frac{\alpha}{2})$; then $Q$ decreases on $[0,\theta_{1}]$ and increases on $[\theta_{1},\frac{\alpha}{2}]$, so $\theta^{\star}=\theta_{1}$.

Case $\rho x_{1}x_{2}>1$. By Lemma 4.6 again, $L(0)<0$. Let $\theta_{1}$ be the only root of $L$ on the open interval $(0,\frac{\alpha}{2})$. Then $\theta^{\star}=\theta_{1}$, and $Q$ achieves its global minimum at $\theta^{\star}$. ∎
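Proposition 4.7 translates directly into a small root-finding routine. The Python sketch below is one possible implementation under the proposition's assumptions ($0\leq x_{2}<x_{1}$): it locates $\theta_{0}$ and $\theta_{1}$ by bisection, which is valid because $L$ is strictly convex (Lemma 4.6), so $L^{\prime}$ is increasing and $L$ is increasing to the right of $\theta_{0}$. The function name and tolerance are hypothetical, and the exact-equality test for $\rho x_{1}x_{2}=1$ is rarely triggered in floating point.

```python
import math


def theta_star(x1, x2, rho, tol=1e-14):
    """theta* of Proposition 4.7 for 0 <= x2 < x1 (sketch)."""
    s = x1 * x1 + x2 * x2
    kappa = x2 / x1
    alpha = 2.0 * math.atan(kappa)                 # equals (25) when kappa < 1
    c = 2.0 * math.sqrt(2.0) / (rho * s)

    def L(t):                                      # function (26)
        return math.sin(2 * t - alpha) / math.cos(t + math.pi / 4) + c

    def dL(t):                                     # L' from the proof of Lemma 4.6
        num = (2 * math.cos(2 * t - alpha) * math.cos(t + math.pi / 4)
               + math.sin(2 * t - alpha) * math.sin(t + math.pi / 4))
        return num / math.cos(t + math.pi / 4) ** 2

    def Q(t):                                      # function (24)
        return (-0.5 * rho * s * math.cos(t - alpha / 2) ** 2
                + math.sqrt(2) * math.sin(t + math.pi / 4))

    def root(f, lo, hi):
        # bisection, assuming f < 0 left of the root and f > 0 right of it
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    golden = (math.sqrt(5.0) - 1.0) / 2.0
    p = rho * x1 * x2
    if p > 1:                                      # case (iii): L(0) < 0
        return root(L, 0.0, alpha / 2)
    if kappa <= golden:                            # L'(0) >= 0 in cases (i), (ii)
        return 0.0
    theta0 = root(dL, 0.0, alpha / 2)              # unique zero of L' on (0, alpha/2)
    if p == 1:                                     # case (ii) with L'(0) < 0
        return root(L, theta0, alpha / 2)
    if L(theta0) >= 0:                             # case (i), second line of (27)
        return 0.0
    theta1 = root(L, theta0, alpha / 2)            # case (i), third line of (27)
    return min((0.0, theta1), key=Q)
```

A dense grid search of $Q$ over $[0,\frac{\pi}{4}]$ is a convenient cross-check: $Q(\theta^{\star})$ should never exceed the grid minimum.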

Based on Proposition 4.7, the set $\mathbb{R}_{\downarrow}^{2}\setminus\{\alpha\bm{e}:\alpha\in\mathbb{R}\}$ is split into three disjoint sets $I_{1}$, $I_{2}$, and $I_{3}$, as follows:

I_{1}=\{(x_{1},x_{2})\in\mathbb{R}_{\downarrow}^{2}:x_{1}>x_{2},\ \rho x_{1}x_{2}<1\},
I_{2}=\{(x_{1},x_{2})\in\mathbb{R}_{\downarrow}^{2}:x_{1}>x_{2},\ \rho x_{1}x_{2}=1\},
I_{3}=\{(x_{1},x_{2})\in\mathbb{R}_{\downarrow}^{2}:x_{1}>x_{2},\ \rho x_{1}x_{2}>1\}.

We further split $I_{1}$ as the union of $I_{1i}$ , $i=1,2,3,4$ and $I_{2}$ as the union of $I_{2i}$ , $i=1,2,3,4$ as follows:

\begin{array}[]{ll}I_{11}=\left\{(x_{1},x_{2})\in I_{1}:x_{1}>\sqrt{\frac{2}{\rho}}\right\}&I_{21}=\left\{(x_{1},x_{2})\in I_{2}:x_{1}>\sqrt{\frac{2}{\rho}}\right\}\\ I_{12}=\left\{(x_{1},x_{2})\in I_{1}:x_{1}=\sqrt{\frac{2}{\rho}}\right\}&I_{22}=\left\{\left(\sqrt{\frac{2}{\rho}},\sqrt{\frac{1}{2\rho}}\right)\right\}\\ I_{13}=\left\{(x_{1},x_{2})\in I_{1}:\frac{\sqrt{5}+1}{2}x_{2}\leq x_{1}<\sqrt{\frac{2}{\rho}}\right\}&I_{23}=\left\{(x_{1},x_{2})\in I_{2}:\sqrt{\frac{\sqrt{5}+1}{2\rho}}\leq x_{1}<\sqrt{\frac{2}{\rho}}\right\}\\ I_{14}=\left\{(x_{1},x_{2})\in I_{1}:\frac{\sqrt{5}+1}{2}x_{2}>x_{1}\right\}&I_{24}=\left\{(x_{1},x_{2})\in I_{2}:\sqrt{\frac{1}{\rho}}<x_{1}<\sqrt{\frac{\sqrt{5}+1}{2\rho}}\right\}\end{array}

With the given sets, the proximity operator of $h_{1}$ from the WRD procedure is presented in the next theorem.

Theorem 4.8.

Let $\rho>0$ . For $\bm{x}\in I_{1}\cup I_{2}$ , we have

\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})=\left\{\begin{array}[]{ll}\{x_{1}\bm{e}_{1}\}&\hbox{if $\bm{x}\in I_{11}\cup I_{21}$;}\\ \{\mathbf{0},\sqrt{\frac{2}{\rho}}\bm{e}_{1}\}&\hbox{if $\bm{x}\in I_{12}\cup I_{22}$;}\\ \{\mathbf{0}\}&\hbox{if $\bm{x}\in I_{13}\cup I_{23}$;}\\ \arg\min\{F(\bm{z}):\bm{z}\in\{\mathbf{0},\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}\}\}&\hbox{if $\bm{x}\in I_{14}\cup I_{24}$,}\end{array}\right.

where $\bm{w}^{\star}$ is from item (i) or item (ii) of Proposition 4.7.

For $\bm{x}\in I_{3}$ , we have

\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})=\arg\min\{F(\bm{z}):\bm{z}\in\{\mathbf{0},\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}\}\},

where $\bm{w}^{\star}$ is from item (iii) of Proposition 4.7.

Proof.

The $\bm{w}$ -step of the WRD procedure provides $\bm{w}^{\star}$ , the solution of optimization problem (3) associated with the function $h_{1}$ by Proposition 4.7. The $r$ -step simply follows with $r^{\star}=\langle\bm{x},\bm{w}^{\star}\rangle$ . At the $d$ -step, we compare $F(r^{\star}\bm{w}^{\star})$ and $F(\bm{0})$ with $F$ defined in (1). Note that

F(r^{\star}\bm{w}^{\star})-F(\bm{0})=G(\bm{w}^{\star}).

If $G(\bm{w}^{\star})$ is positive, the zero vector is in $\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$; if $G(\bm{w}^{\star})$ is negative, $r^{\star}\bm{w}^{\star}$ is in $\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$; if $G(\bm{w}^{\star})$ is zero, both the zero vector and $r^{\star}\bm{w}^{\star}$ are in $\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$. The rest of the result follows directly from Proposition 4.7. ∎

Figure 4.3(a) illustrates the region where the proximity operator $\mathrm{prox}_{\frac{1}{\rho}h_{1}}$ maps points to the origin. According to Theorem 4.2, all points on the line segment from the origin to $(\sqrt{{\sqrt{2}}/{\rho}},\sqrt{{\sqrt{2}}/{\rho}})$ will be mapped to the origin by $\mathrm{prox}_{\frac{1}{\rho}h_{1}}$ . Additionally, as stated in Theorem 4.8, all points under the line $x_{2}=\frac{\sqrt{5}-1}{2}x_{1}$ in the red region are mapped to the origin by $\mathrm{prox}_{\frac{1}{\rho}h_{1}}$ . The remaining points in both red and blue colors are obtained numerically with the assistance of Theorem 4.8.

4.3 General case: the proximity operator of $h_{1}$ on $\mathbb{R}^{n}$

Here, we first show that if the last $k$ entries of $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$ are zero, then the last $k$ entries of $\bm{w}^{\star}$, the optimal solution to problem (3), are zero as well. Using this result, we may assume that all entries of $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$ are nonzero. The main outcome of this subsection is the transformation of problem (3) into a problem with the same objective function but a convex feasible set. The modified problem can be addressed by the nonconvex gradient projection algorithm in [8]. Subsequently, we introduce an algorithm for computing the proximity operator of $h_{1}$ on $\mathbb{R}^{n}$.

Theorem 4.9.

For $\rho>0$ and $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$ , suppose that the last $k\geq 1$ entries of $\bm{x}$ are zeros, that is,

\bm{x}=\begin{bmatrix}\bm{x}_{[n-k]}\\ \mathbf{0}\end{bmatrix},

Then, for an optimal solution $\bm{w}^{\star}$ to problem (3), we have $\bm{w}^{\star}_{[n]\setminus[n-k]}=\mathbf{0}$ , that is, the last $k$ entries of $\bm{w}^{\star}$ are zero.

Proof.

The proof proceeds by reducing the dimension by one at a time, for up to $k$ steps; hence, without loss of generality, we assume that $k=1$. Let $G$ denote the objective function of problem (3) defined on $\mathbb{S}^{n-1}_{+}$. Throughout this proof, $\bm{w}_{[n-1]}$ denotes the vector formed by the first $n-1$ entries of $\bm{w}$.

Define $H:\mathbb{B}^{n-1}_{+}(\mathbf{0},1)\rightarrow\mathbb{R}$ by $H(\bm{w}_{[n-1]}):=G(\bm{w})$, where $w_{n}=\sqrt{1-\|\bm{w}_{[n-1]}\|_{2}^{2}}$ is determined by the constraint $\bm{w}\in\mathbb{S}^{n-1}_{+}$. Since the last entry of $\bm{x}$ is zero, we have

H(\bm{w}_{[n-1]})=\underbrace{-\frac{1}{2}\rho\langle\bm{x}_{[n-1]},\bm{w}_{[n-1]}\rangle^{2}}_{H_{1}(\bm{w}_{[n-1]})}+\underbrace{\sum_{i=1}^{n-1}w_{i}+\sqrt{1-\sum_{i=1}^{n-1}w_{i}^{2}}}_{H_{2}(\bm{w}_{[n-1]})}.

One can verify that both $H_{1}$ and $H_{2}$ are concave on $\mathbb{B}^{n-1}_{+}(\mathbf{0},1)$; hence, $H=H_{1}+H_{2}$ attains its minimal value at a point $\bm{w}^{\star}_{[n-1]}$ on the boundary of this set.

We remark that $\bm{w}^{\star}_{[n-1]}$ cannot be the zero vector. If it were, then $H(\bm{w}^{\star}_{[n-1]})=H(\mathbf{0})=1$. However, $H(\bm{e}_{1})=-\frac{1}{2}\rho x_{1}^{2}+1<1$, which contradicts the minimality of $\bm{w}^{\star}_{[n-1]}$.

Next, we show that $\bm{w}^{\star}_{[n-1]}$ must be a unit vector, that is, $\bm{w}^{\star}_{[n-1]}\in\mathbb{S}^{n-2}_{+}$. Suppose, to the contrary, that $\|\bm{w}^{\star}_{[n-1]}\|_{2}<1$. We show that there is a better point on the boundary of $\mathbb{B}^{n-1}_{+}(\mathbf{0},1)$, which contradicts the optimality of $\bm{w}_{[n-1]}^{\star}$. Writing $\tilde{\bm{w}}^{\star}_{[n-1]}=\frac{\bm{w}^{\star}_{[n-1]}}{\|\bm{w}^{\star}_{[n-1]}\|_{2}}$, we define $C:[0,1]\rightarrow\mathbb{R}$ by

C(\lambda):=H_{1}(\lambda\tilde{\bm{w}}^{\star}_{[n-1]})+H_{2}(\lambda\tilde{\bm{w}}^{\star}_{[n-1]}).

Clearly,

C(\lambda)=-\frac{1}{2}\rho\langle\bm{x}_{[n-1]},\tilde{\bm{w}}^{\star}_{[n-1]}\rangle^{2}\lambda^{2}+\langle\bm{e},\tilde{\bm{w}}^{\star}_{[n-1]}\rangle\lambda+\sqrt{1-\lambda^{2}},

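which is strictly concave on $[0,1]$ , because $\sqrt{1-\lambda^{2}}$ is strictly concave and the remaining terms are concave. Hence $C(\lambda)>\min\{C(0),C(1)\}$ for every $\lambda\in(0,1)$ . Since $C(0)=H(\mathbf{0})$ is not the minimal value of $H$ , we obtain $C(1)<C(\|\bm{w}^{\star}_{[n-1]}\|_{2})=H(\bm{w}^{\star}_{[n-1]})$ , a contradiction. Therefore, $\|\bm{w}^{\star}_{[n-1]}\|_{2}=1$ . In other words, the $n$ -th entry of the optimal solution $\bm{w}^{\star}$ to problem (3) must be $0$ . This completes the proof. ∎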
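As a numerical sanity check of Theorem 4.9 (not part of the paper's method), the Python sketch below minimizes $G$, written out as $H_{1}+H_{2}$ in the proof above, by brute force over a grid on $\mathbb{S}^{2}_{+}$. The data $\bm{x}=(2,1,0)$, $\rho=1$, the grid, and the function names are illustrative assumptions of ours; the grid includes the arc on which the third entry vanishes, and the best grid point is expected to lie on it.

```python
import numpy as np

def G(w, x, rho):
    # Objective of problem (3) on S^{n-1}_+, read off from H_1 + H_2 above:
    # G(w) = -(rho/2) <x, w>^2 + sum_i w_i   (entries of w are nonnegative)
    return -0.5 * rho * np.dot(x, w) ** 2 + np.sum(w)

def grid_minimizer_s2(x, rho, num=201):
    # Brute-force search over w = (sin p cos t, sin p sin t, cos p), p, t in [0, pi/2];
    # p = pi/2 (the last grid value) gives the arc where the third entry vanishes.
    best_val, best_w = np.inf, None
    for p in np.linspace(0.0, np.pi / 2, num):
        for t in np.linspace(0.0, np.pi / 2, num):
            w = np.array([np.sin(p) * np.cos(t), np.sin(p) * np.sin(t), np.cos(p)])
            val = G(w, x, rho)
            if val < best_val:
                best_val, best_w = val, w
    return best_w, best_val

# Illustrative data (our choice): x in R^3_down with a zero last entry, i.e. k = 1.
x = np.array([2.0, 1.0, 0.0])
w_best, g_best = grid_minimizer_s2(x, rho=1.0)
print(w_best, g_best)  # the third entry is at floating-point zero level
```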

Note that the feasible set $\mathbb{S}_{+}^{n-1}$ of problem (3) is nonconvex, which complicates algorithm development. The following result allows us to pose the problem over a convex set instead, specifically $\mathbb{B}_{+}^{n}(\mathbf{0},1)$ , which provides a more tractable pathway for algorithm design and analysis.

Theorem 4.10.

Let $\bm{x}\in\mathbb{R}_{\downarrow}^{n}$ and assume that its last entry $x_{n}$ is nonzero. Let $\bm{w}^{\star}$ be an optimal solution to the following optimization problem

\min\left\{\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}+\bm{e}^{\top}\bm{w}:\bm{w}\in\mathbb{B}_{\downarrow}^{n}(\mathbf{0},1)\right\},

(28)

where $\mathsf{A}_{\rho,\bm{x}}$ is given by (19). Then $\bm{w}^{\star}$ is either the origin or an optimal solution to problem (3). Furthermore, we have

\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}\in\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x}).

(29)

Proof.

The claim is trivial if $\bm{w}^{\star}$ is the zero vector. If $\bm{w}^{\star}\neq\mathbf{0}$ , we show that $\|\bm{w}^{\star}\|_{2}=1$ , i.e., that $\bm{w}^{\star}$ is an optimal solution to problem (3). Suppose, to the contrary, that $0<\|\bm{w}^{\star}\|_{2}<1$ , and denote the objective function of problem (28) by $H$ , that is,

H(\bm{w})=\frac{1}{2}\bm{w}^{\top}\mathsf{A}_{\rho,\bm{x}}\bm{w}+\bm{e}^{\top}\bm{w}.

Set $\tilde{\bm{w}}^{\star}:=\frac{\bm{w}^{\star}}{\|\bm{w}^{\star}\|_{2}}$ , and define $C:[0,1]\rightarrow\mathbb{R}$ as follows:

C(\lambda)=H(\lambda\tilde{\bm{w}}^{\star})=-\lambda\left(\frac{1}{2}\rho\langle\bm{x},\tilde{\bm{w}}^{\star}\rangle^{2}\lambda-\|\tilde{\bm{w}}^{\star}\|_{1}\right).

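Since $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$ has a nonzero last entry, all of its entries are positive, so $\langle\bm{x},\tilde{\bm{w}}^{\star}\rangle>0$ and $C$ is a strictly concave quadratic function. Hence $C$ attains its minimum over $[0,1]$ only at $\lambda=0$ or $\lambda=1$ , and since $0<\|\bm{w}^{\star}\|_{2}<1$ ,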

H(\bm{w}^{\star})=C(\|\bm{w}^{\star}\|_{2})>\min\{C(0),C(1)\}=\min\{H(\mathbf{0}),H(\tilde{\bm{w}}^{\star})\}.

This contradicts the optimality of $\bm{w}^{\star}$ for problem (28). We conclude that $\bm{w}^{\star}$ is either the origin or an optimal solution to problem (3).

Finally, we show that the inclusion (29) holds. If $\bm{w}^{\star}=\mathbf{0}$ , then, for all $\bm{w}\in\mathbb{S}^{n-1}_{+}$ ,

0=H(\mathbf{0})\leq H(\bm{w})=G(\bm{w}),

where $G$ is the objective function of problem (3). Therefore, regardless of which point solves problem (3), we have $\mathbf{0}\in\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$ .

If $\bm{w}^{\star}\neq\mathbf{0}$ , then $\bm{w}^{\star}$ is also an optimal solution to problem (3). Hence $\bm{w}^{\star}$ is the output of the $\bm{w}$ -step of the WRD procedure and $G(\bm{w}^{\star})<0$ . The $r$ -step and $d$ -step of the WRD procedure then give $\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}\in\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$ . We conclude that the inclusion (29) holds. ∎
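The following worked identity sketches why the $r$ -step and $d$ -step produce $\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}$ and why the sign of $G(\bm{w}^{\star})$ decides between this point and the origin. It is a sketch under stated assumptions: $h_{1}(\bm{u})=\|\bm{u}\|_{1}/\|\bm{u}\|_{2}$ for $\bm{u}\neq\mathbf{0}$ with $h_{1}(\mathbf{0})=0$ , the standard definition $\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})=\arg\min_{\bm{u}}\{\frac{1}{\rho}h_{1}(\bm{u})+\frac{1}{2}\|\bm{u}-\bm{x}\|_{2}^{2}\}$ , $\bm{x}\in\mathbb{R}^{n}_{\downarrow}$ with $x_{n}\neq 0$ as in Theorem 4.10 (so that $\langle\bm{x},\bm{w}\rangle>0$ ), $G(\bm{w})=-\frac{1}{2}\rho\langle\bm{x},\bm{w}\rangle^{2}+\|\bm{w}\|_{1}$ as in the proof of Theorem 4.9, and nonzero candidates of the form $\bm{u}=r\bm{w}$ with $r>0$ and $\bm{w}\in\mathbb{S}^{n-1}_{+}$ .

```latex
\begin{aligned}
\frac{1}{\rho}h_{1}(r\bm{w})+\frac{1}{2}\|r\bm{w}-\bm{x}\|_{2}^{2}
  &=\frac{1}{\rho}\|\bm{w}\|_{1}+\frac{1}{2}r^{2}-r\langle\bm{x},\bm{w}\rangle+\frac{1}{2}\|\bm{x}\|_{2}^{2}
  &&\text{(scale invariance, } \|\bm{w}\|_{2}=1\text{)}\\
\min_{r>0}\Big\{\frac{1}{\rho}h_{1}(r\bm{w})+\frac{1}{2}\|r\bm{w}-\bm{x}\|_{2}^{2}\Big\}
  &=\frac{1}{\rho}G(\bm{w})+\frac{1}{2}\|\bm{x}\|_{2}^{2}
  &&\text{(attained at } r=\langle\bm{x},\bm{w}\rangle>0\text{)}\\
\frac{1}{\rho}h_{1}(\mathbf{0})+\frac{1}{2}\|\mathbf{0}-\bm{x}\|_{2}^{2}
  &=\frac{1}{2}\|\bm{x}\|_{2}^{2}.
\end{aligned}
```

Under these assumptions, minimizing over $\bm{w}$ selects $\bm{w}^{\star}$ , and $\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}$ is strictly better than the origin when $G(\bm{w}^{\star})<0$ and no better when $G(\bm{w}^{\star})\geq 0$ , matching the two cases treated in the proof.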

By Theorem 4.10, computing $\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$ reduces to solving the optimization problem (28), which minimizes a concave objective function over a convex set. A popular algorithm for problems of this type is the nonconvex gradient projection algorithm: starting from an initial guess $\bm{w}^{(0)}$ , iterate

\bm{w}^{(k+1)}=P_{\mathbb{B}_{\downarrow}^{n}(\mathbf{0},1)}\left(\bm{w}^{(k)}-\frac{1}{2\rho\|\bm{x}\|_{2}^{2}}\left(\mathsf{A}_{\rho,\bm{x}}\bm{w}^{(k)}+\bm{e}\right)\right),

(30)

where $P_{\mathbb{B}_{\downarrow}^{n}(\mathbf{0},1)}$ is the projection operator onto $\mathbb{B}_{\downarrow}^{n}(\mathbf{0},1)$ . Since $\mathbb{B}_{\downarrow}^{n}(\mathbf{0},1)$ is a closed, bounded, semi-algebraic convex subset of $\mathbb{R}^{n}$ , and the gradient $\mathsf{A}_{\rho,\bm{x}}\bm{w}+\bm{e}$ of the objective function of problem (28) is Lipschitz continuous with constant $\rho\|\bm{x}\|_{2}^{2}$ , the sequence $\{\bm{w}^{(k)}\}_{k\in\mathbb{N}}$ converges to a critical point of problem (28); see, for example, [8, Theorem 5.3].
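The text leaves the evaluation of $P_{\mathbb{B}_{\downarrow}^{n}(\mathbf{0},1)}$ implicit. The Python sketch below is one way to compute it, assuming $\mathbb{B}_{\downarrow}^{n}(\mathbf{0},1)=\{\bm{w}\in\mathbb{R}^{n}: w_{1}\geq\cdots\geq w_{n}\geq 0,\ \|\bm{w}\|_{2}\leq 1\}$ . This set is the intersection of a closed convex cone with the unit ball centered at the origin, so projecting onto the cone and then scaling radially into the ball yields the projection onto the intersection; the cone projection is a nonincreasing isotonic regression (pool-adjacent-violators) clipped at zero. The construction and the helper names are ours, not the paper's.

```python
import numpy as np

def project_nonincreasing(y):
    # Pool-adjacent-violators: Euclidean projection of y onto {w : w_1 >= w_2 >= ... >= w_n}.
    means, sizes = [], []
    for v in y:
        means.append(float(v))
        sizes.append(1)
        # Merge the last two blocks while they violate the nonincreasing order.
        while len(means) > 1 and means[-2] < means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((m1 * s1 + m2 * s2) / (s1 + s2))
            sizes.append(s1 + s2)
    return np.repeat(means, sizes)

def project_sorted_ball(y):
    # Hypothetical helper: projection onto {w : w_1 >= ... >= w_n >= 0, ||w||_2 <= 1}.
    # 1) Project onto the cone w_1 >= ... >= w_n >= 0 (isotonic fit clipped at zero).
    z = np.maximum(project_nonincreasing(y), 0.0)
    # 2) Project onto the unit ball centered at the origin by radial scaling.
    nrm = np.linalg.norm(z)
    return z if nrm <= 1.0 else z / nrm

print(project_sorted_ball(np.array([0.2, 0.8])))    # pooled to [0.5, 0.5], already in the ball
print(project_sorted_ball(np.array([3.0, 4.0])))    # pooled to [3.5, 3.5], then scaled to norm 1
print(project_sorted_ball(np.array([-1.0, -2.0])))  # clipped to the origin
```

With this helper, one step of (30) is a gradient step of size $1/(2\rho\|\bm{x}\|_{2}^{2})$ followed by a call to `project_sorted_ball`.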

We are now ready to present, in Algorithm 2, our WRD-based algorithm for computing $\operatorname*{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$ for an arbitrary $\bm{x}\in\mathbb{R}^{n}$ .

Algorithm 2 Computing the Proximal Operator of $h_{1}$
1: Input: Vector $\bm{x}\in\mathbb{R}^{n}$ , parameter $\rho>0$ , and an initial guess $\bm{w}^{(0)}$
2: Output: An element of $\mathrm{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$
3: procedure (WRD Procedure)
4: Sort and convert $\bm{x}$ into $\mathbb{R}^{n}_{\downarrow}$ via a signed permutation matrix $\mathsf{P}$
5: Trim $\bm{x}$ if necessary by Theorem 4.9
6: Generate $\bm{w}^{(k)}$ via (30) and denote its limit by $\bm{w}^{\star}$ ( $\bm{w}$ -step)
7: Form $\bm{u}\leftarrow\langle\bm{x},\bm{w}^{\star}\rangle\bm{w}^{\star}$ by Theorem 4.10 ( $r$ -step and $d$ -step)
8: Pad $\bm{u}$ with a zero block if necessary by Theorem 4.9
9: $\bm{u}\leftarrow\mathsf{P}^{-1}\bm{u}\in\operatorname*{prox}_{\frac{1}{\rho}h_{1}}(\bm{x})$
10: end procedure
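A minimal Python sketch of Algorithm 2 follows. It assumes $\mathsf{A}_{\rho,\bm{x}}=-\rho\bm{x}\bm{x}^{\top}$ , which is consistent with the expression for $C(\lambda)$ in the proof of Theorem 4.10 and with the Lipschitz constant $\rho\|\bm{x}\|_{2}^{2}$ (equation (19) itself is not reproduced here), and it assumes $\mathbb{B}_{\downarrow}^{n}(\mathbf{0},1)$ consists of the nonincreasing nonnegative vectors of norm at most one. The projection via isotonic regression and radial scaling is our choice rather than the paper's, and all function names, the stopping rule, and the default $\alpha=1/2$ are hypothetical. Because (30) only guarantees convergence to a critical point, the returned vector is an element of the proximity operator only when the iteration reaches a global minimizer of (28).

```python
import numpy as np

def _project_sorted_ball(y):
    # Projection onto {w : w_1 >= ... >= w_n >= 0, ||w||_2 <= 1}:
    # nonincreasing isotonic fit (pool-adjacent-violators), clip at 0, scale into the ball.
    means, sizes = [], []
    for v in y:
        means.append(float(v))
        sizes.append(1)
        while len(means) > 1 and means[-2] < means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((m1 * s1 + m2 * s2) / (s1 + s2))
            sizes.append(s1 + s2)
    z = np.maximum(np.repeat(means, sizes), 0.0)
    nrm = np.linalg.norm(z)
    return z if nrm <= 1.0 else z / nrm

def _w_step(x, rho, w0, max_iter=10000, tol=1e-12):
    # Iteration (30), assuming A_{rho,x} = -rho x x^T, so the gradient A w + e
    # equals -rho <x, w> x + e and the step size is 1 / (2 rho ||x||_2^2).
    step = 1.0 / (2.0 * rho * np.dot(x, x))
    w = w0
    for _ in range(max_iter):
        grad = -rho * np.dot(x, w) * x + 1.0
        w_new = _project_sorted_ball(w - step * grad)
        if np.linalg.norm(w_new - w) <= tol:
            return w_new
        w = w_new
    return w

def prox_l1_over_l2(x, rho, alpha=0.5):
    # Hypothetical sketch of Algorithm 2; returns a candidate element of prox_{h_1/rho}(x).
    x = np.asarray(x, dtype=float)
    n = x.size
    u = np.zeros(n)
    if not np.any(x):
        return u  # guard for x = 0, assuming h_1(0) = 0
    # Step 4: signed permutation P (flip signs, sort magnitudes in nonincreasing order).
    signs = np.where(x < 0, -1.0, 1.0)
    order = np.argsort(-np.abs(x), kind="stable")
    xs = np.abs(x)[order]
    # Step 5: trim the trailing zero block (Theorem 4.9).
    m = int(np.count_nonzero(xs))
    xt = xs[:m]
    # Step 6 (w-step): initial guess w^(0) = alpha * x / ||x||_2.
    w_star = _w_step(xt, rho, alpha * xt / np.linalg.norm(xt))
    # Step 7 (r-step and d-step): u = <x, w*> w* (Theorem 4.10); Step 8: pad with zeros.
    us = np.zeros(n)
    us[:m] = np.dot(xt, w_star) * w_star
    # Step 9: undo the signed permutation, u <- P^{-1} u.
    u[order] = us
    return signs * u

print(prox_l1_over_l2(np.array([3.0, 2.9]), rho=1.0))
print(prox_l1_over_l2(np.array([0.1, -0.1]), rho=1.0))  # small input mapped to the origin
```

In $\mathbb{R}^{2}$ such a sketch can be checked against a brute-force search over directions on the unit circle, which is also a convenient way to explore the effect of $\alpha$ .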

Because problem (28) is nonconvex, the initial guess supplied to any algorithm for it significantly influences the quality of the computed solution. In our simulations, we observed that choosing $\bm{w}^{(0)}=\alpha\frac{\bm{x}}{\|\bm{x}\|_{2}}$ with $\alpha\in[\frac{1}{4},\frac{3}{4}]$ tends to yield satisfactory results. The numerical result of Algorithm 2 in $\mathbb{R}^{2}$ is shown in Figure 4.3(b). Compared with Figure 4.3(a), the regions identified as being mapped to the origin by $\mathrm{prox}_{\frac{1}{\rho}h_{1}}$ are consistent.

5 Conclusions

This paper addresses the computation of proximity operators of scale and signed permutation invariant functions. Building on the intrinsic properties of these functions, we introduce a procedure called WRD, consisting of the $\bm{w}$ -step, $r$ -step, and $d$ -step, for computing their proximity operators. We study two such functions in detail: the ratio $\ell_{1}/\ell_{2}$ and its square. For $(\ell_{1}/\ell_{2})^{2}$ , we propose an algorithm that computes its proximity operator explicitly in a few straightforward steps. For $\ell_{1}/\ell_{2}$ , we devise an efficient algorithm with guaranteed convergence to compute its proximity operator.

In future work, we aim to explore practical applications of these algorithms, particularly in sparse signal recovery and image processing.

Declarations

• The authors declare that they have no conflict of interest.
• The work of L. Shen was supported in part by the National Science Foundation under grant DMS-2208385 and by the 2023 and 2024 Air Force Summer Faculty Fellowship Program (SFFP). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation and AFRL (Air Force Research Laboratory).

References

Candes et al. [2008] Candes, E., Wakin, M.B., Boyd, S.: Enhancing sparsity by reweighted $\ell^{1}$ minimization. Journal of Fourier Analysis and Applications 14, 877–905 (2008)
Prater-Bennette et al. [2022] Prater-Bennette, A., Shen, L., Tripp, E.E.: The proximity operator of the log-sum penalty. Journal of Scientific Computing 93(3), 1–34 (2022)
Lopes [2016] Lopes, M.E.: Unknown sparsity in compressed sensing: Denoising and inference. IEEE Transactions on Information Theory 62(9), 5145–5166 (2016)
Rahimi et al. [2019] Rahimi, Y., Wang, C., Dong, H., Lou, Y.: A scale-invariant approach for sparse signal recovery. SIAM Journal on Scientific Computing 41(6), 3649–3672 (2019)
Tang and Nehorai [2011] Tang, G., Nehorai, A.: Performance analysis of sparse recovery based on constrained minimal singular values. IEEE Transactions on Signal Processing 59(12), 5734–5745 (2011)
Yin et al. [2014] Yin, P., Esser, E., Xin, J.: Ratio and difference of $\ell_{1}$ and $\ell_{2}$ norms and sparse representation with coherent dictionaries. Communications in Information and Systems 14(2), 87–109 (2014)
Xu et al. [2021] Xu, Y., Narayan, A., Tran, H., Webster, C.G.: Analysis of the ratio of $\ell_{1}$ and $\ell_{2}$ norms in compressed sensing. Applied and Computational Harmonic Analysis 55, 486–511 (2021)
Attouch et al. [2013] Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical Programming, Ser. A 137, 91–129 (2013)
Beck and Teboulle [2009] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2, 183–202 (2009)
Bolte et al. [2014] Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming 146, 449–494 (2014)
Combettes and Wajs [2005] Combettes, P., Wajs, V.: Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation: A SIAM Interdisciplinary Journal 4, 1168–1200 (2005)
Krol et al. [2012] Krol, A., Li, S., Shen, L., Xu, Y.: Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction. Inverse Problems 28, 115005–34 (2012)
Li et al. [2015] Li, Q., Shen, L., Xu, Y., Zhang, N.: Multi-step fixed-point proximity algorithms for solving a class of optimization problems arising from image processing. Advances in Computational Mathematics 41(2), 387–422 (2015)
Micchelli et al. [2011] Micchelli, C.A., Shen, L., Xu, Y.: Proximity algorithms for image models: Denoising. Inverse Problems 27, 045009–30 (2011)
Parikh and Boyd [2014] Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1, 123–231 (2014)
Tao [2022] Tao, M.: Minimization of $\ell_{1}$ over $\ell_{2}$ for sparse signal recovery with convergence guarantee. SIAM Journal on Scientific Computing 44(2), 770–797 (2022)
Shen et al. [2019] Shen, L., Suter, B.W., Tripp, E.E.: Structured sparsity promoting functions. Journal of Optimization Theory and Applications 183(3), 386–421 (2019)
Moreau [1962] Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. C.R. Acad. Sci. Paris Sér. A Math. 255, 2897–2899 (1962)
Donoho [1995] Donoho, D.: De-noising by soft-thresholding. IEEE Transactions on Information Theory 41, 613–627 (1995)
Tao and An [1996] Tao, P.D., An, L.T.H.: Difference of convex functions optimization algorithms (DCA) for globally minimizing nonconvex quadratic forms on Euclidean balls and spheres. Operations Research Letters 19(5), 207–216 (1996)
Martínez [1994] Martínez, J.M.: Local minimizers of quadratic functions on Euclidean balls and spheres. SIAM Journal on Optimization 4(1), 159–176 (1994)
Wang [2004] Wang, X.: A simple proof of Descartes’s rule of signs. The American Mathematical Monthly 111(6), 525–526 (2004) https://doi.org/10.1080/00029890.2004.11920108

\begin{aligned}
x_{1}-\frac{\underline{\alpha}}{\rho\|\bm{x}\|_{1}}&=x_{1}-\frac{2\rho\|\bm{x}\|_{1}^{2}}{\rho\|\bm{x}\|_{1}\left[\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)+\sqrt{\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)^{2}-2\rho\|\bm{x}\|_{1}^{2}}\right]}\\
&\geq x_{1}-\frac{2\|\bm{x}\|_{1}}{\left(\frac{\rho}{2}\|\bm{x}\|_{2}^{2}+n\right)+\left|\frac{\rho}{2}\|\bm{x}\|_{2}^{2}-n\right|}=\begin{cases}x_{1}-\frac{2\|\bm{x}\|_{1}}{\rho\|\bm{x}\|_{2}^{2}}, & \text{if } \rho\|\bm{x}\|_{2}^{2}\geq 2n;\\ x_{1}-\frac{2\|\bm{x}\|_{1}}{2n}, & \text{if } \rho\|\bm{x}\|_{2}^{2}<2n,\end{cases}
\end{aligned}


