This paper investigates the computation of proximity operators for scale and signed permutation invariant functions. A scale-invariant function remains unchanged under uniform scaling, while a signed permutation invariant function retains its value under permutations and sign changes applied to its input variables. Noteworthy examples include the $\ell_0$ function and the ratio $\ell_1/\ell_2$ and its square, whose proximity operators are particularly important in sparse signal recovery. We examine the properties of scale and signed permutation invariant functions and break the computation of their proximity operators into three sequential steps: the $w$-step, the $r$-step, and the $d$-step. These steps together form a procedure termed WRD, in which the $w$-step is the most important and requires careful treatment. Using this procedure, we present a method for explicitly computing the proximity operator of $(\ell_1/\ell_2)^2$ and introduce an efficient algorithm for the proximity operator of $\ell_1/\ell_2$.
This paper addresses the computation of the proximity operator for scale and signed permutation invariant functions. A scale-invariant function is unaffected by uniform scaling: its value does not change when its input is multiplied by a positive constant. A permutation invariant function does not change when the order of the input variables changes, and a sign invariant function does not change when any component of the input is replaced by its negative. In this study, a signed permutation invariant function is a function whose value is preserved under both permutations and sign changes applied to its input variables.
Several well-known examples of signed permutation invariant functions, as well as scale and signed permutation invariant functions, are the following:
• All $\ell_p$ norms and the log-sum penalty function are signed permutation invariant but not scale invariant, see [1, 2];
• The $\ell_0$ norm and the effective sparsity measure are both scale and signed permutation invariant, see [3, 4, 5, 6, 7].
The proximity operator is a mathematical concept used in optimization. It provides a computationally efficient way to solve optimization problems involving nonsmooth functions [8, 9, 10, 11, 12, 13, 14, 15]. Given a proper lower semicontinuous function $f$ and a point $\mathbf{x}\in\mathbb{R}^n$, the proximity operator of $f$ at $\mathbf{x}$, denoted as $\operatorname{prox}_f(\mathbf{x})$, is defined as:
$$\operatorname{prox}_f(\mathbf{x}) := \arg\min_{\mathbf{u}\in\mathbb{R}^n}\Big\{\tfrac{1}{2}\|\mathbf{u}-\mathbf{x}\|_2^2 + f(\mathbf{u})\Big\}.$$
In simpler terms, the proximity operator finds a point $\mathbf{u}$ that minimizes the sum of the function $f$ and half of the squared Euclidean distance between $\mathbf{u}$ and the given point $\mathbf{x}$.
The focus of this paper is the proximity operator of scale and signed permutation invariant functions.
Our approach for computing this proximity operator is based on the following observation: every nonzero point of $\mathbb{R}^n$ corresponds to exactly one point of the Cartesian product of $(0,\infty)$ and the $(n-1)$-dimensional unit sphere, denoted by $\mathbb{S}^{n-1}$. Mathematically, this can be expressed as:
$$\mathbb{R}^n\setminus\{\mathbf{0}\} \cong (0,\infty)\times\mathbb{S}^{n-1}.$$
That is, a nonzero $\mathbf{x}\in\mathbb{R}^n$ can be converted to a pair $(r,\mathbf{w})$ such that
$\mathbf{x}=r\mathbf{w}$, where $r=\|\mathbf{x}\|_2$ and $\mathbf{w}=\mathbf{x}/\|\mathbf{x}\|_2\in\mathbb{S}^{n-1}$.
With this conversion, the task of finding a point $\mathbf{u}\in\operatorname{prox}_f(\mathbf{x})$ transforms into finding a pair $(r,\mathbf{w})$ such that $r\mathbf{w}\in\operatorname{prox}_f(\mathbf{x})$. Using the properties of a scale and signed permutation invariant function $f$, the process of finding this pair involves three consecutive steps. The first step solves an optimization problem in the variable $\mathbf{w}$ only, the second step straightforwardly yields $r$, and the final step decides whether to choose the origin or the scaled vector $r\mathbf{w}$ as the resulting point. Clearly, the first step is the crucial one.
Among all scale and signed permutation invariant functions, we present a complete study of the following function
$$h_p(\mathbf{x}) := \begin{cases}\left(\dfrac{\|\mathbf{x}\|_1}{\|\mathbf{x}\|_2}\right)^{p}, & \mathbf{x}\neq\mathbf{0},\\[4pt] 0, & \mathbf{x}=\mathbf{0},\end{cases}\qquad p\in\{1,2\}.$$
Notably, there has been a gap in the existing literature concerning the proximity operator of $h_2=(\ell_1/\ell_2)^2$, while a recent study addresses the proximity operator of $h_1=\ell_1/\ell_2$ [16]. In our work, we aim to fill this gap by providing a comprehensive analysis of the proximity operators of both $h_1$ and $h_2$ within the context of scale and signed permutation invariant functions.
With our approach, the optimization problem for $\mathbf{w}$ associated with both $h_1$ and $h_2$ is nonconvex and takes the form of a constrained quadratic programming problem after certain simplifications. Although both the objective functions and the constraint sets are nonconvex, we adopt a different strategy for each of the two problems.
For the $(\ell_1/\ell_2)^2$ function, the objective function of the quadratic programming problem involves only a quadratic term formed by a structured symmetric rank-2 matrix. We show explicitly that this matrix has one positive eigenvalue and one negative eigenvalue. The constraint set of the problem is $\mathbb{S}^{n-1}\cap\mathbb{R}^n_+$, where $\mathbb{S}^{n-1}$ is the unit sphere and $\mathbb{R}^n_+$ is the first orthant of $\mathbb{R}^n$. Although both the objective function and the constraint set are nonconvex, we develop a procedure that finds the optimal solution in a finite number of iterations through the eigenvector of the matrix corresponding to the negative eigenvalue.
For the $\ell_1/\ell_2$ function, the objective function of the quadratic programming problem comprises a quadratic term formed by a rank-1 symmetric matrix and one linear term. The rank-1 matrix is negative semidefinite, and the constraint set remains $\mathbb{S}^{n-1}\cap\mathbb{R}^n_+$. As with $(\ell_1/\ell_2)^2$, both the objective function and the constraint set are nonconvex. However, the procedure used for $(\ell_1/\ell_2)^2$ cannot be directly adapted for $\ell_1/\ell_2$. To address this, we relax the nonconvex feasible set $\mathbb{S}^{n-1}\cap\mathbb{R}^n_+$ to the convex set $\mathbb{B}^{n}\cap\mathbb{R}^n_+$, where $\mathbb{B}^n$ is the unit ball. The resulting optimization problem has the same objective function as the non-relaxed version, but is now constrained to a convex domain. We establish conditions ensuring that the optimal solution to the relaxed problem lies on $\mathbb{S}^{n-1}\cap\mathbb{R}^n_+$ or is the origin. Subsequently, we propose a projected gradient method to solve the relaxed optimization problem. Since the optimal solution is related to the proximity operator of $(\ell_1/\ell_2)^2$ at a given point, we use this information as prior knowledge to initialize the projected gradient method. In our numerical experiments, the algorithm consistently finds the optimal $\mathbf{w}$ for the original, unrelaxed optimization problem.
It is worth noting that a different approach for the proximity operator of $\ell_1/\ell_2$ has been reported recently in [16]. That paper claimed to have derived the analytical solution of the proximity operator of $\ell_1/\ell_2$, relying on prior knowledge about the sparsity of the output of this proximity operator, which is unknown in general. A bisection method was then applied to find this sparsity.
The current literature, including works such as [3, 4, 5, 6], suggests that both $\ell_1/\ell_2$ and $(\ell_1/\ell_2)^2$ can effectively promote sparsity in underlying signals. However, to the best of our knowledge, a theoretical justification for this claim is lacking. In this paper, we prove that both $\ell_1/\ell_2$ and $(\ell_1/\ell_2)^2$ qualify as sparsity-promoting functions, as defined in [17].
The outline of the rest of the paper is as follows. In Section 2, we begin by presenting some properties of the proximity operators of scale and signed permutation invariant functions. These properties allow us to restrict the discussion of these proximity operators to a specific set: each point lies in the first orthant of $\mathbb{R}^n$, and the entries of the point are in descending order. By employing a different representation of the points in this set, determining the proximity operators of scale and signed permutation invariant functions at these points essentially reduces to solving a quadratic programming problem constrained on a nonconvex set. We then introduce the WRD procedure, which comprises three distinct steps: the $w$-step, the $r$-step, and the $d$-step. This procedure offers a systematic approach to computing proximity operators of scale and signed permutation invariant functions.
In Section 3, using the WRD procedure, we compute the proximity operator of $(\ell_1/\ell_2)^2$. We provide an explicit solution for the proximity operator of $(\ell_1/\ell_2)^2$ at any point, which can be evaluated efficiently.
In Section 4, using the WRD procedure, we compute the proximity operator of $\ell_1/\ell_2$. We develop an efficient algorithm to evaluate the proximity operator of $\ell_1/\ell_2$ at any point.
The conclusion of this paper is drawn in Section 5, summarizing the findings and contributions of our study. We discuss the implications of our results and propose avenues for future research.
2 Scale and Signed Permutation Invariant Functions and their Proximity Operators
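All functions in this work are defined on the Euclidean space $\mathbb{R}^n$ of dimension $n$. Bold lowercase letters, such as $\mathbf{x}$, signify vectors, with the $i$th component represented by the corresponding lowercase letter $x_i$. Matrices are indicated by bold uppercase letters such as $\mathbf{A}$ and $\mathbf{B}$. We use $\mathbb{R}^n_+$ to denote the set of points in $\mathbb{R}^n$ whose entries are all nonnegative. The cone of vectors $\mathbf{x}$ in $\mathbb{R}^n$ satisfying $x_1\ge x_2\ge\cdots\ge x_n\ge 0$ is denoted by $\mathbb{R}^n_\downarrow$. We use $\mathbb{S}^{n-1}$ (or $\mathbb{B}^n$) to denote the unit sphere (or ball) in $\mathbb{R}^n$. The partial unit sphere and the partial unit ball are the intersections of $\mathbb{S}^{n-1}$ and $\mathbb{B}^n$, respectively, with the cones above. Let $\mathcal{P}_n$ denote the set of all $n\times n$ signed permutation matrices: those matrices that have exactly one nonzero entry in every row and every column, which is $\pm 1$.
The $\ell_p$ norm of $\mathbf{x}$ is defined as $\|\mathbf{x}\|_p=\big(\sum_{i=1}^n|x_i|^p\big)^{1/p}$ for $1\le p<\infty$ and $\|\mathbf{x}\|_\infty=\max_{1\le i\le n}|x_i|$. When $p=0$, $\|\mathbf{x}\|_0$ represents the number of nonzero components in $\mathbf{x}$. The standard inner product in $\mathbb{R}^n$ is denoted by $\langle\mathbf{x},\mathbf{y}\rangle$, where $\mathbf{x}$ and $\mathbf{y}$ are vectors in $\mathbb{R}^n$.
We denote by $[n]$ the index set $\{1,2,\ldots,n\}$ for a positive integer $n$. For a subset $\Lambda$ of $[n]$, the notation $|\Lambda|$ represents the cardinality of $\Lambda$. For a vector $\mathbf{x}$ and a subset $\Lambda$ of $[n]$, $\mathbf{x}_\Lambda$ denotes either the vector that retains the entries of $\mathbf{x}$ with indices in $\Lambda$ and sets the remaining entries to zero, or the subvector of $\mathbf{x}$ with indices solely from $\Lambda$. Which meaning is intended will be evident from the context.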
A function $f$ is considered scale invariant if for all $\mathbf{x}\in\mathbb{R}^n$ and $\alpha>0$, the following holds:
$$f(\alpha\mathbf{x})=f(\mathbf{x}).$$
In other words, scaling the input by any positive constant does not alter the value of the function.
A function is considered signed permutation invariant if it remains unchanged under permutations and sign changes of its input variables. Formally, a function $f$ is signed permutation invariant if, for all signed permutation matrices $\mathbf{P}\in\mathcal{P}_n$ and for all vectors $\mathbf{x}\in\mathbb{R}^n$, the following holds:
$$f(\mathbf{P}\mathbf{x})=f(\mathbf{x}).$$
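A function $f$ defined on $\mathbb{R}^n$ with values in $\mathbb{R}\cup\{+\infty\}$ is proper if its domain is nonempty, and is lower semicontinuous if its epigraph is a closed set. The set of proper and lower semicontinuous functions from $\mathbb{R}^n$ to $\mathbb{R}\cup\{+\infty\}$ is denoted by $\Gamma(\mathbb{R}^n)$.
The proximity operator was introduced by Moreau in [18]. For a function $f\in\Gamma(\mathbb{R}^n)$, the proximity operator of $f$ at $\mathbf{x}\in\mathbb{R}^n$ with index $\lambda>0$ is defined by
$$\operatorname{prox}_{\lambda f}(\mathbf{x}) := \arg\min_{\mathbf{u}\in\mathbb{R}^n}\Big\{\frac{1}{2\lambda}\|\mathbf{u}-\mathbf{x}\|_2^2+f(\mathbf{u})\Big\}.$$
The proximity operator of $f$ is a set-valued operator from $\mathbb{R}^n$ to $2^{\mathbb{R}^n}$, the power set of $\mathbb{R}^n$. In this paper, for a scale and signed permutation invariant function $f$, we always assume that the set $\operatorname{prox}_{\lambda f}(\mathbf{x})$ is nonempty and compact.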
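As a quick numerical illustration of the two invariance definitions in this section, the following sketch evaluates the ratio $\|\mathbf{x}\|_1/\|\mathbf{x}\|_2$ before and after a positive scaling and a signed permutation. The helper names `ratio_l1_l2` and `signed_permute` are hypothetical and chosen only for this illustration.

```python
import math

def ratio_l1_l2(x):
    """Return ||x||_1 / ||x||_2, with the value 0 assigned at the origin."""
    l2 = math.sqrt(sum(v * v for v in x))
    if l2 == 0.0:
        return 0.0
    return sum(abs(v) for v in x) / l2

def signed_permute(x, perm, signs):
    """Apply a signed permutation: entry i of the result is signs[i] * x[perm[i]]."""
    return [s * x[j] for j, s in zip(perm, signs)]

x = [3.0, -1.0, 2.0]
scaled = [2.5 * v for v in x]                                   # alpha = 2.5 > 0
shuffled = signed_permute(x, perm=[2, 0, 1], signs=[-1, 1, -1])  # a signed permutation

# Both differences are zero up to rounding.
print(abs(ratio_l1_l2(x) - ratio_l1_l2(scaled)))
print(abs(ratio_l1_l2(x) - ratio_l1_l2(shuffled)))
```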
2.1 Properties
The proximity operator exhibits certain properties concerning scale and signed permutation invariant functions.
Lemma 2.1.
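Let $f\in\Gamma(\mathbb{R}^n)$, $\mathbf{x}\in\mathbb{R}^n$, $\lambda>0$, and $\mathbf{P}\in\mathcal{P}_n$. The following statements hold:
(i) For a signed permutation invariant function $f$, $\operatorname{prox}_{\lambda f}(\mathbf{P}\mathbf{x})=\mathbf{P}\operatorname{prox}_{\lambda f}(\mathbf{x})$.
(ii) For a scale invariant function $f$ and any $\alpha>0$, $\operatorname{prox}_{\lambda f}(\alpha\mathbf{x})=\alpha\operatorname{prox}_{\lambda\alpha^{-2} f}(\mathbf{x})$.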
Proof.
The proof of the two items is based on the definitions of the proximity operator and scale and signed permutation invariant function. We skip the details of the proof here.
∎
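For any vector $\mathbf{x}\in\mathbb{R}^n$, there is a signed permutation matrix $\mathbf{P}$ such that $\mathbf{P}\mathbf{x}\in\mathbb{R}^n_\downarrow$; that is, the entries of $\mathbf{x}$ can be sorted so that $|x_{j_1}|\ge|x_{j_2}|\ge\cdots\ge|x_{j_n}|$, where $j_i$ is the index of the nonzero entry in the $i$th row of $\mathbf{P}$. By Lemma 2.1, for a signed permutation invariant function in $\Gamma(\mathbb{R}^n)$, it is sufficient to consider its proximity operator on $\mathbb{R}^n_\downarrow$.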
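The reduction to nonnegative, descending vectors described above can be sketched as follows. This is a minimal illustration, not code from the paper; the function names `to_sorted_nonneg` and `undo_sorted_nonneg` are hypothetical. The first function records the permutation and signs that sort a vector, and the second maps a result computed in sorted coordinates back to the original coordinates.

```python
def to_sorted_nonneg(x):
    """Return (y, perm, signs) with y[i] = signs[i] * x[perm[i]] >= 0 and y nonincreasing."""
    perm = sorted(range(len(x)), key=lambda j: -abs(x[j]))
    signs = [-1.0 if x[j] < 0 else 1.0 for j in perm]
    y = [s * x[j] for j, s in zip(perm, signs)]
    return y, perm, signs

def undo_sorted_nonneg(u, perm, signs):
    """Map a vector u given in sorted coordinates back to the original coordinates."""
    out = [0.0] * len(u)
    for i, j in enumerate(perm):
        out[j] = signs[i] * u[i]
    return out

x = [-1.0, 3.0, -2.0]
y, perm, signs = to_sorted_nonneg(x)
print(y)                                   # sorted, nonnegative copy of x
print(undo_sorted_nonneg(y, perm, signs))  # recovers x
```

For a signed permutation invariant function, a proximity point computed at `y` can be passed through `undo_sorted_nonneg` to obtain a proximity point at `x`.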
For a vector , we assert that exhibits blocks, characterized by distinct indices satisfying , , and . In these blocks, follows the pattern for and . In essence, the vector comprises blocks, where entries within each block are identical, yet they differ from entries in other blocks.
Lemma 2.2.
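Let $f$ be a signed permutation invariant function in $\Gamma(\mathbb{R}^n)$, and let $\mathbf{x}\in\mathbb{R}^n_\downarrow$. Then every $\mathbf{u}\in\operatorname{prox}_{\lambda f}(\mathbf{x})$ satisfies $\mathbf{u}\in\mathbb{R}^n_+$. Furthermore, there exists a point $\mathbf{u}\in\operatorname{prox}_{\lambda f}(\mathbf{x})$ such that $\mathbf{u}\in\mathbb{R}^n_\downarrow$.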
Proof.
To establish , we observe that the objective function from the definition of is
for all . As is a signed permutation invariant function, for all . Given , our discussion can be restricted to ; otherwise, say the first element of is negative, then . From the above discussion, we conclude that .
Now, suppose . If the vector has one block, that is, all entries of are the same. Clearly, we can rearrange entries of so that the rearranged one is in and is still in . If vector has blocks, characterized by distinct indices . We define and for , and and . We claim that for . If these inequalities do not hold for some , assume, without loss of generality, that . One can assume that and . In this case, let be a vector from by exchanging its first and the components. Immediately, , and
due to the conditions of and . This conflicts with our assumption of .
Finally, since all entries in each block of are the same, arranging the entries of for the indices in the same block in descending order results in still belonging to . Thus, there exists a point such that .
∎
2.2 Reformulation
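The focus of this paper is the proximity operator of scale and signed permutation invariant functions. Our approach for computing it is based on the following observation: every nonzero point of $\mathbb{R}^n$ corresponds to exactly one point of the Cartesian product of $(0,\infty)$ and the unit sphere $\mathbb{S}^{n-1}$.
That is, a nonzero $\mathbf{x}\in\mathbb{R}^n$ can be converted to a pair $(r,\mathbf{w})$ such that
$$\mathbf{x}=r\mathbf{w},$$
where
$$r=\|\mathbf{x}\|_2>0\quad\text{and}\quad\mathbf{w}=\frac{\mathbf{x}}{\|\mathbf{x}\|_2}\in\mathbb{S}^{n-1}.$$
With this conversion, the task of finding a point $\mathbf{u}\in\operatorname{prox}_{\lambda f}(\mathbf{x})$ transforms into finding a pair $(r,\mathbf{w})$ such that $r\mathbf{w}\in\operatorname{prox}_{\lambda f}(\mathbf{x})$.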
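The conversion between a nonzero vector and its radius and direction can be sketched in a few lines. The names `to_polar` and `from_polar` are hypothetical and used only for this illustration.

```python
import math

def to_polar(x):
    """Split a nonzero vector x into (r, w) with r = ||x||_2 > 0 and ||w||_2 = 1."""
    r = math.sqrt(sum(v * v for v in x))
    if r == 0.0:
        raise ValueError("the origin has no direction on the unit sphere")
    return r, [v / r for v in x]

def from_polar(r, w):
    """Rebuild the vector r * w from a radius and a unit direction."""
    return [r * v for v in w]

r, w = to_polar([3.0, 4.0])
print(r, w)              # radius and unit direction
print(from_polar(r, w))  # reproduces the input up to rounding
```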
Theorem 2.3.
Let be a scale and signed permutation invariant function in , and let . Consider a vector and define
(1)
Then if and only if is given by
(2)
where is a solution to the following optimization problem
To delve deeper into the optimization problem on the right-hand side, we express with and . Consequently, for ,
and for
(4)
In equation (4), the terms are as follows: The first term can always achieve the minimum value by taking ; the second term is constant with respect to the pair ; and third term is solely a function of . Therefore, we seek that minimizes the third term with respect to , i.e., solving the optimization problem (3), then form the expression . Hence, the conclusion of this theorem holds.
∎
In the following discussion, we use the notation in (1) to represent the objective function for and denote
The significance of the scale and signed permutation invariance of becomes evident in the proof of the theorem above. The scale invariance of facilitates the discussion from to , while the signed permutation invariance narrows the focus from to , allowing us to isolate the impact of and when solving an optimization problem that involves exclusively.
In accordance with Theorem 2.3, the process of determining the pair $(r,\mathbf{w})$ involves three distinct steps:
• $w$-step: find an optimal solution $\mathbf{w}^\star$ to the optimization problem (3).
• $r$-step: following the $w$-step, compute the corresponding $r^\star$ as $r^\star=\langle\mathbf{w}^\star,\mathbf{x}\rangle$, where $\mathbf{w}^\star$ is the output of the $w$-step.
• $d$-step: determine $\mathbf{u}^\star$ according to (2), that is, decide between the origin and $r^\star\mathbf{w}^\star$.
Upon completing these three steps, as shown in (2), $\mathbf{u}^\star$ belongs to $\operatorname{prox}_{\lambda f}(\mathbf{x})$. For ease of reference in the subsequent discussion, this procedure is referred to as WRD ($w$-step, $r$-step, $d$-step).
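To show the applicability of the WRD procedure, we present the proximity operator of the $\ell_0$ norm, a typical scale and signed permutation invariant function.
Example 2.4.
The proximity operator of the $\ell_0$ norm at $\mathbf{x}\in\mathbb{R}^n$ with index $\lambda$ acts componentwise as hard thresholding, see, e.g., [19, 17]: its $i$th component is
$$\begin{cases}\{0\}, & |x_i|<\sqrt{2\lambda},\\ \{0,x_i\}, & |x_i|=\sqrt{2\lambda},\\ \{x_i\}, & |x_i|>\sqrt{2\lambda}.\end{cases}$$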
We intend to apply the WRD procedure for computing . Assuming , and following the approach used in the proof of Theorem 2.3, we define . The next step involves seeking the optimal solution to optimization problem (3) for , where . Thus, for with an norm of , the smallest value of is achieved when is aligned with the first entries of , that is,
Here keeps the first entries of and sets the remaining entries zeros. Therefore, the output in the -step of the WRD procedure is given by
This output represents the solutions to the -step of the WRD. Subsequently, choosing a vector , the -step generates . With the pair , the -step of the WRD compares the difference between and , resulting in
where is equal to if or is the integer such that . Clearly, the vector is in if , and both and are in if ; otherwise is in . These discussions affirm that the WRD procedure accurately recovers the proximity operator of the norm.
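The three steps of this example can be traced numerically. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the input is nonzero, nonnegative, and sorted in descending order; the objective is written as $\tfrac12\|\mathbf{u}-\mathbf{x}\|_2^2+\lambda\|\mathbf{u}\|_0$, which is $\lambda$ times the objective with index $\lambda$ and has the same minimizers; ties are resolved in favor of the origin. The name `prox_l0_wrd` is hypothetical.

```python
import math

def prox_l0_wrd(x, lam):
    """One proximity point of the l0 norm at a sorted nonnegative nonzero x, via w-, r-, d-steps."""
    n = len(x)
    # w-step: among directions supported on the first k entries, the best one is
    # x_[k] / ||x_[k]||_2, with value lam * k - 0.5 * ||x_[k]||_2^2.
    best_k, best_val, partial_sq = 1, float("inf"), 0.0
    for k in range(1, n + 1):
        partial_sq += x[k - 1] ** 2
        val = lam * k - 0.5 * partial_sq
        if val < best_val:
            best_k, best_val = k, val
    norm_k = math.sqrt(sum(v * v for v in x[:best_k]))
    w = [v / norm_k for v in x[:best_k]] + [0.0] * (n - best_k)
    # r-step: r = <w, x>.
    r = sum(wi * xi for wi, xi in zip(w, x))
    u = [r * wi for wi in w]
    # d-step: compare the objective at the origin and at r * w.
    f_origin = 0.5 * sum(v * v for v in x)
    f_u = 0.5 * sum((ui - xi) ** 2 for ui, xi in zip(u, x)) + lam * best_k
    return u if f_u < f_origin else [0.0] * n

print(prox_l0_wrd([3.0, 1.5, 0.5], 1.0))  # keeps the entries above sqrt(2 * lam)
print(prox_l0_wrd([1.0, 0.5], 1.0))       # every entry is below the threshold
```

On these inputs the output agrees with componentwise hard thresholding at $\sqrt{2\lambda}$.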
In the rest of the paper, we focus on computing the proximity operator of the function below:
$$h_p(\mathbf{x}) := \begin{cases}\left(\dfrac{\|\mathbf{x}\|_1}{\|\mathbf{x}\|_2}\right)^{p}, & \mathbf{x}\neq\mathbf{0},\\[4pt] 0, & \mathbf{x}=\mathbf{0},\end{cases}\qquad (6)$$
for $\mathbf{x}\in\mathbb{R}^n$ and $p\in\{1,2\}$. This function is lower semicontinuous, and $h_p(\mathbf{x})\ge 1$ for all nonzero vectors $\mathbf{x}$. Thus, the proximity operator of $h_p$ at any point is nonempty.
Notably, setting the value of $h_p$ at the origin to any value smaller than or equal to 1 preserves the lower semicontinuity of the function; a specific choice of this value is illustrated in [16]. Therefore, our proposed WRD procedure remains applicable. Lastly, in $\mathbb{R}$, our definition of $h_p$ coincides with the $\ell_0$ norm, that is, $h_p(x)=\|x\|_0$ for $x\in\mathbb{R}$.
In the next section, we first consider the computation of the proximity operator of $h_2=(\ell_1/\ell_2)^2$.
3 The Proximity Operator of $(\ell_1/\ell_2)^2$
We plan to use the WRD procedure to compute the proximity operator of . We begin with showing the optimization problem (3) associated with the -step of the WRD.
Define as a vector with all its components . For , we have
Hence, the optimization problem (3) is a quadratic programming constrained on .
We promptly obtain a result concerning the proximity operator of at points that are multiples of the vector as follows:
Theorem 3.1.
For and for some , then
Proof.
In the situation of for some , we have from (7). The objective function of problem (3) is . To investigate the minimal value of the above function on and at which point the optimal is achieved, there are three different situations according to the value of .
If , the minimal value of is achieved at which has the largest norm for . Clearly, the optimal must be and . Hence, .
If , then for all . Note that . Hence, .
Finally, if , then the minimal value of on is achieved at and for all . Hence, .
∎
By Lemma 2.1, we restrict our attention to the proximity operator of $(\ell_1/\ell_2)^2$ on $\mathbb{R}^n_\downarrow$, the cone of nonnegative vectors with entries in descending order. The complete discussion is presented in the following two subsections. In the first subsection, we analyze the proximity operator of $(\ell_1/\ell_2)^2$ specifically in $\mathbb{R}^2$. In the second subsection, we begin by investigating the properties of the eigenvectors of the matrix $\mathbf{A}$. The eigenvector corresponding to the negative eigenvalue plays a pivotal role in determining the solution in the $w$-step of the WRD procedure. Using these properties, we explicitly derive the proximity operator of $(\ell_1/\ell_2)^2$ over the entire space $\mathbb{R}^n$.
3.1 Special case: the proximity operator of $(\ell_1/\ell_2)^2$ on $\mathbb{R}^2$
The following result is about the proximity operator of $(\ell_1/\ell_2)^2$ on $\mathbb{R}^2$.
Theorem 3.2.
For and not a multiple of , write
then, is the optimal solution to problem (3). Finally,
Proof.
For , we have
Write and . The function can be written as
It is clear that minimizing for is equivalent to minimizing the function for . By Lemma 2.1 and Theorem 2.3, we can restrict the parameter .
To investigate the global minimizer of over the interval , we compute the derivative of as follows
We consider two cases. Case 1: If , for . Hence, achieves its global minimum at . That is, . Case 2: If , has only one root on , given by . Due to and . Hence, achieves its global minimum at . As a result, is the optimal solution to problem (3). This completes the -step of the WRD procedure for . The -step follows immediately with .
Finally, for the -step of the WRD procedure, we only need to know the sign of . For Case 1, , which is positive if , zero if , and negative otherwise. For Case 2, . So, from the sign of , we conclude .
∎
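The two-dimensional case can be cross-checked numerically. The sketch below does not use the closed-form expression of Theorem 3.2; it is a brute-force grid search over the angle of $\mathbf{w}=(\cos\theta,\sin\theta)$, followed by the $r$-step and the $d$-step. It assumes the objective $\tfrac12\|\mathbf{u}-\mathbf{x}\|_2^2+\lambda(\|\mathbf{u}\|_1/\|\mathbf{u}\|_2)^2$ (which has the same minimizers as the objective with index $\lambda$), the value 0 at the origin, and ties resolved in favor of the origin. All function names are hypothetical.

```python
import math

def ratio_squared(u):
    """(||u||_1 / ||u||_2)^2, with the value 0 assigned at the origin."""
    sq = sum(v * v for v in u)
    return 0.0 if sq == 0.0 else sum(abs(v) for v in u) ** 2 / sq

def objective(u, x, lam):
    """0.5 * ||u - x||_2^2 + lam * (||u||_1 / ||u||_2)^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(u, x)) + lam * ratio_squared(u)

def prox_ratio_squared_2d(x, lam, grid=20000):
    """Approximate proximity point at a nonnegative nonzero x in R^2 by grid search."""
    best_w, best_val = None, float("inf")
    for i in range(grid + 1):
        theta = 0.5 * math.pi * i / grid
        w = (math.cos(theta), math.sin(theta))
        # w-step objective on the nonnegative quarter circle.
        val = lam * (w[0] + w[1]) ** 2 - 0.5 * (x[0] * w[0] + x[1] * w[1]) ** 2
        if val < best_val:
            best_w, best_val = w, val
    r = x[0] * best_w[0] + x[1] * best_w[1]          # r-step
    u = [r * best_w[0], r * best_w[1]]
    origin = [0.0, 0.0]
    # d-step: keep whichever of the two candidates has the smaller objective.
    return u if objective(u, x, lam) < objective(origin, x, lam) else origin

print(prox_ratio_squared_2d([3.0, 3.0], 1.0))
print(prox_ratio_squared_2d([1.0, 0.2], 1.0))
```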
To close this subsection, we examine the proximity operator of $(\ell_1/\ell_2)^2$ with index $\lambda$ in $\mathbb{R}^2$ through plots. The proximity operator of the $\ell_0$ norm with index $\lambda$ is included for comparison, since $(\ell_1/\ell_2)^2$ can be viewed as an approximation of the $\ell_0$ norm. The plots illustrate how the proximity operator of $(\ell_1/\ell_2)^2$ behaves in comparison with that of the $\ell_0$ norm in $\mathbb{R}^2$. As justified by Lemma 2.1, we present the behavior only on $\mathbb{R}^2_\downarrow$.
Figure 3.1(a) illustrates the proximity operator of the norm. Following the guidance from Example 2.4, the set is divided into three distinct regions I, II, and III as depicted in Figure 3.1(a) and defined as follows:
Region I
Region II
Region III
On Region I, the at the corner is ; at each other point on the line , it is ; and at each other point in Region I, it is . On Region II, at the point is and at each other point is . On Region III, at each point is itself.
Figure 3.1(b) showcases the proximity operator of the on the line . The operator at each point is if (blue dash-dot line); if (marked by the square); and itself if (magenta dot line). Comparing with the proximity operator of the norm, the main difference is at the point .
Figure 3.1(c) exhibits the proximity operator of the on excluding the line . The set partitions into three regions I, II, and III as shown in Figure 3.1(c) and defined as follows:
Region I
Region II
Region III
On Region I, the at each point on the line is ; the at each other point
is . On Region II, the at each point is (see the red line). On Region III, the at each point is , where is given in Theorem 3.2. Specifically, results for three lines with their slopes 0.9 (green line), 0.5 (cyan line), and 0.3 (black line) are presented, and the at these points are represented by dashed lines with corresponding colors.
Figure 3.1: The plots of the proximity operator in $\mathbb{R}^2_\downarrow$ for (a) the $\ell_0$ norm; (b) $(\ell_1/\ell_2)^2$ on the line with slope 1; and (c) $(\ell_1/\ell_2)^2$ on $\mathbb{R}^2_\downarrow$ excluding the line with slope 1.
3.2 General case: the proximity operator of $(\ell_1/\ell_2)^2$ on $\mathbb{R}^n$
In the preceding subsection, we determined the proximity operator of $(\ell_1/\ell_2)^2$ on $\mathbb{R}^2$ through the WRD procedure. The central idea was to parameterize $\mathbf{w}$ by a single variable, which simplifies the problem in the $w$-step of the WRD procedure and makes it easy to solve. While $\mathbf{w}$ for $n\ge 3$ can be parameterized by $n-1$ parameters, the resulting problem in the $w$-step appears too intricate for direct analysis. Consequently, an alternative approach is needed.
Given the pivotal role of the $w$-step in the WRD procedure, this subsection places particular emphasis on this step. The objective function of the $w$-step is a quadratic form. In this context, we invoke the following two results.
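Lemma 3.3. Consider the optimization problem
$$\min\big\{\mathbf{w}^\top\mathbf{A}\mathbf{w}-2\mathbf{b}^\top\mathbf{w} : \|\mathbf{w}\|_2=r\big\},\qquad (8)$$
where $\mathbf{A}$ is an $n\times n$ symmetric matrix, $\mathbf{b}\in\mathbb{R}^n$, and $r$ a positive number. A vector $\mathbf{w}$ is a solution to this problem if and only if there is a real number $\mu$ such that (i) $\mathbf{A}-\mu\mathbf{I}$ is positive semi-definite; (ii) $(\mathbf{A}-\mu\mathbf{I})\mathbf{w}=\mathbf{b}$; and (iii) $\|\mathbf{w}\|_2=r$. Such a $\mu$ is unique.
Lemma 3.4. Consider the optimization problem (8). If $\mathbf{b}$ is orthogonal to some eigenvector associated with the smallest eigenvalue of $\mathbf{A}$, then there is no local-nonglobal minimum for (8).
Note that both Lemma 3.3 and Lemma 3.4 consider quadratic optimization problems constrained on a sphere. However, our problem in the $w$-step is restricted to the part of the unit sphere lying in $\mathbb{R}^n_\downarrow$.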
To investigate the applicability of Lemma 3.3 for the optimization problem in the -step, a crucial prerequisite is understanding the eigen-structure of the matrix , as defined in (7). This matrix is the sum of two rank-1 matrices; consequently, it possesses at most two non-zero eigenvalues. In order to delve into the eigen-structure of the matrix , let’s introduce a set of notations:
(9)
(10)
(11)
(12)
(13)
(14)
(15)
Observations about the above notations are as follows: The inequality implies that , and the inequality strictly holds if is not a multiple of . This observation further implies that both and (given in (10) and (11)) are non-negative numbers. For (given in (12)):
where the equality holds if is a multiple of . Similarly, for (given in (13)):
where the equality holds if is a multiple of again. Hence, if is not a multiple of , then is positive, while is negative.
The subsequent result elucidates the eigenstructure of the matrix .
Proposition 3.5.
Let be given in (7) for and . Let , , , , , and be given by (10), (11), (12), (13), (14), and (15), respectively. The following statements hold:
(i)
Assume for some . Then the matrix has only zero as its eigenvalues if ; or has as its only non-zero eigenvalue with the corresponding eigenvector.
(ii)
Assume for any . Then the matrix has only one positive eigenvalue and one negative eigenvalue given as
and . The corresponding eigenvectors associated with and are
and , respectively.
Proof.
Item (i). In this case, . Clearly, if , so has only zero as its eigenvalues. Otherwise, has as its only non-zero eigenvalue with the corresponding eigenvector .
Item (ii). From , the matrix has at most two nonzero eigenvalues which will be found as follows.
For any , a direct computation gives
If the vector is the eigenvector of , then the equation
holds for some and the value is the associated eigenvalue. Simplifying the above equation leads to the following quadratic equation
The discriminant of the quadratic equation with variable is given by (9). Since , we have
. Hence, the above quadratic equation has two real roots and . Substituting and into yield two eigenvalues and of , respectively. In this case, we know that and . The eigenvectors corresponding to and are and , respectively.
∎
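The eigenstructure described in Proposition 3.5 can be checked numerically for a matrix of the form $a\,\mathbf{e}\mathbf{e}^\top-b\,\mathbf{x}\mathbf{x}^\top$ with $a,b>0$. The coefficients `a` and `b` below are placeholders for the scaling used in (7), which is an assumption of this sketch, and the name `rank_two_eigs` is hypothetical. The sketch uses the fact that the eigenvectors with nonzero eigenvalues have the form $\mathbf{e}+t\mathbf{x}$, so the eigenvalues solve a scalar quadratic equation; it requires $\mathbf{x}$ to be nonnegative, nonzero, and not a multiple of $\mathbf{e}$.

```python
import math

def rank_two_eigs(x, a, b):
    """Nonzero eigenpairs of a * e e^T - b * x x^T, with eigenvectors of the form e + t * x."""
    n = len(x)
    s = sum(x)                      # <e, x>, positive for nonnegative nonzero x
    q = sum(v * v for v in x)       # <x, x>
    trace = a * n - b * q
    det = -a * b * (n * q - s * s)  # nonpositive by the Cauchy-Schwarz inequality
    disc = math.sqrt(trace * trace - 4.0 * det)
    pairs = []
    for mu in ((trace + disc) / 2.0, (trace - disc) / 2.0):
        t = (mu / a - n) / s        # from matching the coefficient of e in M(e + t x) = mu (e + t x)
        pairs.append((mu, [1.0 + t * v for v in x]))
    return pairs

def apply_matrix(v, x, a, b):
    """Compute (a * e e^T - b * x x^T) v without forming the matrix."""
    sv = sum(v)
    xv = sum(p * c for p, c in zip(x, v))
    return [a * sv - b * xv * c for c in x]

x = [3.0, 1.0, 0.0]
(mu_pos, v_pos), (mu_neg, v_neg) = rank_two_eigs(x, a=2.0, b=1.0)
print(mu_pos, mu_neg)  # one positive and one negative eigenvalue
```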
We remark that for , the largest component of in (14), that is the first component , is always non-negative. Actually, by (9), (10), and (14), we have
which is always non-negative. This derivation also indicates that always holds if is not parallel to .
If the last component of in (14) is positive, we have the following result.
Theorem 3.6.
For and not being a multiple of , if , then the vector with is the solution to the optimization problem (3). Furthermore, we have
Proof.
From not being a multiple of , , and being nonnegative, we know that and . By identifying , , and as , , and in (8) of Lemma 3.3, respectively, we know that is positive semi-definite and from the item (ii) of Proposition 3.5. Therefore, the unit vector is the solution to the problem (3) from Lemma 3.3.
To determine , we notice that the first entries of both and are positive, hence . Furthermore, since for given in (5), we conclude that for given in (1). This completes the proof of this theorem.
∎
There are two remarks on Theorem 3.6. The first one is that under the conditions of this theorem, simplifying the expression leads to
The second remark concerns the consistency of Theorem 3.6 in with Theorem 3.2. That is, if the condition holds, then and in both Theorem 3.6 and Theorem 3.2 are identical. To this end, and to have simpler expressions, let us denote
By (10), the condition implies . We claim that . If this claim does not hold, then must be negative and . Squaring this inequality and simplifying it yield . In this situation, . This contradicts the negativeness of . Hence, . Similarly, squaring this inequality and simplifying it leads to .
Further, defining and with the help of the identity
we can show, after some simplifications, that the ratios of the entries of in both Theorem 3.6 and Theorem 3.2 are the same:
which means that in both Theorem 3.6 and Theorem 3.2 are identical.
The next result discusses the property of the solution from the -step under the condition that the last component of in (14) is non-positive.
Theorem 3.7.
For and , let be the optimal solution to the optimization problem (3). If , then .
Proof.
Suppose, to the contrary, that all components of $\mathbf{w}^\star$ are positive. Then $\mathbf{w}^\star$ is a local minimizer of
(17)
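As the zero vector is orthogonal to any vector, it is in particular orthogonal to the eigenvector of $\mathbf{A}$ associated with the negative eigenvalue. By Lemma 3.4, there is no local-nonglobal minimum for (17). Hence $\mathbf{w}^\star$ is the global minimizer of problem (17). As a result, $\mathbf{w}^\star$ is a unit eigenvector of $\mathbf{A}$ associated with the negative eigenvalue, whose last component is not positive by the given condition. This contradicts the positivity of all components of $\mathbf{w}^\star$ and completes our proof.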
∎
To have an efficient approach for computing the proximity operator of , let us access the entries of the matrix , which are
Since , the numbers of entries in each row, each column, and each diagonal are increasing corresponding to the indices of the entries. Based on the structure of this matrix, we define a function that maps every pair with and to a non-negative integer as follows:
(18)
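This number counts the negative components of the vector $\mathbf{A}\mathbf{e}_1$, where $\mathbf{e}_1$ is the first standard basis vector. As $\mathbf{A}\mathbf{e}_1$ is the first column of the matrix $\mathbf{A}$, we use this number to distinguish the cases for the matrix $\mathbf{A}$ in the following theorem.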
Theorem 3.8.
Let and let . Set . Then the following statements hold:
(i)
If , then is the global minimizer to the optimization problem (3);
(ii)
If , then the vector
is the global minimizer to the optimization problem (3), where is the minimizer of the problem
Here is the -order leading principal submatrix of obtained by removing its last rows and columns.
Proof.
(i) For , from the fact , we conclude that for all . Therefore, for all , we have
The inequalities in the above can be achieved for .
(ii) In this case, we split the matrix into block matrix as follows
where , , , and are size , , , and , respectively. In fact, . We further know that all entries in , , and are non-negative. For any , write
with and . We have
The inequality implies . Thus,
In particular, for all vectors with , one has
We conclude that
This completes the proof.
∎
We remark that not all entries of in Theorem 3.8 are necessarily positive, and some entries may be zero, as demonstrated in the following example.
Example 3.9.
Let
For this vector and two different values of , we present the matrix , its eigenvector associated with the negative eigenvalue, and the minimizer of the problem .
For , we have , , and as follows:
For , we have , , and as follows:
Notice that for the values and , both meet the condition , that is . However, this does not determine the positivity of all components in .
We can establish that acts as a promoter of sparsity from Theorem 3.8 under the situation of . This assertion is encapsulated in the subsequent result.
Theorem 3.10.
For , the following inclusion holds for all in the set :
Proof.
By Lemma 2.1,
it suffices to consider all points in the set with their norm smaller than . For , we examine two scenarios.
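If $\mathbf{x}=\alpha\mathbf{e}$ for some $\alpha$, the result holds due to Theorem 3.1. If $\mathbf{x}\neq\alpha\mathbf{e}$ for any $\alpha$, then by Theorem 3.8 the solution to problem (3) is $\mathbf{e}_1$; hence, the result holds as well.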
∎
This theorem underscores the sparsity-promoting nature of $(\ell_1/\ell_2)^2$ within the specified domain.
Given and , Theorem 3.8 provides a clear guideline for algorithm development when computing the optimal solution to problem (3), eventually, . If there exists an integer such that and , it follows that . This allows us to safely truncate by removing its last entries. This approach can significantly speed up the computation process by focusing only on the relevant components of .
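The truncation idea can be sketched as follows. This is an illustration under an explicit assumption: the entries of the matrix in the $w$-step have the same signs as $2\lambda-x_ix_j$, which follows from writing the objective as $\tfrac12\|\mathbf{u}-\mathbf{x}\|_2^2+\lambda(\|\mathbf{u}\|_1/\|\mathbf{u}\|_2)^2$ or as its multiple with index $\lambda$. The input is assumed to be nonnegative and sorted in descending order, and both function names are hypothetical.

```python
def count_negative_first_column(x, lam):
    """Number of indices i with 2 * lam - x[0] * x[i] < 0, for x sorted in descending order."""
    return sum(1 for v in x if 2.0 * lam - x[0] * v < 0.0)

def truncate_for_w_step(x, lam):
    """Keep only the leading entries of x that can carry nonzero entries of the w-step solution."""
    kept = count_negative_first_column(x, lam)
    # When no entry of the first column is negative, the w-step solution is the
    # first standard basis vector, so a single entry is kept.
    return x[:max(kept, 1)]

print(truncate_for_w_step([3.0, 1.0, 0.5, 0.1], 1.0))  # keeps the first two entries
print(truncate_for_w_step([1.0, 0.5], 1.0))            # keeps the first entry only
```

Because the input is sorted, the negative entries of the first column always occupy its leading positions, so slicing is sufficient.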
We are now ready to present our algorithm, based on the WRD procedure, for computing the proximity operator of $(\ell_1/\ell_2)^2$ at an arbitrary $\mathbf{x}\in\mathbb{R}^n$. This algorithm is presented in Algorithm 1.
4 The Proximity Operator of $\ell_1/\ell_2$
In this section, we detail the computation of the proximity operator of the function $\ell_1/\ell_2$ via the WRD procedure.
We begin with showing the optimization problem (3) associated with the $w$-step of the WRD procedure. For the given $\mathbf{x}$ and $\lambda$,
defining
(19)
The corresponding function in (5) for becomes the quadratic form
By Lemma 2.1, our focus is restricted to discussing the proximity operator of on . This discussion unfolds in the subsequent three subsections.
In the first subsection, we show that the method for , as delineated in Section 3, cannot be directly applied to , even though such a transfer appears feasible at first given their analogous reformulations. Additionally, we provide the explicit expression of the proximity operator of at specific points, highlighting that serves as a function that promotes sparsity.
The second subsection conducts an in-depth examination of the proximity operator of in . Notably, the method tailored for this task is difficult to extend to higher dimensions.
In the third subsection, we introduce a strategy to transform the optimization problem in the -step of the WRD procedure. This transformation entails converting a concave objective function constrained on a nonconvex set into one with the same objective function but constrained on a closed and bounded convex set. The latter can be efficiently solved using the nonconvex gradient projection algorithm (see [8]).
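For orientation, the reduction that the three steps perform can be written out explicitly. The following is only a sketch under assumed notation, which may differ from the paper's (3), (5), and (19): it takes the function to be $f(u) = (\|u\|_1/\|u\|_2)^2$ for $u \neq 0$ with $f(0) = 0$, uses the convention $\mathrm{prox}_{\lambda f}(x) = \arg\min_u \{\lambda f(u) + \tfrac12\|u - x\|_2^2\}$, and writes a nonzero candidate as $u = r w$ with $r > 0$ and $\|w\|_2 = 1$.

```latex
\begin{aligned}
\lambda f(rw) + \tfrac12 \|rw - x\|_2^2
  &= \lambda \|w\|_1^2 + \tfrac12 r^2 - r \langle w, x \rangle + \tfrac12 \|x\|_2^2, \\
\min_{r > 0} \Big\{ \lambda f(rw) + \tfrac12 \|rw - x\|_2^2 \Big\}
  &= \lambda \|w\|_1^2 - \tfrac12 \langle w, x \rangle^2 + \tfrac12 \|x\|_2^2
  \quad \text{at } r = \langle w, x \rangle, \text{ provided } \langle w, x \rangle > 0 .
\end{aligned}
```

Under these assumptions, and for a nonnegative $x$ (so that $w$ can be taken nonnegative and $\|w\|_1 = \mathbf{1}^\top w$), the direction $w$ minimizes the quadratic form $w^\top(\lambda \mathbf{1}\mathbf{1}^\top - \tfrac12 x x^\top) w$ over nonnegative unit vectors, the radius is $r = \langle w, x \rangle$, and the final decision compares the resulting value with $\tfrac12\|x\|_2^2$, the objective value at the origin.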
4.1 The approach for does not work for
Initially, it may seem feasible to directly apply the method for described in Section 3 to , especially given their similar reformulations. However, we want to point out that this approach is not directly transferable to . This becomes evident when considering Lemma 3.3, which leads us to the subsequent result.
Proposition 4.1.
For and , we consider a quadratic optimization problem on the unit sphere as follows
(20)
A vector is a solution to (20) if and only if there is a unique such that
with being a unit vector.
Proof.
Problem (20) is a special case of problem (8) by identifying , , and as , , and , respectively.
The matrix is a rank-1 matrix and has as its only nonzero eigenvalue with the associated unit eigenvector . Hence, for any , the matrix is positive semidefinite.
“⇒” If is an optimal solution to problem (20), by Lemma 3.3, there exists a unique such that with being a unit vector. We claim that . If not, assume that , and let be an orthogonal matrix whose first column is . Then, the equality leads to
which is a contradiction. Hence, is strictly greater than .
“⇐” We show that there exists an such that . For , the matrix is invertible and its inverse is
For , from together with the above equation, we obtain
(21)
To study the roots of the above equation, we consider two different cases: (i) for some and (ii) for any .
For the case of for some , one has and . It follows from (21) that
. This equation has two real roots, and the only root larger than is
.
By Lemma 3.3,
The rest of the proof considers the case of for any . Squaring both sides of identity (21) and simplifying the resulting equation leads to the following quartic equation
where and
Since and is positive for a sufficiently large , there exists at least one root of on the interval . Regardless of the value of , the number of sign changes of the polynomial is . Therefore, by Descartes’ Rule of Signs [22], we conclude that has exactly one positive root, say . Hence, with ,
(23)
is the optimal solution to problem (20) by Lemma 3.3 again.
∎
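The sign-change count used in the proof is easy to check numerically. The sketch below is illustrative only: the quartic is a hypothetical example with a single sign change, not the polynomial from the proof, and the helper names `count_sign_changes` and `positive_real_roots` are introduced here for illustration.

```python
import numpy as np

def count_sign_changes(coeffs):
    """Number of sign changes in a coefficient sequence, skipping zeros."""
    signs = [np.sign(c) for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def positive_real_roots(coeffs, tol=1e-8):
    """Positive real roots of a polynomial (coefficients listed from highest degree)."""
    roots = np.roots(coeffs)
    return sorted(r.real for r in roots if abs(r.imag) < tol and r.real > tol)

# Hypothetical quartic with one sign change: t^4 + 2 t^3 + t^2 - 3 t - 5
coeffs = [1.0, 2.0, 1.0, -3.0, -5.0]
print(count_sign_changes(coeffs))
print(positive_real_roots(coeffs))
```

With one sign change, Descartes' Rule of Signs guarantees exactly one positive root, so the numerical root finder should report a single positive real value for this example.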
It is evident from the preceding proof that all entries of the optimal solution , as indicated in (22) and (23), are negative. Consequently, this vector cannot serve as the solution to problem (3). Therefore, the methodology employed for is not applicable to , necessitating a distinct approach.
Next, we provide the proximity operator of for vectors with uniform entries.
Theorem 4.2.
Let and for some . Then
Proof.
In this situation, we have from (19). The objective function of problem (3) is
where .
Note that for all , the above quantity achieves its global minimum at being or , depending on which one is farther from . Hence, the norm of the optimal solution to problem (3) is if ; or if ; or if .
As a result, the -step of the WRD procedure provides the optimal solution to problem (3) as follows:
The -step of the WRD procedure simply follows with . At the -step of the WRD procedure, we compare and with defined in (1). Note that
We see that under the condition , the quantity is positive if , zero if , or negative if ; under the condition , ; under the condition , i.e., , we have always positive. The result of this theorem follows from (2).
∎
The next result shows that the function is indeed a sparsity-promoting function whose proximity operator sends the points in a neighborhood of the origin to the origin (see [17]).
Theorem 4.3.
For , the following inclusion
holds for with .
Proof.
Let be the objective function of problem (3) associated with . For , we have
for with . We further have
,
where is defined in (1). Hence, .
∎
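The mechanism behind this kind of result can be checked by brute force in two dimensions. The sketch below rests on assumptions that are not confirmed by the text: the function is taken to be the squared ratio $(\|u\|_1/\|u\|_2)^2$ with value 0 at the origin, and the proximity operator is taken as the minimizer of $\lambda f(u) + \tfrac12\|u - x\|_2^2$. Since the ratio is at least 1 for every nonzero $u$, any $x$ with $\tfrac12\|x\|_2^2 < \lambda$ has the origin as its only minimizer; this elementary bound is not necessarily the radius stated in Theorem 4.3. All function names are illustrative.

```python
import numpy as np

def sq_ratio(u):
    """Squared l1/l2 ratio, with the assumed convention f(0) = 0."""
    n2 = np.linalg.norm(u)
    return 0.0 if n2 == 0.0 else (np.abs(u).sum() / n2) ** 2

def prox_objective(u, x, lam):
    """lam * f(u) + 0.5 * ||u - x||^2 (assumed proximity-operator convention)."""
    return lam * sq_ratio(u) + 0.5 * np.sum((u - x) ** 2)

def grid_prox_2d(x, lam, half_width=3.0, num=201):
    """Brute-force minimizer over a square grid in R^2, with the origin always a candidate."""
    axis = np.linspace(-half_width, half_width, num)
    best_u = np.zeros(2)
    best_val = prox_objective(best_u, x, lam)
    for a in axis:
        for b in axis:
            u = np.array([a, b])
            val = prox_objective(u, x, lam)
            if val < best_val:
                best_u, best_val = u, val
    return best_u, best_val

lam = 1.0
x_small = np.array([0.6, 0.5])   # 0.5 * ||x||^2 = 0.305 < lam
u_star, val = grid_prox_2d(x_small, lam)
print(u_star, val)
```

For this input every nonzero grid point has objective value at least `lam`, so the search returns the origin with value `0.5 * ||x||^2`.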
4.2 Special case: the proximity operator of on
The following result establishes a region in which the proximity operator of does not vanish on .
Proposition 4.4.
For , define two sets in as follows:
Then, the origin is not in for every point .
Proof.
For each point , to prove the origin is not in it is sufficient to show that there exists a point, say , in such that , where is defined in (1).
First, we choose . Then, which holds for .
Next, we choose . Then, with ,
for all points . This completes the proof of this proposition.
∎
We comment on this proposition. Consider two curves parameterized by the parameter as follows:
We have , , and . The two curves intersect at the point with being the root of the polynomial of . This root is . The red shaded region in Figure 4.2 is the set . The blue shaded region in Figure 4.2 represents the set where every point is mapped to the origin by , as stipulated by Theorem 4.3. We explore the blank region between the blue and red shaded areas in the subsequent analysis.
Figure 4.2: The proximity operator will map all points in the blue shaded region to the origin and all points in the red region to a nonzero point.
In the following analysis, our discussion distinctly excludes the instances of uniform entries in , which have been previously addressed in Theorem 4.2. We now focus on the case where . This scenario can be further divided into two distinct cases: one where contains one zero entry, and another where it does not. We begin by examining the situation where includes one zero entry, as detailed in the following proposition.
Proposition 4.5.
Let and with . Then
Proof.
The objective function of problem (3) associated with for the given is
where . A direct calculation shows that both functions and are concave with respect to . Together with the facts that and , it follows that achieves its global minimum at .
The -step of the WRD procedure simply follows with . At the -step, we compare and via their difference .
The result of this proposition follows immediately from the above difference.
∎
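The comparison in the last step can be made concrete for a point on the first coordinate axis. The sketch below assumes, without confirmation from the text, that the function is $f(u) = (\|u\|_1/\|u\|_2)^2$ with $f(0) = 0$ and that the proximity operator minimizes $\lambda f(u) + \tfrac12\|u - x\|_2^2$. Under these assumptions, for $x = (a, 0)$ with $a > 0$ the nonzero candidate is $x$ itself with objective value $\lambda$, the origin has value $\tfrac12 a^2$, and the sign of their difference decides the output. The name `prox_on_axis` is hypothetical.

```python
import numpy as np

def prox_on_axis(a, lam):
    """Minimizers of lam*f(u) + 0.5*||u - x||^2 for x = (a, 0), a > 0.

    Assumes f(u) = (||u||_1 / ||u||_2)^2 for u != 0 and f(0) = 0.
    Since f >= 1 away from the origin and f(x) = 1, the only candidates
    are u = x (value lam) and u = 0 (value 0.5 * a**2).
    """
    x = np.array([a, 0.0])
    zero = np.zeros(2)
    diff = lam - 0.5 * a ** 2        # value at x minus value at the origin
    if diff > 0:
        return [zero]
    if diff < 0:
        return [x]
    return [zero, x]                 # tie: both points are minimizers

print(prox_on_axis(1.0, 1.0))   # origin wins
print(prox_on_axis(2.0, 1.0))   # x is kept
print(prox_on_axis(2.0, 2.0))   # tie between the two
```

Returning a list keeps the set-valued nature of the operator visible: at a tie both candidates are reported rather than an arbitrary one.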
We observe that Proposition 4.5 corroborates the findings of Proposition 4.4 for points lying on the -axis. Further, for .
For with , let be the objective function of problem (3) associated with . We define as
A direct computation yields
(24)
where the constant is given by, with ,
(25)
Then, solving problem (3) involves minimizing over the interval . The minimal value of on this interval can be attained at , , or the critical points of . To determine these critical points, we examine the properties of , which is
We immediately observe the following: first, the function monotonically decreases from to as varies from to ; second, the function monotonically increases from to as ranges from to , and from to as goes from to . Thus, is positive, and consequently is increasing on . Therefore, the optimal value of will be achieved at zero or some point in the interval . Hence, we confine our analysis of to this interval.
Remarkably, we can establish that has at most two zeros in the interval . This can be demonstrated by factorizing as a product of a positive function with a convex function:
where is defined as:
(26)
We proceed to demonstrate that is convex on the interval .
Lemma 4.6.
For and a nonzero vector with , the following statements for the function given by (26) hold:
(i)
is convex on the interval , where is given in (25).
(ii)
is positive, zero, or negative if is negative, zero, or positive, respectively. is nonnegative if and negative if .
(iii)
has at most two roots on the interval .
Proof.
Item (i). Notice that
Since both numerator and denominator of are positive, for all , hence, is strictly convex on this interval.
Item (ii). We notice that
Hence, the statements in item (ii) hold.
Item (iii). We have
Together with the convexity of , and the value of , we know that has at most two zeros on the interval .
∎
With these preliminaries, we can now present the solution to problem (3) associated with in the following proposition, which provides the outcome of the -step of the WRD procedure for the proximity operator of .
Proposition 4.7.
For and a nonzero vector with , let the function be given by (24), and let the function be given by (26). Define as in (25).
Then, the optimal solution to problem (3) is represented as:
where is determined as follows:
(i)
Case . We choose
(27)
Here is the root of on the interval .
(ii)
Case . If , we choose ; otherwise, is chosen to be the root of on .
(iii)
Case . is chosen to be the only root of on the interval .
Proof.
Case . That is, by Lemma 4.6. Then has no root if . In this situation, is positive, so is on . Hence, we choose . If , since , there exists one and only one point such that . If , has no root, and we choose . If , then has a unique root, say , on the interval . In this situation, we choose . All situations are summarized in (27).
Case . That is, by Lemma 4.6. If , we choose . On the other hand, if , let be the only root of on the open interval , then .
Case . That is, by Lemma 4.6 again. Let be the only root on the open interval . Then, , and achieves its global minimum at .
∎
Based on Proposition 4.7, the set of is split into three disjoint sets , , and , as follows:
We further split as the union of , and as the union of , as follows:
With the given sets, the proximity operator of from the WRD procedure is presented in the next theorem.
Theorem 4.8.
Let . For , we have
where is from item (i) or item (ii) of Proposition 4.7.
Proof.
The -step of the WRD procedure provides , the solution of optimization problem (3) associated with the function by Proposition 4.7. The -step simply follows with . At the -step, we compare and with defined in (1). Note that
If is positive, the zero vector is in ; if is negative, is in ; if is zero, both the zero vector and are in . The rest of the result follows directly from Proposition 4.7.
∎
Figure 4.3(a) illustrates the region where the proximity operator maps points to the origin. According to Theorem 4.2, all points on the line segment from the origin to will be mapped to the origin by . Additionally, as stated in Theorem 4.8, all points under the line in the red region are mapped to the origin by . The remaining points in both red and blue colors are obtained numerically with the assistance of Theorem 4.8.
(a)
(b)
Figure 4.3: (a) The proximity operator will map all points in the shaded region to the origin; (b) Numerical result for the region which will be mapped to the origin by the .
4.3 General case: the proximity operator of on
Here, we demonstrate that if the last entries of are zero, then the last entries of , the optimal solution to problem (3), are zero as well. Leveraging this result, we proceed by assuming that all entries of are nonzero. The primary outcome of this subsection is the transformation of problem (3) into one with the same objective function but constrained on a convex set. The modified problem can be addressed using the nonconvex gradient projection algorithm in [8]. Subsequently, we introduce an algorithm for computing the proximity operator of on .
Theorem 4.9.
For and , suppose that the last entries of are zeros, that is,
Then, for an optimal solution to problem (3), we have , that is, the last entries of are zero.
Proof.
The proof hinges on iteratively reducing the dimension by one for up to steps. Without loss of generality, we assume that . Let denote the objective function of problem (3) defined on . Throughout this proof, we consistently treat as the truncation of to its first entries.
Define: as . Considering the last entry of being zero, we have
We can verify that both and are concave functions over the domain , hence, the minimal value of will be achieved at on the boundary of the ball.
We remark that cannot be the zero vector. If so, . However, , which contradicts being the minimal solution to .
Next, we show that must be a unit vector, that is, . If not, assume that ; we can then show that there exists a better solution on the boundary of , which contradicts the optimality of . Writing , we define as follows:
Clearly,
which is nonconstant and concave with respect to the variable . Therefore, the minimal value can only be achieved at , and hence . In other words, the -th entry of the optimal solution to problem (3) must be . This completes the proof.
∎
Note that for problem (3), the feasible set is nonconvex. This nonconvex nature poses significant challenges in algorithm development. To address this, we present the following result which allows us to consider the problem within the confines of a convex set, specifically . This approach provides a more tractable pathway for algorithmic development and analysis.
Theorem 4.10.
Let and assume that its last entry is nonzero. Let be an optimal solution to the following optimization problem
(28)
where is given by (19). Then, is either the origin or the optimal solution to the optimization problem (3). Furthermore, we have
(29)
Proof.
The proof is trivial if is the zero vector. If , we now show that , i.e., is an optimal solution to the optimization problem (3). Suppose not, and denote the objective function of problem (28) by , that is,
Set ,
and define as follows:
Clearly, achieves its optimal value at either or . Hence,
We conclude that is either the origin or the optimal solution to the optimization problem (3).
Finally, we show the inclusion (29) holds. If , then, for all ,
where is the objective function of problem (3). Therefore, whichever point is optimal for problem (3), we know .
If , then is the optimal solution to problem (3) as well. Hence is the output of the -step of the WRD procedure and . Obviously, by the -step and -step of the WRD procedure. We conclude that the inclusion (29) holds.
∎
Based on Theorem 4.10, computing reduces to solving optimization problem (28). This problem has a concave objective function restricted to a convex set. A popular algorithm for solving problem (28) is the nonconvex gradient projection algorithm: with an initial guess , iterate
(30)
where is the projection operator onto the set . Since is a closed and bounded semi-algebraic convex subset of and the gradient of the objective function of problem (28) is with Lipschitz constant , the sequence converges; see, for example, [8, Theorem 5.3].
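A gradient projection iteration of this kind can be sketched as follows. Everything specific here is an assumption made for illustration and is not taken from (28) or (30): the feasible set is taken to be the intersection of the nonnegative orthant with the unit ball, the objective is a stand-in concave quadratic $F(w) = \tfrac12 w^\top M w$ with $M = 2\lambda(\mathbf{1}\mathbf{1}^\top - nI) - xx^\top$ (which, on the unit sphere, differs from $\lambda(\mathbf{1}^\top w)^2 - \tfrac12\langle w, x\rangle^2$ only by a constant), the step size is the reciprocal of the spectral norm of $M$, and the starting point is $x/\|x\|_2$. The function names are hypothetical.

```python
import numpy as np

def project_ball_orthant(w):
    """Project onto {w >= 0, ||w||_2 <= 1}: clip negatives, then rescale if needed."""
    w = np.maximum(w, 0.0)
    norm = np.linalg.norm(w)
    return w / norm if norm > 1.0 else w

def gradient_projection(M, w0, num_iters=500):
    """Projected gradient for F(w) = 0.5 * w^T M w over the ball-orthant set."""
    lipschitz = np.linalg.norm(M, 2)   # spectral norm: Lipschitz constant of w -> M w
    step = 1.0 / lipschitz             # assumed step size, not taken from (30)
    w = project_ball_orthant(w0)
    for _ in range(num_iters):
        w = project_ball_orthant(w - step * (M @ w))
    return w

# Illustrative data (not from the paper).
lam = 0.5
x = np.array([3.0, 2.0, 1.0])
n = x.size
ones = np.ones(n)
# Stand-in concave quadratic: M is negative semidefinite by construction.
M = 2.0 * lam * (np.outer(ones, ones) - n * np.eye(n)) - np.outer(x, x)

def F(w):
    return 0.5 * w @ M @ w

w0 = x / np.linalg.norm(x)
w_hat = gradient_projection(M, w0)
print(w_hat, F(w0), F(w_hat))
```

With this step size each iteration cannot increase the objective, but the limit depends on the starting point; in particular, starting at the origin leaves the iterate at the origin because the gradient vanishes there.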
We are ready now to present our algorithm for computing based on our WRD procedure for arbitrary . This algorithm is presented in Algorithm 2.
Due to the inherent nonconvexity of problem (28), the initial guess provided to any algorithm for this problem significantly influences the quality of the solution obtained. In our simulations, we have observed that choosing with tends to yield satisfactory results. The numerical result with Algorithm 2 in is shown in Figure 4.3(b). In comparison with Figure 4.3(a), the regions identified as being mapped to the origin by are consistent.
5 Conclusions
This paper addresses the computation of proximity operators of scale and signed permutation invariant functions. By delving into the intrinsic properties of these functions, we introduce a procedure called WRD, which includes the -step, -step, and -step, to effectively handle the computation of proximity operators. Specifically, we conduct a thorough investigation into two specific scale and signed permutation invariant functions: the ratio of and its square. For the function , we propose an algorithm capable of explicitly generating its proximity operator through a few straightforward steps. Additionally, for the function , we devise an efficient algorithm with guaranteed convergence to compute its proximity operator.
In future endeavors, we aim to explore the practical applications of these developed algorithms, particularly in sparse signal recovery and image processing domains.
Declarations
•
The authors declare that they have no conflict of interest.
•
The work of L. Shen was supported in part by the National Science Foundation under grant DMS-2208385 and by the 2023 and 2024 Air Force Summer Faculty Fellowship Program (SFFP). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation and AFRL (Air Force Research Laboratory).
References
Candes et al. [2008] Candes, E., Wakin, M.B., Boyd, S.: Enhancing sparsity by reweighted ℓ1 minimization. Journal of Fourier Analysis and Applications 14, 877–905 (2008)
Prater-Bennette et al. [2022] Prater-Bennette, A., Shen, L., Tripp, E.E.: The proximity operator of the log-sum penalty. Journal of Scientific Computing 93(3), 1–34 (2022)
Lopes [2016] Lopes, M.E.: Unknown sparsity in compressed sensing: Denoising and inference. IEEE Transactions on Information Theory 62(9), 5145–5166 (2016)
Rahimi et al. [2019] Rahimi, Y., Wang, C., Dong, H., Lou, Y.: A scale-invariant approach for sparse signal recovery. SIAM Journal on Scientific Computing 41(6), 3649–3672 (2019)
Tang and Nehorai [2011] Tang, G., Nehorai, A.: Performance analysis of sparse recovery based on constrained minimal singular values. IEEE Transactions on Signal Processing 59(12), 5734–5745 (2011)
Yin et al. [2014] Yin, P., Esser, E., Xin, J.: Ratio and difference of ℓ1 and ℓ2 norms and sparse representation with coherent dictionaries. Communications in Information and Systems 14(2), 87–109 (2014)
Xu et al. [2021] Xu, Y., Narayan, A., Tran, H., Webster, C.G.: Analysis of the ratio of ℓ1 and ℓ2 norms in compressed sensing. Applied and Computational Harmonic Analysis 55, 486–511 (2021)
Attouch et al. [2013] Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical Programming, Ser. A 137, 91–129 (2013)
Beck and Teboulle [2009] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2, 183–202 (2009)
Bolte et al. [2014] Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming 146, 449–494 (2014)
Combettes and Wajs [2005] Combettes, P., Wajs, V.: Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation: A SIAM Interdisciplinary Journal 4, 1168–1200 (2005)
Krol et al. [2012] Krol, A., Li, S., Shen, L., Xu, Y.: Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction. Inverse Problems 28, 115005–34 (2012)
Li et al. [2015] Li, Q., Shen, L., Xu, Y., Zhang, N.: Multi-step fixed-point proximity algorithms for solving a class of optimization problems arising from image processing. Advances in Computational Mathematics 41(2), 387–422 (2015)
Parikh and Boyd [2014] Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1, 123–231 (2014)
Tao [2022] Tao, M.: Minimization of L1 over L2 for sparse signal recovery with convergence guarantee. SIAM Journal on Scientific Computing 44(2), 770–797 (2022)
Shen et al. [2019] Shen, L., Suter, B.W., Tripp, E.E.: Structured sparsity promoting functions. Journal of Optimization Theory and Applications 183(3), 386–421 (2019)
Moreau [1962] Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. C.R. Acad. Sci. Paris Sér. A Math. 255, 2897–2899 (1962)
Donoho [1995] Donoho, D.: De-noising by soft-thresholding. IEEE Transactions on Information Theory 41, 613–627 (1995)
Tao and An [1996] Tao, P.D., An, L.T.H.: Difference of convex functions optimization algorithms (DCA) for globally minimizing nonconvex quadratic forms on Euclidean balls and spheres. Operations Research Letters 19(5), 207–216 (1996)
Martínez [1994] Martínez, J.M.: Local minimizers of quadratic functions on Euclidean balls and spheres. SIAM Journal on Optimization 4(1), 159–176 (1994)