A Multi-resolution Low-rank Tensor Decomposition
Abstract
The (efficient and parsimonious) decomposition of higher-order tensors is a fundamental problem with numerous applications in a variety of fields. Several methods have been proposed in the literature to that end, with the Tucker and PARAFAC decompositions being the most prominent ones. Inspired by the latter, in this work we propose a multi-resolution low-rank tensor decomposition to describe (approximate) a tensor in a hierarchical fashion. The central idea of the decomposition is to recast the tensor into multiple lower-dimensional tensors to exploit the structure at different levels of resolution. The method is first explained, an alternating least squares algorithm is discussed, and preliminary simulations illustrating the potential practical relevance are provided.
Index Terms— Tensor decomposition, Low-rank approximation, Kronecker decomposition, multi-resolution approximation.
1 Introduction
We live in a digital age where everyday devices, from smartphones to cars, generate massive amounts of data that offer researchers and practitioners a wide range of opportunities. Processing contemporary information comes, however, at a cost, since data sources are messy and heterogeneous. In this context, parsimonious models emerge as an ideal tool to enhance efficiency when processing such vast amounts of information. This can be done by leveraging the structure of the data, as is the case for information living in multiple (possibly many) dimensions. Multi-dimensional data are prevalent in numerous fields, with representative examples including chemometrics, bioengineering, communications, hyper-spectral imaging, and psychometrics [1, 2]. Traditionally, matrices were used to model those datasets, but tensor-based models have recently gained traction. Multi-dimensional arrays, or tensors, are data structures that generalize the concept of vectors and matrices to higher-dimensional domains. In recent years, tensors have also been applied to address numerous data science and machine learning tasks, from simple interpolation to supervised classification [3].
In this data-science context, a problem of particular interest is that of tensor decomposition, which tries to estimate a set of latent factors that summarize the tensor. Many tensor decompositions were developed as generalizations of well-known matrix-decomposition methods to high-dimensional domains [4, 5]. This was the case of the PARAFAC tensor decomposition [6] and its generalization, the Tucker tensor decomposition [7], both of which can be understood as higher-order generalizations of the singular value decomposition (SVD) of a matrix. More specifically, these decompositions aim at describing (approximating) the tensor as a sum of rank-1 tensors, each of which is an outer product of vectors (called factors). The PARAFAC decomposition is conceptually simple and its representation complexity scales gracefully (the number of parameters grows linearly with the rank). The Tucker decomposition enjoys additional degrees of freedom at the cost of greater complexity (the size of its core tensor grows exponentially with the order of the tensor). Hierarchical tensor decompositions, such as the Tensor Train (TT) decomposition [8] or the hierarchical Tucker (hTucker) decomposition [9], try to alleviate this problem. The former unwraps the tensor into a chain of three-dimensional tensors, and the latter generalizes the same idea by organizing the dimensions in a binary tree. Furthermore, in recent years significant effort has been devoted to modifying existing decomposition algorithms to deal with factor constraints (e.g., non-negativity), promote certain priors (e.g., factor sparsity), or be robust to imperfections [10, 11, 12].
However, little to no work has been carried out to study tensor decomposition from a multi-resolution perspective. This can be especially interesting for tensor signals such as videos, where 2-, 3-, and 4-dimensional components are mixed in a single tensor. In this work, we postulate a simple but novel multi-resolution low-rank decomposition method. More specifically, this paper:
- Introduces a new multi-resolution tensor decomposition to exploit the low-rank structure of a tensor at different resolutions.
- Proposes an algorithm to implement the decomposition.
- Tests the benefits of the model via numerical simulations.
Regarding the first contribution, rather than postulating a low-rank decomposition of the tensor using the original multidimensional representation, we 1) consider a collection of lower-order multidimensional representations of the tensor (where several of the original modes of the tensor are combined into a single one); 2) postulate a low-rank decomposition for each of the lower-dimensional representations; 3) map each of the representations back to the original tensor domain; and 4) model the original tensor as the sum of such low-rank representations. As illustrated in detail in the manuscript, this results in an efficient decomposition method capable of combining low-rank structures present at different resolutions.
Section 2 introduces notation and tensor preliminaries. Section 3 presents our decomposition method. A simple algorithmic approach to address the decomposition is described in Section 4. Illustrative numerical experiments are provided in Section 5.
2 Notation and tensor preliminaries
The entries of a (column) vector , a matrix and a tensor are denoted by , and , respectively, with denoting the order of tensor . Moreover, the th column of matrix is denoted by . Sets are represented by calligraphic capital letters. The cardinality of a set is denoted by . When a set is ordered, we use the notation with to denote the th element of the set. The vertical concatenation of the columns of matrix is denoted by . is the Frobenius norm of matrix , which can be equivalently written as .
2.1 Tensor to matrix unfolding
Given a tensor of order and size , there are many ways to unfold the entries of the tensor into a matrix . In this section, we are interested in unfoldings where the columns of matrix represent one of the original modes of and the rows of represent all the other modes of the tensor. Mathematically, we define the matrix unfolding operator as
(1)
where and, to simplify exposition, we have assumed that .
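As an illustration of this kind of unfolding, the following is a minimal NumPy sketch in which the columns of the resulting matrix index one chosen mode and the rows index all remaining modes. The function names `unfold_mode` and `fold_mode` are hypothetical, and the row-major ordering of the merged row index is an assumption that may differ from the ordering used in (1).

```python
import numpy as np

def unfold_mode(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Unfold so that columns index `mode` and rows index all the other modes."""
    # Move the chosen mode to the last axis, then flatten the remaining axes
    # (row-major order, an assumption of this sketch) into the row index.
    return np.moveaxis(tensor, mode, -1).reshape(-1, tensor.shape[mode])

def fold_mode(matrix: np.ndarray, mode: int, shape: tuple) -> np.ndarray:
    """Inverse of `unfold_mode` for a tensor with the given original `shape`."""
    other = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(*other, shape[mode]), -1, mode)

# Small example: a 2 x 3 x 4 tensor unfolded along its second mode.
X = np.arange(24, dtype=float).reshape(2, 3, 4)
M = unfold_mode(X, 1)            # rows: (mode 0, mode 2) pairs, columns: mode 1
X_back = fold_mode(M, 1, X.shape)
```

Folding the matrix back recovers the original tensor, so the unfolding only rearranges entries and loses no information.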
2.2 Tensor to lower-order tensor unfolding
Consider a tensor , of order , and let denote the set containing the indexes of all the modes of .
Definition 1
The ordered set is a partition of the set if it holds that: for all , for all , and .
We are interested in reshaping the entries of the th order tensor of size to generate a lower-order tensor , with order and according to a given partition as specified next
(2)
Note that, according to definition of the operator, the indexes along the th mode of represent tuples of indexes of the original tensor .
Clearly, if , so that and , we have that . On the other hand, if , so that and for all , we have that .
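The reshaping by a partition, together with the two limiting cases just mentioned, can be sketched as follows. This is only an illustrative NumPy sketch: the names `unfold_partition` and `fold_partition` are hypothetical, modes are numbered from 0, a partition is represented as a list of lists of mode indexes, and the merged indexes are ordered row-major, which may differ from the convention in (2).

```python
import numpy as np

def unfold_partition(tensor, partition):
    """Reshape `tensor` into a lower-order tensor, merging the modes of each group."""
    order = [m for group in partition for m in group]
    assert sorted(order) == list(range(tensor.ndim)), "groups must partition the modes"
    new_shape = [int(np.prod([tensor.shape[m] for m in group])) for group in partition]
    # Bring the modes of each group next to each other, then merge them.
    return np.transpose(tensor, order).reshape(new_shape)

def fold_partition(low, partition, shape):
    """Map a lower-order tensor back to a tensor with the original `shape`."""
    order = [m for group in partition for m in group]
    permuted = low.reshape([shape[m] for m in order])
    # Undo the permutation applied in `unfold_partition`.
    return np.transpose(permuted, np.argsort(order).tolist())

X = np.arange(120, dtype=float).reshape(2, 3, 4, 5)

Z_matrix = unfold_partition(X, [[0, 1], [2, 3]])          # 6 x 20 matrix
Z_vector = unfold_partition(X, [[0, 1, 2, 3]])            # single group: a vector
Z_same = unfold_partition(X, [[0], [1], [2], [3]])        # singletons: X itself
X_back = fold_partition(unfold_partition(X, [[2, 0], [3, 1]]), [[2, 0], [3, 1]], X.shape)
```

With a single group the result is a vectorization of the tensor, and with one group per mode the tensor is returned unchanged.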
2.3 Low-rank PARAFAC tensor decomposition
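Consider the $N$th order tensor $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ along with the matrices $\mathbf{F}_n \in \mathbb{R}^{I_n \times R}$ for $n = 1, \ldots, N$. Then, $\underline{\mathbf{X}}$ is said to have rank $R$ if it can be written as
$$\underline{\mathbf{X}} = \sum_{r=1}^{R} [\mathbf{F}_1]_r \circ [\mathbf{F}_2]_r \circ \cdots \circ [\mathbf{F}_N]_r, \qquad (3)$$
where $[\mathbf{F}_n]_r$ is the $r$th column of $\mathbf{F}_n$ and $\circ$ is the generalization of the outer product for more than two vectors. That is, if $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$ are three generic vectors, then $\mathbf{x} \circ \mathbf{y} \circ \mathbf{z}$ is a tensor of order $3$ satisfying $[\mathbf{x} \circ \mathbf{y} \circ \mathbf{z}]_{i,j,k} = [\mathbf{x}]_i [\mathbf{y}]_j [\mathbf{z}]_k$.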
The decomposition in (3) is oftentimes referred to as canonical polyadic decomposition or PARAFAC decomposition, with matrices being referred to as factors. As in the case of matrices, moderate values of induce a parsimonious description of the tensor, since the values in can be equivalently represented by the entries in .
Using the Khatri-Rao product, denoted as , and the different unfolding operators introduced in the previous sections, we have that
(4)
(5)
These expressions will be leveraged in the next section.
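To make the relation between the PARAFAC factors, the Khatri-Rao product, and the matrix unfolding concrete, the sketch below builds a low-rank tensor from random factors and checks numerically that its unfolding along each mode equals the Khatri-Rao product of the remaining factors times the transposed factor of that mode. The helper names (`cp_to_tensor`, `khatri_rao`, `unfold_mode`) are hypothetical, and the row-major ordering of the merged indexes is an assumption of this sketch, so the factor ordering inside the Khatri-Rao product may differ from the one used in (4) and (5).

```python
import numpy as np

def cp_to_tensor(factors):
    """Sum of R outer products; factors[n] has one row per index of mode n."""
    rank = factors[0].shape[1]
    tensor = np.zeros([f.shape[0] for f in factors])
    for r in range(rank):
        component = factors[0][:, r]
        for f in factors[1:]:
            component = np.multiply.outer(component, f[:, r])
        tensor += component
    return tensor

def khatri_rao(matrices):
    """Column-wise Kronecker product of matrices with the same number of columns."""
    result = matrices[0]
    for m in matrices[1:]:
        result = np.einsum('ir,jr->ijr', result, m).reshape(-1, result.shape[1])
    return result

def unfold_mode(tensor, mode):
    """Columns index `mode`; rows index the other modes (row-major order)."""
    return np.moveaxis(tensor, mode, -1).reshape(-1, tensor.shape[mode])

rng = np.random.default_rng(0)
sizes, rank = (4, 5, 6), 2
factors = [rng.standard_normal((n, rank)) for n in sizes]
X = cp_to_tensor(factors)

# Check the unfolding identity for every mode.
identity_holds = all(
    np.allclose(
        unfold_mode(X, n),
        khatri_rao([f for m, f in enumerate(factors) if m != n]) @ factors[n].T,
    )
    for n in range(len(sizes))
)

n_entries = X.size                                  # 4 * 5 * 6 = 120 values
n_factor_params = sum(f.size for f in factors)      # (4 + 5 + 6) * 2 = 30 values
```

The last two lines illustrate the parsimony argument: the 120 entries of this tensor are described by 30 factor entries.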
3 Multi-resolution low-rank decomposition
Consider a collection of partitions ,…,, with for . Given the th order tensor and the collection of partitions ,…,, we propose the following decomposition for the tensor at hand
(6)
which can be equivalently written as
(7)
where is the rank of the tensor associated to the partition.
Number of parameters: As already explained, one of the most meaningful implications of low-rank tensor models is the fact that they provide a parsimonious description of the tensor, reducing its implicit number of degrees of freedom. The same is true for the decomposition in (6). To be concrete, the tensor has order , with the dimension of the th mode being . As a result, having rank implies that
parameters suffice to fully describe the entries in . Summing across the different factors implies that
parameters suffice to fully describe the entries in .
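The counting argument above can be sketched in a few lines: for each partition, every merged mode contributes one factor whose number of rows is the product of the sizes of the original modes in that group, and whose number of columns is the rank of that term. The function name `mrlr_num_params` and the 0-based list-of-lists representation of partitions are assumptions made for illustration only.

```python
from math import prod

def mrlr_num_params(shape, partitions, ranks):
    """Total number of factor entries for a sum of low-rank terms, one per partition."""
    total = 0
    for partition, rank in zip(partitions, ranks):
        # Size of each mode of the lower-order tensor associated with this partition.
        dims = [prod(shape[m] for m in group) for group in partition]
        # A rank-`rank` PARAFAC model needs one (dim x rank) factor per merged mode.
        total += rank * sum(dims)
    return total

shape = (10, 10, 10, 10)
partitions = [[[0, 1], [2, 3]], [[0], [1], [2], [3]]]   # a matrix term and a 4-mode term
ranks = [2, 3]
n_params = mrlr_num_params(shape, partitions, ranks)    # 2 * (100 + 100) + 3 * 40
n_entries = prod(shape)
```

In this toy configuration the model uses 520 parameters to describe a tensor with 10,000 entries.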
4 Algorithmic implementation
The decomposition introduced in (6) can be obtained by solving the following minimization problem:
(8)
The approach proposed in this section is to estimate each of the tensors sequentially, so that when optimizing with respect to the remaining tensors with are kept fixed. As a result, the minimization problem to be solved in the th step is:
(9)
for . The constraint in (9) can be handled using a PARAFAC decomposition
(10)
so that (9) can be equivalently formulated as:
(11)
The above problem is non-convex, but fixing all but one of the factors (say the th one), it becomes linear in . Under this approach and unfolding the tensor into a matrix , we have the following update rule to construct an Alternating Least Squares (ALS) algorithm:
(12)
for all . Once the factors have been obtained, then a) the th tensor is found using (10) and b) the problem in (9) is solved for the next , with . As a result, instances of (12) need to be run. Note that, when solving (8) via (9)-(12), the order matters. The first to be estimated provides the main (coarser) approximation, while the subsequent ones try to fit the residual error between the main tensor and the sum of the previously estimated components , providing a finer approximation. Due to the structure , which carries over , the order in which the tensors are approximated is expected to generate variations in the results.
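The sequential scheme described in this section can be sketched as follows. This is a minimal NumPy sketch under several assumptions, not the authors' implementation: a single pass over the partitions is performed, each term is fitted to the current residual with a plain ALS routine using random initialization and a fixed number of iterations, and every partition is assumed to have at least two groups. All function names are hypothetical.

```python
import numpy as np

def unfold_partition(tensor, partition):
    # Merge the modes listed in each group into a single mode.
    order = [m for group in partition for m in group]
    shape = [int(np.prod([tensor.shape[m] for m in g])) for g in partition]
    return np.transpose(tensor, order).reshape(shape)

def fold_partition(low, partition, shape):
    # Map a lower-order tensor back to the original tensor domain.
    order = [m for group in partition for m in group]
    permuted = low.reshape([shape[m] for m in order])
    return np.transpose(permuted, np.argsort(order).tolist())

def khatri_rao(mats):
    # Column-wise Kronecker product (row-major ordering of the merged index).
    out = mats[0]
    for m in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, m).reshape(-1, out.shape[1])
    return out

def cp_to_tensor(factors):
    # Sum of rank-1 terms built from the columns of the factors.
    return khatri_rao(factors).sum(axis=1).reshape([f.shape[0] for f in factors])

def cp_als(tensor, rank, n_iter=100, seed=0):
    # Plain ALS: update one factor at a time by solving a least-squares problem.
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in tensor.shape]
    for _ in range(n_iter):
        for n in range(tensor.ndim):
            unfolded = np.moveaxis(tensor, n, -1).reshape(-1, tensor.shape[n])
            kr = khatri_rao([f for m, f in enumerate(factors) if m != n])
            # Solve  kr @ F_n.T ~ unfolded  in the least-squares sense.
            factors[n] = np.linalg.lstsq(kr, unfolded, rcond=None)[0].T
    return factors

def mrlr_fit(tensor, partitions, ranks, n_iter=100):
    # Fit one low-rank term per partition, each on the residual left by the previous ones.
    residual, components = tensor.copy(), []
    for partition, rank in zip(partitions, ranks):
        low = unfold_partition(residual, partition)
        factors = cp_als(low, rank, n_iter)
        approx = fold_partition(cp_to_tensor(factors), partition, tensor.shape)
        components.append(approx)
        residual = residual - approx
    return components

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 5, 6))
partitions = [[[0], [1, 2]], [[0], [1], [2]]]     # coarse (matrix) first, then 3-mode
components = mrlr_fit(X, partitions, ranks=[1, 2])
err_coarse = np.linalg.norm(X - components[0]) / np.linalg.norm(X)
err_full = np.linalg.norm(X - sum(components)) / np.linalg.norm(X)
```

Swapping the order of the entries in `partitions` (and `ranks`) gives the reverse, finer-to-coarser arrangement, which in general yields a different approximation.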
4.1 Constructing the partitions
The algorithm in the previous section assumes that the partitions ,…, are given. A simple generic approach to design ,…, is to rely on a regular multiresolution construction that splits the index set into smaller sets with the same cardinality. More specifically, one can implement a sequential design with steps for which, at step we split into index sets with (approximately) the same number of elements. The collection of partitions ,…, is then naturally given by grouping together the sets obtained in each of those steps. To be more precise, let and be the floor and ceiling operators and consider the collection of partitions ,…, with and where the th element is given by
In the above definition we have adopted the convention that, if is a whole positive number, and . Clearly, the partition design in (13) is regular in the sense that it achieves for all and for .
To gain insights, suppose for simplicity that our tensor of order has size , i.e., that the value of is the same across modes. Then, the number of parameters required to represent using the model in (6) and the partitions in (13) is approximately
(14)
which contrasts with the entries in .
Clearly, alternative ways to build the partitions ,…, are possible. This is especially relevant when prior knowledge exists and one can leverage it to group indexes based on known (dis)similarities among the underlying dimensions. Due to space limitations, discussing such alternative partition techniques is out of the scope of this manuscript, but it is part of our ongoing work.
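A regular construction of this kind can be sketched with NumPy's `array_split`, which divides a list of mode indexes into contiguous groups whose sizes differ by at most one. The sketch below is an assumption-laden illustration: the function names are hypothetical, modes are numbered from 0, the number of groups per step is passed explicitly, and the placement of the larger groups (first, as done by `array_split`) may not match the exact floor/ceiling rule of the design described above.

```python
import numpy as np

def regular_partition(n_modes, n_groups):
    """Split modes 0..n_modes-1 into contiguous groups of (nearly) equal size."""
    return [[int(m) for m in g] for g in np.array_split(np.arange(n_modes), n_groups)]

def regular_partitions(n_modes, group_counts):
    """One partition per requested number of groups, ordered from coarse to fine."""
    return [regular_partition(n_modes, k) for k in group_counts]

# For a four-mode tensor: a matrix, a three-mode tensor, and the original four modes.
partitions = regular_partitions(4, [2, 3, 4])
```

For a four-mode tensor this yields a matrix unfolding, a three-mode unfolding, and the original four-mode arrangement, in coarse-to-fine order.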
5 Numerical experiments
The multi-resolution low-rank (MRLR) tensor decomposition scheme is numerically tested in three different scenarios: the first dealing with an amino acids dataset [13], the second one with a video signal [14], and the third one to approximate a multivariate function. The amino acids dataset is a three-mode tensor of size . The video signal is composed of 173 frames of pixels each and three channels (R, G, and B). To reduce the computational and memory requirements of the problem, the frames have been sub-sampled and the resolution has been lowered, resulting in a final four-mode tensor of size . Finally, the multidimensional function in the last scenario has as its domain, with each of the three dimensions being discretized using 100 points, so that a tensor with entries is obtained. The TensorLy Python package is used to benchmark the MRLR tensor decomposition against other tensor decomposition algorithms [15].
The amino acids tensor is approximated using a hierarchical structure of a matrix plus a three-mode tensor. The matrix can be built by unfolding the tensor in different ways. Here, two reshapes have been studied, a unfolding (res-1), and a unfolding (res-2). The structure of the algorithm resembles that of a gradient-boosting-like approach [16]. First, the initial tensor is approximated by a low-rank structure. Then, the residual is approximated by a low-rank structure too. Subsequent residuals are also approximated if necessary. This sequential process can be started from the coarser unfolding, the matrix, or the other way around (reverse). In this experiment, both alternatives have been tested. The rank of the matrix unfolding is fixed while the rank of the three-mode tensor is gradually increased.
![Refer to caption](extracted/5624543/figures/fig_1.jpg)
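The performance of the algorithms has been measured in terms of the Normalized Frobenius Error ($\mathrm{NFE}$) between the true tensor $\underline{\mathbf{X}}$ and the approximation $\hat{\underline{\mathbf{X}}}$, which is given by
$$\mathrm{NFE} = \frac{\|\underline{\mathbf{X}} - \hat{\underline{\mathbf{X}}}\|_F}{\|\underline{\mathbf{X}}\|_F}. \qquad (15)$$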
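For reference, a normalized Frobenius error of this kind can be computed as in the minimal sketch below. It assumes that the error is normalized by the Frobenius norm of the true tensor; the function name `nfe` is hypothetical.

```python
import numpy as np

def nfe(true_tensor: np.ndarray, approx_tensor: np.ndarray) -> float:
    """Frobenius norm of the error divided by the Frobenius norm of the true tensor."""
    # With default arguments, np.linalg.norm flattens the array and returns its 2-norm,
    # which coincides with the Frobenius norm for tensors of any order.
    return float(np.linalg.norm(true_tensor - approx_tensor) / np.linalg.norm(true_tensor))

X = np.ones((2, 2, 2))
err_half = nfe(X, 0.5 * X)            # error has half the norm of X
err_zero = nfe(X, X)                  # perfect approximation
err_one = nfe(X, np.zeros_like(X))    # approximating by the all-zero tensor
```

Under this normalization, a value of 0 corresponds to a perfect fit and a value of 1 to an approximation no better than the all-zero tensor.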
The results are reported in Fig. 1. The MRLR decomposition is compared to the PARAFAC decomposition. The res-1 unfolding of the matrix (square-like unfolding) seems to perform better than the res-2 unfolding (tall unfolding). Then, the approximation from the coarser to the finer arrangement beats the reverse one. Moreover, all the MRLR schemes outperform the PARAFAC one in terms of for the same number of parameters. Indeed, the best-performing MRLR algorithm obtains roughly the same as the PARAFAC decomposition using parameters less approximately.
In the second test case, the four-mode video tensor is unfolded into a matrix and a three-mode tensor. The ranks of the matrix and the three-mode tensors have been fixed to 1. The rank of the four-mode tensor approximation is gradually increased. The results are provided in Fig. 2. Again, the coarser-to-finer arrangement outperforms both the reverse (finer-to-coarser) arrangement and the PARAFAC decomposition. It needs approximately parameters less to achieve the same .
![Refer to caption](extracted/5624543/figures/fig_2.jpg)
Finally, we tested the MRLR tensor decomposition in a third test case to approximate a multivariate function. Given a set of input variables, with denoting the th input variable and the set of all possible values of , we are interested in functions that map any element into a real value. When these functions are discrete, tensors can be used to model them efficiently. Continuous functions can be discretized/quantized. Tensor decomposition methods can then be leveraged for applications such as approximation, or denoising [17]. In such a context, we tested the MRLR tensor decomposition algorithm to model the following multivariate continuous function :
(16)
Sampling a three-dimensional grid of discrete values ranging from to with a step size of leads to a tensor that summarizes the multivariate function in (16). The tensor can be approximated using the MRLR tensor decomposition to leverage parsimony. The tensor is unfolded into a matrix, and the coarser-to-finer setup has been implemented. The performance of the MRLR tensor decomposition is again compared to that of the PARAFAC decomposition in terms of for an increasing number of parameters. The results are shown in Fig. 3. As in previous scenarios, the MRLR decomposition consistently outperforms the PARAFAC decomposition for the same number of parameters. At some points, the difference between both algorithms is particularly high. For example, the MRLR tensor decomposition needs roughly parameters to achieve of , while the PARAFAC decomposition needs more than parameters.
![Refer to caption](extracted/5624543/figures/fig_3.jpg)
6 Conclusions
This paper presented a parsimonious multi-resolution low-rank (MRLR) tensor decomposition to approximate a tensor as a sum of low-order tensor unfoldings. An Alternating Least Squares (ALS) algorithm was proposed to implement the MRLR tensor decomposition. Then, the MRLR tensor decomposition was compared against the PARAFAC decomposition in two real-case scenarios, and also in a multivariate function approximation problem. The MRLR tensor decomposition outperformed the PARAFAC decomposition for the same number of parameters, showing that it can efficiently leverage information defined at different dimensional orders.
References
- [1] R. Bro, “PARAFAC. Tutorial and applications,” Chemometrics and Intelligent Laboratory Systems, vol. 38, no. 2, pp. 149–171, 1997.
- [2] R. B. Cattell, “Parallel proportional profiles and other principles for determining the choice of factors by rotation,” Psychometrika, vol. 9, no. 4, pp. 267–283, 1944.
- [3] E. E. Papalexakis, C. Faloutsos, and N. D. Sidiropoulos, “Tensors for data mining and data fusion: Models, applications, and scalable algorithms,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, pp. 1–44, 2016.
- [4] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
- [5] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos, “Tensor decomposition for signal processing and machine learning,” IEEE Transactions on Signal Processing, vol. 65, no. 13, pp. 3551–3582, 2017.
- [6] R. A. Harshman, “Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multimodal factor analysis,” UCLA Working Papers in Phonetics, vol. 16, pp. 1–84, 1970.
- [7] L. R. Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, 1966.
- [8] I. V. Oseledets, “Tensor-train decomposition,” SIAM Journal on Scientific Computing, vol. 33, no. 5, pp. 2295–2317, 2011.
- [9] L. Grasedyck, D. Kressner, and C. Tobler, “A literature survey of low-rank tensor approximation techniques,” GAMM-Mitteilungen, vol. 36, no. 1, pp. 53–78, 2013.
- [10] D. Wang, F. Cong, and T. Ristaniemi, “Higher-order nonnegative candecomp/parafac tensor decomposition using proximal algorithm,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 3457–3461.
- [11] Q. Xie, Q. Zhao, D. Meng, and Z. Xu, “Kronecker-basis-representation based tensor sparsity and its applications to tensor recovery,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 8, pp. 1888–1902, 2017.
- [12] O. Kaya and B. Uçar, “Parallel candecomp/parafac decomposition of sparse tensors using dimension trees,” SIAM Journal on Scientific Computing, vol. 40, no. 1, pp. C99–C130, 2018.
- [13] R. Bro, “Multi-way analysis in the food industry: Models, algorithms, and applications,” Ph.D. dissertation, University of Amsterdam (NL), 1998.
- [14] S. Rozada, “Multi-resolution low-rank tensor decomposition,” https://github.com/sergiorozada12/multiresolution-tensor-decomposition, 2021.
- [15] J. Kossaifi, Y. Panagakis, A. Anandkumar, and M. Pantic, “TensorLy: Tensor learning in Python,” arXiv preprint arXiv:1610.09555, 2016.
- [16] J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.
- [17] N. Kargas and N. D. Sidiropoulos, “Supervised learning and canonical decomposition of multivariate functions,” IEEE Transactions on Signal Processing, vol. 69, pp. 1097–1107, 2021.