Agnieszka Niemczynowicz (1, *)
* These authors contributed equally to this work.
(1) Faculty of Mathematics and Computer Science, University of Warmia and Mazury in Olsztyn, Słoneczna 54, 10-710 Olsztyn, Warmińsko-Mazurskie, Poland
(2) Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska 24, 31-155 Kraków, Małopolska, Poland
Fully tensorial approach to hypercomplex neural networks
Abstract
A fully tensorial theory of hypercomplex neural networks is given. The key point is the observation that the algebra multiplication can be represented as a rank-three tensor. This approach is attractive for neural network libraries that support efficient tensorial operations.
Keywords: Hypercomplex neural network, algebra, tensor
MSC Classification: 15A69, 15-04
1 Introduction
The rapid progress in applications of Artificial Neural Networks (NN) motivates new research directions and generalizations. These involve advanced mathematical concepts such as group theory [19], differential geometry [5, 6], and topological methods in data analysis [7].
Neural network implementations are built on linear algebra. In the most popular programming libraries (TensorFlow [1], PyTorch [15]), the most common architecture is the feed-forward NN, which consists of a stack of layers through which the data pass unidirectionally. This data flow is realized by optimized tensorial operations.
Several algebraic extensions exist. One of them is Algebraic Neural Networks [14], in which additional endomorphism operations are performed on the data. Another algebraic-geometric direction is neural networks based on Geometric (Clifford) algebra [4, 17]. Recently, Parameterized Hypercomplex Neural Networks were introduced [10] for convolutional layers. They can learn an optimal hypercomplex algebra adapted to the data by exploiting an optimized Kronecker product. However, some applications require keeping the parameters of the hypercomplex algebra, or of an even more general algebra, as (fixed) hyperparameters. In this way, the algebra structure can be optimized at the meta-level.
In this paper, we discuss an implementation in which classical real-algebra computations are replaced by computations in various hypercomplex or even more general algebras. This approach, presented, e.g., in [3], is not new. However, interest in this direction has revived owing to better complexity properties in areas such as image processing [18] and time series analysis [11]. In these contributions, the open-source code of [18] for specific four-dimensional hypercomplex algebras was used.
The implementation explained in this article significantly extends the ideas of [18] to arbitrary algebras, including hypercomplex ones. The algorithms described here agree with the NN presented in [18] for specific four-dimensional hypercomplex algebras. However, the implementation of [18] constructs an additional multiplication structure from the multiplication table of the hypercomplex algebra, which is treated as an additional step in setting up the neural network. Our approach allows us to omit this complexity and to generalize to arbitrary algebras. This is an important contribution to the theoretical treatment of the general algebraic approach to hypercomplex neural networks.
The main contributions of this paper are as follows:
- we summarize basic concepts of tensorial operations in terms of hypercomplex and more general algebras; in particular, we note that the algebra multiplication can be expressed as a third-rank tensor,
- we provide a general algorithm for computations within the hypercomplex dense layer,
- we provide general algorithms for 1-, 2-, and 3-dimensional hypercomplex convolutional layer computations.
2 Methods
This section provides an overview of the mathematical theory behind the operations used in implementing hypercomplex neural networks. It forms the basis of the methods used in this paper. These are classical notions explained in detail in standard references, e.g., [2].
2.1 Tensors
The primary object used in NN implementations is the tensor. It relies on the tensor product, described in the following definition.
Definition 1.
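The tensor product of two vector spaces $V$ and $W$ over the field $\mathbb{K}$ is the vector space denoted by $V\otimes W$ and defined as the quotient space $V\otimes W = F(V\times W)/R$, where $F(V\times W)$ is the free vector space generated by the set $V\times W$ and $R$ is the subspace of $F(V\times W)$ spanned by
$$(v_1 + v_2, w) - (v_1, w) - (v_2, w),\quad (v, w_1 + w_2) - (v, w_1) - (v, w_2),\quad (\alpha v, w) - \alpha (v, w),\quad (v, \alpha w) - \alpha (v, w), \tag{1}$$
where $v, v_1, v_2\in V$, $w, w_1, w_2\in W$, $\alpha\in\mathbb{K}$. The class of $(v, w)$ in the quotient is denoted by $v\otimes w$.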
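By induction, the tensor product can be defined for $k$ vector spaces $V_1, \ldots, V_k$; it is denoted by $V_1\otimes\cdots\otimes V_k$, which can be abbreviated as $\bigotimes_{i=1}^{k} V_i$.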
The tensor product can also be defined for the dual spaces $V_i^*$ of the vector spaces $V_i$, $i = 1, \ldots, k$, and we can define mixed tensor products of vector spaces and their duals.
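The tensor product of vector spaces is a vector space, so we can choose a basis; e.g., if $\{e_i\}_{i=1}^{\dim V}$ is a basis of $V$ and $\{f_j\}_{j=1}^{\dim W}$ is a basis of $W$, then $\{e_i\otimes f_j\}$ is a basis of $V\otimes W$; in particular, $\dim(V\otimes W) = \dim V\cdot\dim W$.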
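As an illustration of this basis construction, here is a small sketch in NumPy (our choice of stand-in for the tensor libraries, not something the text prescribes). In coordinates, the components of $v\otimes w$ in the basis $\{e_i\otimes f_j\}$ are the products $v^i w^j$; ordering the basis pairs lexicographically (row-major index $i\cdot\dim W + j$, an assumed convention) turns them into the Kronecker product of the coordinate vectors.

```python
import numpy as np

# Components of v in a basis {e_i} of V (dim 3) and of w in a basis {f_j} of W (dim 2).
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0])

# Components of v ⊗ w in the basis {e_i ⊗ f_j}: the products v^i w^j (outer product).
t = np.einsum("i,j->ij", v, w)

# Row-major ordering of the basis pairs (index i * dim W + j) gives a flat vector of
# length dim V * dim W = 6, which is exactly the Kronecker product of the coordinates.
flat = t.reshape(-1)
assert np.allclose(flat, np.kron(v, w))

# Bilinearity of ⊗: (2v) ⊗ w = 2 (v ⊗ w) and (v + v') ⊗ w = v ⊗ w + v' ⊗ w.
v2 = np.array([0.5, -1.0, 2.0])
assert np.allclose(np.einsum("i,j->ij", 2 * v, w), 2 * t)
assert np.allclose(np.einsum("i,j->ij", v + v2, w), t + np.einsum("i,j->ij", v2, w))
print(t.shape, flat.shape)  # (3, 2) (6,)
```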
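The tensor product can be used to factorize any multilinear mapping. This is expressed by the universal factorization property of the tensor product: every bilinear mapping $\phi: V\times W\to U$ of vector spaces factorizes uniquely as $\phi = \tilde{\phi}\circ\pi$, where $\pi: V\times W\to V\otimes W$, $\pi(v, w) = v\otimes w$, and $\tilde{\phi}: V\otimes W\to U$ is a new linear mapping, according to Fig. 1.
The map $\tilde{\phi}$ is then called the tensor. Moreover, in the map $\tilde{\phi}: V\otimes W\to U$ we can move $U$ to the domain of the map, i.e., we can define $\tilde{\phi}: V\otimes W\otimes U^*\to\mathbb{K}$ instead of $\tilde{\phi}: V\otimes W\to U$, where $\mathbb{K}$ is the field common to the vector spaces.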
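A small numerical sketch of the universal factorization property for $U = \mathbb{K}$, assuming row-major coordinates on $V\otimes W$. The coefficient array `a` and the helper names `phi`, `pi`, `phi_tilde` are illustrative choices: the bilinear map $\phi(v, w) = v^\top a\, w$ is not linear on $V\times W$, but composed with $\pi(v, w) = v\otimes w$ it becomes a single linear functional on $V\otimes W$.

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.normal(size=(3, 2))   # coefficients of a bilinear map phi: V x W -> K

def phi(v, w):
    # bilinear (but not linear) on the Cartesian product V x W
    return v @ a @ w

def pi(v, w):
    # pi(v, w) = v ⊗ w in coordinates of the basis e_i ⊗ f_j (row-major index i * dim W + j)
    return np.kron(v, w)

# The induced linear map phi_tilde: V ⊗ W -> K is just a covector of length dim V * dim W.
phi_tilde = a.reshape(-1)

v, w = rng.normal(size=3), rng.normal(size=2)
assert np.isclose(phi(v, w), phi_tilde @ pi(v, w))   # phi = phi_tilde ∘ pi
```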
Example 1.
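In the basis $\{e_i\}$ of $V$ and $\{f_j\}$ of $W$, a bilinear mapping $\phi: V\times W\to\mathbb{K}$ has the form
$$\phi(v, w) = \sum_{i,j} a_{ij}\, v^i w^j,\qquad v = \sum_i v^i e_i,\quad w = \sum_j w^j f_j, \tag{2}$$
where $a_{ij} = \phi(e_i, f_j)$, for all $i$, $j$, are the coefficients of a numerical matrix $[a_{ij}]$ that represents the multilinear mapping (tensor) in the fixed bases of $V$ and $W$. The matrix collects the components of the tensor in a fixed tensor basis.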
This matrix is implemented in the TensorFlow and PyTorch libraries as a tensor class. The critical difference is that mathematical tensors have specific transformation properties under a change of basis of the underlying vector spaces, whereas the libraries implement tensors simply as multidimensional arrays of numbers. Moreover, they do not keep track of the upper (contravariant) or lower (covariant) position of indices.
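The following NumPy sketch illustrates both points of this paragraph under illustrative assumptions (random coefficients and bases; the names `a`, `P`, `Q` are chosen here). The coefficient array evaluates $\phi(v, w) = \sum_{ij} a_{ij} v^i w^j$ by contraction, and under a change of basis the coefficients transform as $a' = P^\top a\, Q$ while the value of $\phi$ stays fixed. The array itself carries no record of this transformation law or of index positions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2

# Components a_ij = phi(e_i, f_j) of a bilinear map phi: V x W -> K in fixed bases.
a = rng.normal(size=(n, m))
v = rng.normal(size=n)   # components v^i
w = rng.normal(size=m)   # components w^j

# Equation (2) as a contraction: phi(v, w) = a_ij v^i w^j.
value = np.einsum("ij,i,j->", a, v, w)
assert np.isclose(value, v @ a @ w)

# Change of basis: e'_k = sum_i P[i, k] e_i and f'_l = sum_j Q[j, l] f_j.
# Vector components transform with the inverse matrices (v = P v'),
# while the bilinear-map coefficients transform as a' = P^T a Q.
P = rng.normal(size=(n, n)) + 3 * np.eye(n)
Q = rng.normal(size=(m, m)) + 3 * np.eye(m)
v_new = np.linalg.solve(P, v)
w_new = np.linalg.solve(Q, w)
a_new = P.T @ a @ Q

# The number phi(v, w) does not depend on the basis ...
assert np.isclose(np.einsum("ij,i,j->", a_new, v_new, w_new), value)
# ... but the stored arrays do change; nothing in them records that `a` carries
# two lower indices while `v` and `w` carry upper ones.
assert not np.allclose(a, a_new)
```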
The example can be extended to the tensor product of multiple vector spaces and their duals.
Example 2.
The linear mapping $L: V\to W$ can be regarded as an element of $V^*\otimes W$, which in the basis $\{e_i\}$ of $V$ (with dual basis $\{e^i\}$) and the basis $\{f_j\}$ of $W$ can be written as $L = L^j{}_i\, e^i\otimes f_j$.¹ The vector space of linear operators from $V$ to $W$ is denoted as $\mathrm{Hom}(V, W)\cong V^*\otimes W$.
¹ Here we use the Einstein summation convention: a repeated lower and upper index indicates summation over its whole range.
For tensors we also often use the abstract index notation, in which only the components of a tensor are written, e.g., $L^j{}_i$, understanding them not as numerical values in a fixed basis but as the full tensor $L$, see [16].
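In code, Einstein summation is exactly what `numpy.einsum` implements, with the subscript string playing the role of the index labels. A minimal sketch, assuming the first array axis stores the upper index $j$ and the second the lower index $i$ (NumPy cannot record this distinction, as noted above):

```python
import numpy as np

rng = np.random.default_rng(1)
dim_v, dim_w = 3, 4

# Components L^j_i of a linear map L: V -> W; axis 0 holds the upper index j (W),
# axis 1 the lower index i (V). This is only a convention of this sketch.
L = rng.normal(size=(dim_w, dim_v))
v = rng.normal(size=dim_v)

# (L v)^j = L^j_i v^i: the repeated index i is summed.
Lv = np.einsum("ji,i->j", L, v)
assert np.allclose(Lv, L @ v)

# Composition of linear maps is again a contraction: (M L)^k_i = M^k_j L^j_i.
M = rng.normal(size=(2, dim_w))
assert np.allclose(np.einsum("kj,ji->ki", M, L), M @ L)
```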
Since the tensor product is a functor in the category of linear spaces [2], for a linear mapping $L: V\to W$ we can define its extension to tensor product spaces, $L^{\otimes n}: V^{\otimes n}\to W^{\otimes n}$, by acting on products as $L^{\otimes n}(v_1\otimes\cdots\otimes v_n) = Lv_1\otimes\cdots\otimes Lv_n$ and extending by linearity to all linear combinations.
The Cartesian product of vector spaces behaves similarly: it is also a functor, and therefore all the above operations apply in this case as well.
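The functorial extension can be checked numerically for $n = 2$. In the sketch below (NumPy, row-major coordinates on $V\otimes V$ assumed), $L\otimes L$ is represented by the Kronecker product `np.kron(L, L)`; on a product vector it acts factorwise, and on a general tensor $T^{ab}$ it reads $L^c{}_a L^d{}_b T^{ab}$. The last lines show the Cartesian-product analogue.

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.normal(size=(4, 3))            # L: V -> W with dim V = 3, dim W = 4
v1, v2 = rng.normal(size=3), rng.normal(size=3)

# On a product vector, L ⊗ L acts factorwise: (L ⊗ L)(v1 ⊗ v2) = L v1 ⊗ L v2.
lhs = np.kron(L, L) @ np.kron(v1, v2)  # L ⊗ L as a 16 x 9 matrix
rhs = np.kron(L @ v1, L @ v2)
assert np.allclose(lhs, rhs)

# Extension by linearity: for a general (non-product) tensor T^{ab} in V ⊗ V
# the same map reads T^{ab} -> L^c_a L^d_b T^{ab} in components.
T = rng.normal(size=(3, 3))
LT = np.einsum("ca,db,ab->cd", L, L, T)
assert np.allclose(LT.reshape(-1), np.kron(L, L) @ T.reshape(-1))

# Cartesian-product analogue: (v1, v2) -> (L v1, L v2), applied row by row.
stacked = np.stack([v1, v2])           # shape (2, 3): two elements of V
assert np.allclose(stacked @ L.T, np.stack([L @ v1, L @ v2]))
```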
We can define a few linear algebra operations realized in tensor libraries.
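- Broadcasting: for a linear operator $L: V\to W$, it is the extension of $L$ to the Cartesian product $V^{n} = V\times\cdots\times V$ ($n$ factors),
  $$L^{\times n}: V^{n}\to W^{n},\qquad L^{\times n}(v_1, \ldots, v_n) = (Lv_1, \ldots, Lv_n). \tag{3}$$
  A similar broadcasting is realized for the tensor product, $L^{\otimes n}(v_1\otimes\cdots\otimes v_n) = Lv_1\otimes\cdots\otimes Lv_n$. Both operations rely on the functoriality of the Cartesian product and of the tensor product.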
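- Transposition/Permutation: the transposition of two components relies on the following fact: there is a unique linear mapping $\tau: V\otimes W\to W\otimes V$ that simply reverses the order of factors, $\tau(v\otimes w) = w\otimes v$. It extends to an arbitrary permutation $\sigma\in S_k$ of $k$ numbers: $\tau_\sigma: V_1\otimes\cdots\otimes V_k\to V_{\sigma(1)}\otimes\cdots\otimes V_{\sigma(k)}$, $\tau_\sigma(v_1\otimes\cdots\otimes v_k) = v_{\sigma(1)}\otimes\cdots\otimes v_{\sigma(k)}$. A similar operation can be defined for the Cartesian product.
  In the abstract index notation, we have $(\tau_\sigma T)_{i_{\sigma(1)}\ldots i_{\sigma(k)}} = T_{i_1\ldots i_k}$.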
- Reshaping: for a pair of indices $i\in\{0, \ldots, N-1\}$ and $j\in\{0, \ldots, M-1\}$ we can define a reshape operation, which produces a single new index $k\in\{0, \ldots, NM-1\}$ whose value depends on the values of $i$ and $j$ through $k = M i + j$. The operation extends to a pair of neighbouring indices of a tensor by $T'_{\ldots k\ldots} = T_{\ldots ij\ldots}$, where $k = M i + j$. This operation changes only the way of indexing; however, it is useful in applications. In the abstract index notation we write it simply as $T_{\ldots ij\ldots}\mapsto T_{\ldots k\ldots}$.
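- Contraction: for two tensors $T$ and $S$ and a fixed basis of $V$, the contraction of an upper index $i_p$ of $T$ (related to $V$) with a lower index $j_q$ of $S$ (related to $V^*$) is (note the implicit sum over $a$)
  $C^{i_1\ldots\widehat{i_p}\ldots i_k}{}_{j_1\ldots\widehat{j_q}\ldots j_l} = T^{i_1\ldots a\ldots i_k}\, S_{j_1\ldots a\ldots j_l}$, where $a$ stands in position $p$ of $T$ and in position $q$ of $S$, and the hat denotes an omitted index.
  The contraction can also be defined for a single tensor in the same way; e.g., in the abstract index notation, for a single tensor $T^i{}_{jk}$ one gets $T^a{}_{ak}$, where implicit summation over $a$ was applied.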
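- Concatenation: joins tensors whose shapes agree in all dimensions except a given dimension $d$; for $T$ of shape $(N_1, \ldots, N_d, \ldots, N_k)$ and $S$ of shape $(N_1, \ldots, N'_d, \ldots, N_k)$, the result $C$ has shape $(N_1, \ldots, N_d + N'_d, \ldots, N_k)$, with $C_{\ldots i_d\ldots} = T_{\ldots i_d\ldots}$ for $i_d < N_d$ and $C_{\ldots i_d\ldots} = S_{\ldots (i_d - N_d)\ldots}$ for $i_d\ge N_d$.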
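The operations listed above correspond to standard array operations. The following NumPy sketch checks each definition on small random arrays, with 0-based, row-major indexing assumed. NumPy stands in for TensorFlow/PyTorch here; their analogous functions have similar but not identical names and signatures.

```python
import numpy as np

rng = np.random.default_rng(3)

# Broadcasting: apply one linear operator L to every element of a batch (v_1, ..., v_n).
L = rng.normal(size=(4, 3))
batch = rng.normal(size=(5, 3))                 # n = 5 elements of V = R^3
out = np.einsum("ji,bi->bj", L, batch)
assert np.allclose(out, np.stack([L @ x for x in batch]))

# Transposition/permutation: np.transpose(T, sigma) realizes
# (tau_sigma T)_{i_sigma(0) ... i_sigma(k-1)} = T_{i_0 ... i_(k-1)}.
T = rng.normal(size=(2, 3, 4))
sigma = (2, 0, 1)
Tp = np.transpose(T, sigma)
assert Tp.shape == (4, 2, 3)
assert Tp[3, 1, 2] == T[1, 2, 3]

# Reshaping: merge i in range(N) and j in range(M) into k = i * M + j (row-major).
N, M = 3, 4
R = rng.normal(size=(2, N, M))
Rk = R.reshape(2, N * M)
i, j = 2, 1
assert Rk[0, i * M + j] == R[0, i, j]

# Contraction of the index a of Tm with the index a of S: C_{bcd} = Tm_{ba} S_{acd}.
Tm = rng.normal(size=(2, 3))
S = rng.normal(size=(3, 4, 5))
C = np.einsum("ba,acd->bcd", Tm, S)
assert np.allclose(C, np.tensordot(Tm, S, axes=([1], [0])))
# Single-tensor contraction: U_k = sum_a U_{aak}.
U = rng.normal(size=(3, 3, 2))
assert np.allclose(np.einsum("aak->k", U), np.trace(U, axis1=0, axis2=1))

# Concatenation along dimension d = 1: shapes agree on all other axes.
X, Y = rng.normal(size=(2, 3, 4)), rng.normal(size=(2, 5, 4))
Z = np.concatenate([X, Y], axis=1)
assert Z.shape == (2, 8, 4) and np.array_equal(Z[:, 3:, :], Y)
```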
2.2 (Hypercomplex) Algebras
In this part we introduce mathematical concepts related to hypercomplex and general algebras, as in the following definition.
Definition 2.
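The algebra over a field $\mathbb{K}$ is a vector space $\mathcal{A}$ equipped with a product, a binary operation $\cdot: \mathcal{A}\times\mathcal{A}\to\mathcal{A}$, with the following properties:
- $(x + y)\cdot z = x\cdot z + y\cdot z$,
- $x\cdot(y + z) = x\cdot y + x\cdot z$,
- $(\alpha x)\cdot(\beta y) = (\alpha\beta)(x\cdot y)$,
for $x, y, z\in\mathcal{A}$ and $\alpha, \beta\in\mathbb{K}$.
Moreover, the algebra is commutative if $x\cdot y = y\cdot x$ for all $x, y\in\mathcal{A}$.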
Example 3.
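For real numbers we have the one-dimensional algebra $\mathcal{A} = \mathbb{R}$ over $\mathbb{K} = \mathbb{R}$, with the ordinary multiplication $x\cdot y = xy$.
Complex numbers are obtained as the two-dimensional commutative algebra $\mathcal{A} = \mathbb{R}^2$ with basis $\{e_0, e_1\}$ and the products $e_0\cdot e_0 = e_0$, $e_0\cdot e_1 = e_1\cdot e_0 = e_1$, and $e_1\cdot e_1 = -e_0$.
By convention, we always assume that the neutral element of the algebra multiplication is $e_0$.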
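Expanding a product with the distributivity and scalar rules of Definition 2 and the table $e_0\cdot e_0 = e_0$, $e_0\cdot e_1 = e_1\cdot e_0 = e_1$, $e_1\cdot e_1 = -e_0$ recovers the familiar rule of complex multiplication, for $x = x^0 e_0 + x^1 e_1$ and $y = y^0 e_0 + y^1 e_1$:

```latex
\begin{aligned}
x\cdot y &= (x^0 e_0 + x^1 e_1)\cdot(y^0 e_0 + y^1 e_1)\\
         &= x^0 y^0\, e_0\cdot e_0 + x^0 y^1\, e_0\cdot e_1 + x^1 y^0\, e_1\cdot e_0 + x^1 y^1\, e_1\cdot e_1\\
         &= (x^0 y^0 - x^1 y^1)\, e_0 + (x^0 y^1 + x^1 y^0)\, e_1 .
\end{aligned}
```

Reading $e_1$ as the imaginary unit, this is $(x^0 + i x^1)(y^0 + i y^1)$.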
Efficient use of algebras in computations based on tensors (TensorFlow, PyTorch) relies on converting the product within the algebra into a tensor operation.
Treating an algebra as a vector space with the additional structure of vector multiplication², we have the following definition, which is essential for the rest of the paper.
² This can be made precise using the forgetful functor from the category of algebras to the category of linear spaces.
Definition 3.
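For an algebra $\mathcal{A}$ over $\mathbb{K}$ the product can be defined as a tensor $A\in\mathcal{A}^*\otimes\mathcal{A}^*\otimes\mathcal{A}$. Selecting the basis $\{e_i\}_{i=0}^{n-1}$ of $\mathcal{A}$ and the dual basis $\{e^i\}_{i=0}^{n-1}$ with $e^i(e_j) = \delta^i_j$, the product has the form
$$A = A^k_{ij}\, e^i\otimes e^j\otimes e_k. \tag{4}$$
Then the multiplication table entry is given by
$$e_i\cdot e_j = A^k_{ij}\, e_k, \tag{5}$$
and, by bilinearity, for $x = x^i e_i$ and $y = y^j e_j$ we have $x\cdot y = A^k_{ij}\, x^i y^j\, e_k$.
The tensor coefficients (abstract index notation) $A^k_{ij}$ play the same role as the structure constants of a Lie algebra [8]. In a fixed basis, these coefficients can be represented as a multidimensional array (called a tensor in TensorFlow and PyTorch). When the algebra is commutative, $A^k_{ij} = A^k_{ji}$, or equivalently, $e_i\cdot e_j = e_j\cdot e_i$.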
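A minimal NumPy sketch of this definition, under our own storage assumption `A[i, j, k]` $= A^k_{ij}$ (all indices at the bottom, as in Section 3); the helper names `structure_constants` and `algebra_mul` are hypothetical. It builds the structure-constant tensors of the complex numbers and of the quaternions from their multiplication tables, evaluates $x\cdot y = A^k_{ij} x^i y^j e_k$ with a single contraction, and tests commutativity as the symmetry $A^k_{ij} = A^k_{ji}$.

```python
import numpy as np

def structure_constants(table, n):
    """Build A[i, j, k] = A^k_{ij} from {(i, j): (sign, k)}, meaning e_i . e_j = sign * e_k."""
    A = np.zeros((n, n, n))
    for (i, j), (sign, k) in table.items():
        A[i, j, k] = sign
    return A

def algebra_mul(A, x, y):
    # (x . y)^k = A^k_{ij} x^i y^j, cf. (4) and (5)
    return np.einsum("ijk,i,j->k", A, x, y)

# Complex numbers: e0 = 1, e1 = i.
complex_table = {(0, 0): (1, 0), (0, 1): (1, 1), (1, 0): (1, 1), (1, 1): (-1, 0)}
A_c = structure_constants(complex_table, 2)

# Quaternions: e0 = 1, e1 = i, e2 = j, e3 = k with i^2 = j^2 = k^2 = ijk = -1.
quat_table = {(0, a): (1, a) for a in range(4)}
quat_table.update({(a, 0): (1, a) for a in range(1, 4)})
quat_table.update({(1, 1): (-1, 0), (2, 2): (-1, 0), (3, 3): (-1, 0),
                   (1, 2): (1, 3), (2, 1): (-1, 3),
                   (2, 3): (1, 1), (3, 2): (-1, 1),
                   (3, 1): (1, 2), (1, 3): (-1, 2)})
A_q = structure_constants(quat_table, 4)

# (1 + 2i)(3 - i) = 5 + 5i
x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
z = algebra_mul(A_c, x, y)
assert np.allclose(z, [5.0, 5.0])

# Commutativity is the symmetry A^k_{ij} = A^k_{ji} in the first two indices.
print(np.allclose(A_c, A_c.transpose(1, 0, 2)), np.allclose(A_q, A_q.transpose(1, 0, 2)))
# True False
```

The same array `A` can be passed unchanged to any layer that needs the algebra product, which is what makes the approach independent of the particular algebra.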
3 Results
In this section, we provide mathematical details of the implementation of hypercomplex dense and convolutional neural networks.
We do not distinguish co- and contravariant indices in this section and write all of them as subscripts. Moreover, a tensor is treated as a multidimensional array with indices starting from 0. This is the standard convention in tensor libraries such as TensorFlow and PyTorch.
We assume that the structure constants of the algebra multiplication are fixed and stored in the tensor $A_{ijk}$ (in abstract index notation, $A_{ijk} = A^k_{ij}$), see Subsection 2.2.
3.1 Hypercomplex Dense layer
We start with a description of the dense layer. It is a general-purpose layer that operates on data whose feature dimension is a multiple of the algebra dimension. We assume that the input data are of dimension $(B, n\cdot m)$, where $B$ is the batch size, $n$ the algebra dimension, and $m$ a positive integer multiplier. The last two numbers determine the input data size. The input tensor is $x_{bai}$, where $b$ is the batch index, $a = 0, \ldots, n-1$ the algebra index, and $i = 0, \ldots, m-1$ the multiplicity index. Moreover, we use learnable parameters (weights/kernel) $W_{aiu}$, where $u$ is the index over units/neurons. A bias is used if needed. Kernel and bias are usually initialized with numbers drawn from specific distributions [9].
Algorithm 1 presents the hypercomplex dense layer in both tensorial and abstract index notation (AIN). It uses two flags: bias, indicating whether a bias is included, and activation, indicating whether an activation function is applied.
Implementations in Keras with the TensorFlow and PyTorch backends are given in [13].
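The following is a minimal NumPy sketch of a hypercomplex dense layer built only from tensor operations. It is not Algorithm 1 or the code of [13], and several details are our assumptions: the flattened input $(B, n\cdot m)$ is read row-major as $x_{bai}$, the kernel is stored as `kernel[d, i, u]`, each unit computes $\sum_i x_i\cdot w_{iu}$ with the input on the left (for non-commutative algebras such as the quaternions the other order gives a different layer), and the bias has shape $(n\cdot U,)$. The function name `hyper_dense` is hypothetical.

```python
import numpy as np

def hyper_dense(x, kernel, A, bias=None, activation=None):
    """Hypothetical sketch of a hypercomplex dense layer.

    x      : (B, n*m)  input; x.reshape(B, n, m)[b, a, i] is component a of input element i
    kernel : (n, m, U) kernel[d, i, u] is component d of the weight joining input i to unit u
    A      : (n, n, n) structure constants, A[a, d, c] = A^c_{ad}
    returns: (B, n*U)  output; reshape to (B, n, U) to get component c of unit u
    """
    n, m, U = kernel.shape
    # Fold the algebra product into an ordinary real weight matrix:
    # W[(a, i), (c, u)] = sum_d A^c_{ad} kernel[d, i, u]  (input left, weight right)
    W = np.einsum("adc,diu->aicu", A, kernel).reshape(n * m, n * U)
    # A standard dense product now computes
    # y[b, c, u] = sum_{a, d, i} A^c_{ad} x[b, a, i] kernel[d, i, u].
    y = x @ W
    if bias is not None:        # "bias" flag; assumed bias shape (n*U,)
        y = y + bias
    if activation is not None:  # "activation" flag; any elementwise function
        y = activation(y)
    return y

# Check with the complex numbers (e0 = 1, e1 = i): the layer must reduce to a
# complex matrix product.
A_c = np.zeros((2, 2, 2))
A_c[0, 0, 0], A_c[0, 1, 1], A_c[1, 0, 1], A_c[1, 1, 0] = 1, 1, 1, -1
rng = np.random.default_rng(4)
B, m, U = 5, 3, 2
x = rng.normal(size=(B, 2 * m))
kernel = rng.normal(size=(2, m, U))
y = hyper_dense(x, kernel, A_c).reshape(B, 2, U)

xc = x.reshape(B, 2, m)[:, 0] + 1j * x.reshape(B, 2, m)[:, 1]
wc = kernel[0] + 1j * kernel[1]
assert np.allclose(y[:, 0] + 1j * y[:, 1], xc @ wc)
```

Folding the structure constants into the weights first means the batch data pass through a single ordinary matrix multiplication, so the library's optimized dense kernel does all the heavy work.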
3.2 Hypercomplex Convolutional layer
In this part, the hypercomplex convolutional layer is described. We present general $k$-dimensional ($k = 1, 2, 3$) layers, which differ in the shape of the input data and of the kernel.
The additional 'image channels' of the data can be packed into an algebra element. For instance, a two-dimensional image can be decomposed as a matrix of four-dimensional algebra elements: the three color channels plus the alpha channel make each pixel an element of a four-dimensional algebra.
The general idea, as in [18], is to use the structure-constant tensor of the algebra to separate the coefficients of each algebra basis component and then to apply the traditional convolution to each algebra component.
The input data have dimension $(B, L_1, \ldots, L_k, n\cdot m)$, where $B$ is the batch size, $L_j$, $j = 1, \ldots, k$, the size of the data sample in each dimension, $n$ the algebra dimension, and $m$ a positive multiplier.³ The kernel size is $(K_1, \ldots, K_k)$, where $K_j$, $j = 1, \ldots, k$, is the kernel extent in each dimension, and $F$ is the number of filters. The kernel is therefore a tensor $W_{s_1\ldots s_k a i f}$ with kernel-position indices $s_j$, algebra index $a$, multiplicity index $i$, and filter index $f$. The flag bias indicates whether a bias is used; if so, it has one entry per algebra component and filter. Kernel and bias can be initialized from arbitrary distributions [9]. For convolving the algebra components we use the standard $k$-dimensional convolution, for which optimized algorithms exist [12]. The algorithm is presented in Algorithm 2.
³ The size of the data sample in each dimension (axis) stands right after the batch size, following the TensorFlow convention. In PyTorch, this multi-index is placed at the end. We do not provide an implementation for the PyTorch convention, since it differs only by a few permutations.
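A sketch of the same construction for the case $k = 1$, again in NumPy and under our own assumptions: TensorFlow-style channels-last layout with channel index $a\cdot m + i$, `kernel[s, d, i, f]`, 'valid' padding, and cross-correlation (as in TensorFlow's convolutions) in the hypothetical helper `conv1d_valid`. It is not Algorithm 2 itself. The structure constants are folded into a real kernel once per kernel position, so a single standard convolution processes all algebra components; the check compares against a direct complex-valued convolution.

```python
import numpy as np

def conv1d_valid(x, w):
    """Real 1D convolution in the TensorFlow sense (cross-correlation, 'valid' padding).
    x: (B, L, C_in), w: (K, C_in, C_out) -> (B, L - K + 1, C_out)."""
    K = w.shape[0]
    L_out = x.shape[1] - K + 1
    return np.stack([np.einsum("bkc,kcf->bf", x[:, t:t + K], w) for t in range(L_out)], axis=1)

def hyper_conv1d(x, kernel, A, bias=None, activation=None):
    """Hypothetical sketch of a k = 1 hypercomplex convolution.
    x: (B, L, n*m) with channel index a*m + i; kernel: (K, n, m, F); A[a, d, c] = A^c_{ad}.
    Returns (B, L - K + 1, n*F) with channel index c*F + f."""
    K, n, m, F = kernel.shape
    # Fold the algebra product into a real kernel, separately for each position s:
    # W[s, (a, i), (c, f)] = sum_d A^c_{ad} kernel[s, d, i, f]
    W = np.einsum("adc,sdif->saicf", A, kernel).reshape(K, n * m, n * F)
    y = conv1d_valid(x, W)        # one standard real convolution
    if bias is not None:          # assumed bias shape (n*F,)
        y = y + bias
    if activation is not None:
        y = activation(y)
    return y

# Check with the complex numbers: the layer must match a direct complex convolution.
A_c = np.zeros((2, 2, 2))
A_c[0, 0, 0], A_c[0, 1, 1], A_c[1, 0, 1], A_c[1, 1, 0] = 1, 1, 1, -1
rng = np.random.default_rng(6)
B, L, m, K, F = 2, 7, 3, 3, 4
x = rng.normal(size=(B, L, 2 * m))
kernel = rng.normal(size=(K, 2, m, F))
y = hyper_conv1d(x, kernel, A_c).reshape(B, L - K + 1, 2, F)

xc = x.reshape(B, L, 2, m)[:, :, 0] + 1j * x.reshape(B, L, 2, m)[:, :, 1]
kc = kernel[:, 0] + 1j * kernel[:, 1]
yc = np.stack([np.einsum("bsi,sif->bf", xc[:, t:t + K], kc) for t in range(L - K + 1)], axis=1)
assert np.allclose(y[:, :, 0] + 1j * y[:, :, 1], yc)
```

For $k = 2, 3$ only the spatial indices of the kernel and of the sliding window change; in practice the loop-based `conv1d_valid` would be replaced by a library convolution.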
4 Conclusion
In this paper, we introduced the mathematical details of the implementation of dense and convolutional NN based on hypercomplex and general algebras. The critical point in this presentation is to associate the algebra multiplication with a rank-three tensor. Thanks to this observation, all the NN processing steps can be represented as tensorial operations.
Expressing the algebra operations fully tensorially simplifies the neural network operations and allows them to benefit from the fast tensorial operations of modern packages such as TensorFlow and PyTorch. The implementation of the above generalized hypercomplex NN is described elsewhere [13].
Acknowledgements This paper has been supported by the Polish National Agency for Academic Exchange Strategic Partnership Programme under Grant No. BPI/PST/2021/1/00031 (nawa.gov.pl).
Declarations
- Funding - This paper has been supported by the Polish National Agency for Academic Exchange Strategic Partnership Programme under Grant No. BPI/PST/2021/1/00031 (nawa.gov.pl).
- Conflict of interest/Competing interests - Not applicable
- Ethics approval and consent to participate - Not applicable
- Consent for publication - All authors agree to publication
- Data availability - Not applicable
- Materials availability - Not applicable
- Code availability - Not applicable
- Author contribution - A.N., R.K.: Conceptualization, Methodology, Investigation, Writing - Original Draft, Review, Editing. A.N.: Funding acquisition, Supervision.
References
- [1] M. Abadi, P. Barham, J. Chen, Z. Chen et al. TensorFlow: a system for large-scale machine learning, In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI’16). USENIX Association, USA, 265–283, (2016).
- [2] P. Aluffi, Algebra: Chapter 0, American Mathematical Society, 2009
- [3] P. Arena, L. Fortuna, G. Muscato, M.G. Xibilia, Neural Networks in Multidimensional Domains, Lecture Notes in Control and Information Sciences, Springer London (1998); doi: 10.1007/BFb0047683
- [4] S. Buchholz, G. Sommer, On Clifford neurons and Clifford multi-layer perceptrons, Neural Networks, 21, 7, 925-935 (2008); doi: 10.1016/j.neunet.2008.03.004.
- [5] W. Cao, C. Zheng, Z. Yan et al., Geometric machine learning: research and applications, Multimedia Tools and Applications, 81, 30545–30597 (2022); doi: 10.1007/s11042-022-12683-9
- [6] W. Cao, Z. Yan, Z. He and Z. He, A Comprehensive Survey on Geometric Deep Learning, IEEE Access, vol. 8, pp. 35929-35949, 2020, doi: 10.1109/ACCESS.2020.2975067.
- [7] G. Carlsson, Topology and Data, Bull. Amer. Math. Soc. 46 (2), 255–308 (2009); doi: 10.1090/s0273-0979-09-01249-x
- [8] W. Fulton, J. Harris, Representation theory. A first course, Springer-Verlag, 1991
- [9] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249-256 (2010)
- [10] E. Grassucci, A. Zhang, D. Comminiello, PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions, IEEE Transactions on Neural Networks and Learning Systems, 1-13 (2022); doi: 10.1109/TNNLS.2022.3226772
- [11] R. Kycia, A. Niemczynowicz, Hypercomplex neural network in time series forecasting of stock data, Submitted, arXiv:2401.04632 [cs.NE]
- [12] A. Lavin, S. Gray, Fast Algorithms for Convolutional Neural Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 4013-4021, doi: 10.1109/CVPR.2016.435.
- [13] A. Niemczynowicz, R. Kycia, KHNNs: hypercomplex neural networks computations via Keras using TensorFlow and PyTorch, in preparation
- [14] A. Parada-Mayorga, A. Ribeiro, Algebraic Neural Networks: Stability to Deformations, IEEE Transactions on Signal Processing, vol. 69, pp. 3351-3366, (2021); doi: 10.1109/TSP.2021.3084537.
- [15] A. Paszke, S. Gross, F. Massa, A. Lerer et al., PyTorch: an imperative style, high-performance deep learning library, Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, Article 721, 8026–8037 (2019)
- [16] R. Penrose, W. Rindler, Spinors and Space-Time, Volume 1: Two-Spinor Calculus and Relativistic Fields, Cambridge University Press, 1984
- [17] D. Ruhe, J.K. Gupta, S. De Keninck, M. Welling, J. Brandstetter, Geometric clifford algebra networks. In Proceedings of the 40th International Conference on Machine Learning (ICML’23), Vol. 202. JMLR.org, Article 1219, 29306–29337, (2023)
- [18] G. Vieira, M.E. Valle, W. Lopes, Clifford Convolutional Neural Networks for Lymphoblast Image Classification, Silva, D.W., Hitzer, E., Hildenbrand, D. (eds) Advanced Computational Applications of Geometric Algebra. ICACGA 2022. Lecture Notes in Computer Science, vol 13771. Springer, Cham. (2024); doi: 10.1007/978-3-031-34031-4_7
- [19] J. Wood, J. Shawe-Taylor, Representation theory and invariant neural networks, Discrete Applied Mathematics, 69, 1-2, 33–60 (1996)