Agnieszka Niemczynowicz [1]. These authors contributed equally to this work.


[1] Faculty of Mathematics and Computer Science, University of Warmia and Mazury in Olsztyn, Słoneczna 54, Olsztyn, 10-710, Warmińsko-Mazurskie, Poland

[2] Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska 24, Kraków, 31-155, Małopolska, Poland

Fully tensorial approach to hypercomplex neural networks

Radosław Antoni Kycia
Abstract

A fully tensorial theory of hypercomplex neural networks is given. The key observation is that the algebra multiplication can be represented as a rank-three tensor. This approach is attractive for neural network libraries that support efficient tensor operations.

Keywords: Hypercomplex neural network, algebra, tensor

MSC Classification: 15A69, 15-04

1 Introduction

The fast progress in applications of Artificial Neural Networks (NN) promotes new directions of research and generalizations. This involves advanced mathematical concepts such as group theory [19], differential geometry [5, 6], or topological methods in data analysis [7].

The core of NN implementations lies in linear algebra usage. In most popular programming libraries (TensorFlow [1], PyTorch [15]), the most popular architecture is feed forward NN, which is based on a stack of layers where the data passes between them unidirectionally. Optimized tensorial operations realize the flow of the data.

There are different algebraic extensions. One of these paths is Algebraic Neural Networks [14], where additional endomorphism operations are performed on the data. Another algebraic-geometric direction is neural networks based on Geometric Algebra/Clifford algebra [4, 17]. Recently, Parametrized Hypercomplex Neural Networks were introduced [10] for convolutional layers. They can learn an optimal hypercomplex algebra adjusted to the data, exploiting an optimized Kronecker product. However, in some applications it is necessary to keep the hypercomplex algebra, or the parameters of an even more general algebra, as (fixed) hyperparameters. In this way, the algebra structure can be optimized at the metalevel.

In this paper, we discuss an implementation in which classical real-algebra computations are replaced by computations in various hypercomplex, or even more general, algebras. This approach, presented, e.g., in [3], is not new. However, there is a revival of interest in this direction due to better complexity properties in areas such as image processing [18] or time series analysis [11]. In these contributions, the open-source code [18] for specific four-dimensional hypercomplex algebras was used.

The implementation explained in this article significantly extends the ideas from [18] to arbitrary algebras, including hypercomplex ones. The algorithms described here agree with the NN presented in [18] for specific four-dimensional hypercomplex algebras. However, the implementation of [18] was obtained by constructing an additional multiplication structure from the multiplication table of the hypercomplex algebra, which is treated as an additional step in setting up the neural network. Our approach permits us to omit this complexity and to generalize to arbitrary algebras. This is an important contribution to the theoretical treatment of the general algebraic approach to hypercomplex neural networks.

The main contributions of this paper are the following:

  • we summarize basic concepts of tensorial operations in terms of hypercomplex and more general algebras; in particular, we note that the algebra multiplication can be expressed as a third-rank tensor,

  • we provide a general algorithm for computations within a hypercomplex dense layer,

  • we provide general algorithms for 1-, 2-, and 3-dimensional hypercomplex convolutional layer computations.

2 Methods

This section provides an overview of the mathematical theory behind the operations used in implementing hypercomplex neural networks. These notions underlie the methods used in this paper. They are classical and are explained in detail in standard references, e.g., [2].

2.1 Tensors

The primary object used in NN implementations is a tensor. It relies on the tensor product, described in the following definition.

Definition 1.

The tensor product of two vector spaces $V_{1}$ and $V_{2}$ over the field $\mathbb{F}$ is the vector space denoted by $V_{1}\otimes V_{2}$ and defined as the quotient space $V_{1}\times V_{2}/L$, where $L$ is the subspace of $V_{1}\times V_{2}$ spanned by

$$\begin{array}{c}(v+w,x)-(v,x)-(w,x),\\ (v,x+y)-(v,x)-(v,y),\\ (\lambda v,x)-\lambda(v,x),\\ (v,\lambda x)-\lambda(v,x),\end{array} \qquad (1)$$

where $v,w\in V_{1}$, $x,y\in V_{2}$, $\lambda\in\mathbb{F}$.

By induction, it can be defined for $k$ vector spaces $\{V_{i}\}_{i=1}^{k}$; the result is denoted by $V_{1}\otimes\ldots\otimes V_{k}$ or, in short, by $\otimes_{i=1}^{k}V_{i}$.

The tensor product can also be defined for the dual spaces $V_{i}^{*}$ of the vector spaces $V_{i}$ for $i\in\{1,\ldots,k\}$, $0<k<\infty$, and we can define mixed tensor products of vector spaces and their duals.

The tensor product of vector spaces is a vector space, so we can define a basis for it, e.g., if the basis of $V$ is $\{e_{i}\}_{i=1}^{n_{1}}$ and that of $W$ is $\{f_{i}\}_{i=1}^{n_{2}}$, then the basis of $V\otimes W$ is $\{e_{i}\otimes f_{j}\}_{i,j=1}^{n_{1},n_{2}}$.
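The basis statement above can be checked numerically. The following is a minimal sketch, assuming standard coordinate bases and representing $e_{i}\otimes f_{j}$ by the Kronecker product of coordinate vectors (the names `E`, `F`, and `basis` are illustrative and not taken from the paper).

```python
import numpy as np

n1, n2 = 2, 3
E = np.eye(n1)  # rows: basis vectors e_i of V (standard basis, an assumption)
F = np.eye(n2)  # rows: basis vectors f_j of W (standard basis, an assumption)

# Represent e_i (x) f_j by the Kronecker product of the coordinate vectors.
basis = np.array([np.kron(E[i], F[j]) for i in range(n1) for j in range(n2)])

# The n1 * n2 products are linearly independent, so dim(V (x) W) = n1 * n2.
rank = np.linalg.matrix_rank(basis)
```

In this representation, the vector $e_{i}\otimes f_{j}$ lands at flat position $i\,n_{2}+j$ (counting from zero), which is the same index map as the reshape operation $Rs$ introduced later in this section.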

The tensor space can be used to decompose any multilinear mapping. This is expressed in the universal factorization theorem for the tensor product. It states that a bilinear mapping $F:V\times W\rightarrow X$ of vector spaces $V,W,X$ can be uniquely factorized through a new mapping $\bar{F}:V\otimes W\rightarrow X$ according to Fig. 1.

[Diagram: $V\times W\xrightarrow{\otimes}V\otimes W\xrightarrow{\bar{F}}X$, with $F:V\times W\rightarrow X$ closing the commutative triangle, i.e., $F=\bar{F}\circ\otimes$.]

Figure 1: Unique factorization property of the tensor product.

Then the map $\bar{F}$ is called a tensor. Moreover, in the map $F$, we can move $X$ to the domain of the map, i.e., we can define $\hat{F}:V\times W\times X\rightarrow\mathbb{F}$ instead of $F$, where $\mathbb{F}$ is the field common to the vector spaces.

Example 1.

In the basis $\{e_{i}\}$ of $V$ and $\{f_{j}\}$ of $W$, the bilinear mapping $F:V\otimes W\rightarrow\mathbb{R}$ has the form

$$F=F^{ij}e_{i}\otimes f_{j}, \qquad (2)$$

where $F^{ij}\in\mathbb{R}$, for all $i,j$, are the coefficients of a numerical matrix that represents the multilinear mapping (tensor) $F$ in the fixed bases of $V$ and $W$. The matrix collects the components of the tensor in a fixed tensor basis.

The matrix $[F_{ij}]$ is implemented in the TensorFlow and PyTorch libraries as a tensor class. The critical difference is that mathematical tensors have specific transformation properties under a change of basis of the underlying vector spaces, whereas the libraries implement tensors as multidimensional arrays of numbers. Moreover, the libraries do not keep track of the upper (contravariant) or lower (covariant) position of indices.
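To illustrate the difference between a mathematical tensor and a library array, here is a minimal NumPy sketch. It assumes the stored components are those of a bilinear form, and the matrices `P` and `Q` (hypothetical change-of-basis matrices) are chosen only for illustration. The array itself carries no transformation rule, so the change of basis has to be applied by hand.

```python
import numpy as np

F = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])     # components F_ij in fixed bases
v = np.array([1.0, 2.0])             # coordinates of v in the basis of V
w = np.array([0.5, 0.0, -1.0])       # coordinates of w in the basis of W

# Evaluate the bilinear form: sum_ij F_ij v_i w_j.
value = np.einsum("ij,i,j->", F, v, w)

# Change of basis: columns of P (Q) hold the new basis vectors of V (W).
P = np.array([[2.0, 1.0], [0.0, 1.0]])
Q = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
v_new = np.linalg.solve(P, v)        # vector coordinates transform with P^{-1}
w_new = np.linalg.solve(Q, w)        # and with Q^{-1}
F_new = P.T @ F @ Q                  # form components transform with P and Q

# The value of the form is basis independent once all pieces are transformed.
value_new = np.einsum("ij,i,j->", F_new, v_new, w_new)
```

The array `F` alone does not record whether its indices are upper or lower; that bookkeeping stays with the user.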

The example can be extended to the tensor product of multiple vector spaces and their duals.

Example 2.

The linear mapping $A:V\rightarrow W$ can be written as a tensor $\bar{A}\in V^{*}\otimes W$, which in the basis $\{e^{i}\}$ of $V^{*}$ and $\{f_{i}\}$ of $W$ has the form $A=A_{i}^{j}e^{i}\otimes f_{j}$. (Footnote 1: Here we use the Einstein summation convention: a repeated bottom and top index indicates summation over the whole range.) The vector space of linear operators is denoted by $L(V,W)$.

For tensors, we also often use abstract index notation, where we provide only the components of the tensor, e.g., $A_{ij}$, understanding them not as numerical values in a fixed basis but as the full tensor $A_{ij}e^{i}\otimes e^{j}$, see [16].

Since the tensor product is a functor in the category of linear spaces [2], for a linear mapping $A:V\rightarrow W$ we can define its extension to the tensor product space, $\otimes_{i=1}^{n}A:\otimes_{i=1}^{n}V\rightarrow\otimes_{i=1}^{n}W$, by acting on products as $\otimes A(v_{1}\otimes\ldots\otimes v_{n})=Av_{1}\otimes\ldots\otimes Av_{n}$ and extending by linearity to all linear combinations.

The Cartesian product of vector spaces behaves similarly: it is also a functor, and therefore all the above operations apply in this case as well.
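The functorial extension of a linear map to a tensor product can be verified for $n=2$ in coordinates. This is a sketch under the assumption that tensor products of coordinate vectors and of matrices are both represented by the Kronecker product; the matrix and vectors are arbitrary illustrative values.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])           # A: V -> W with dim V = 2, dim W = 3
v1 = np.array([1.0, -2.0])
v2 = np.array([0.5, 4.0])

# (A (x) A)(v1 (x) v2), with (x) realized as the Kronecker product.
lhs = np.kron(A, A) @ np.kron(v1, v2)

# A v1 (x) A v2, computed factor by factor.
rhs = np.kron(A @ v1, A @ v2)
```

Both sides agree, which is the mixed-product property of the Kronecker product.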

We can define a few linear algebra operations realized in tensor libraries.

  • Broadcasting: for a linear operator $A:V\rightarrow W$, it is defined as the multilinear extension

    $$b_{1}^{n}:L(V,W)\rightarrow L(\times_{i=1}^{n}V,\times_{i=1}^{n}W), \qquad (3)$$

    that is, $b_{1}^{n}(A)(v_{1},\ldots,v_{n})=(Av_{1},\ldots,Av_{n})$. Similar broadcasting is realized for the tensor product. Both operations rely on the functoriality of the Cartesian product and the tensor product.

  • Transposition/Permutation: The transposition of two components relies on the following fact: there is a unique mapping $\tau:V\otimes W\rightarrow W\otimes V$ that simply reverses the order of factors, $\tau(v\otimes w)=w\otimes v$. We can extend it to an arbitrary permutation of $n$ numbers, $p:\{1,\ldots,n\}\rightarrow\{1,\ldots,n\}$, for which $\tau_{p}(v_{1}\otimes\ldots\otimes v_{n})=v_{p(1)}\otimes\ldots\otimes v_{p(n)}$. We can define a similar operation for the Cartesian product.

    In abstract index notation, we have $\tau_{p}A_{i_{1}\ldots i_{k}}=A_{i_{p(1)}\ldots i_{p(k)}}$.

  • Reshaping: For a pair of indices $i,j$ with ranges $0\leq R(i)<\infty$ and $0\leq R(j)<\infty$, we can define a reshape operation, which produces a new single index $Rs(i,j)$ whose value depends on the values of $i,j$ through the function $Rs(i,j)=iR(j)+j$. We can extend the operation to a pair $(p,p+1)$ of neighbouring indices of a tensor, where $p+1\leq k$. In abstract index notation, we can write $Rs_{p}A_{i_{1}\ldots i_{p}i_{p+1}\ldots i_{k}}=A_{i_{1}\ldots i_{p-1}Rs(i_{p},i_{p+1})i_{p+2}\ldots i_{k}}$. This operation changes only the way of indexing; however, it is useful in applications.

  • Contraction: For two tensors $A$ and $B$, and a fixed basis $\{e_{i}\}$ of $V$, the contraction of indices $p$ (related to $V$) and $q$ (related to $V^{*}$) is (note the implicit sum) $C_{p,q}(A,B)=A(\ldots\underbrace{e_{i}}_{p}\ldots)\otimes B(\ldots\underbrace{e^{i}}_{q}\ldots)$, where $A\in\ldots\otimes\underbrace{V}_{p}\otimes\ldots$ and $B\in\ldots\otimes\underbrace{V^{*}}_{q}\otimes\ldots$.

    The contraction can also be defined for a single tensor in the same way, e.g., in abstract index notation for a single tensor $T$, one gets $C_{p,q}T_{\ldots i_{p}\ldots}^{\ldots i_{q}\ldots}=T_{\ldots\underbrace{i}_{p}\ldots}^{\ldots\overbrace{i}^{q}\ldots}$, where implicit summation was applied.

  • Concatenation: joins the tensors $\{A^{(i)}_{i_{1}\ldots i_{l}}\}_{i=1}^{n}$ of the same shape along a given dimension $j$, i.e., $K_{j}(\{A^{(i)}\}_{i=1}^{n})=A_{i_{1}\ldots i_{j-1}k_{j}i_{j}\ldots i_{l}}$.
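The five operations listed above have direct counterparts in array libraries. The following is a minimal NumPy sketch (TensorFlow and PyTorch offer analogous calls); all arrays are small illustrative values, and the choice of `np.stack` versus `np.concatenate` for the last item is an interpretation, since the formula above introduces a new index $k_{j}$.

```python
import numpy as np

# Broadcasting: apply one linear map M: V -> W to each of n vectors.
M = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])   # shape (3, 2)
vs = np.arange(8.0).reshape(4, 2)                    # 4 vectors of V
broadcast = vs @ M.T                                 # row n equals M @ vs[n]

# Transposition/permutation of indices.
T = np.arange(24.0).reshape(2, 3, 4)
T_perm = np.transpose(T, (0, 2, 1))                  # T_perm[a, j, i] = T[a, i, j]

# Reshaping: merge neighbouring indices (i, j) into Rs(i, j) = i * R(j) + j.
T_merged = T.reshape(2, 3 * 4)                       # T_merged[a, 4*i + j] = T[a, i, j]

# Contraction: sum over index 1 of T and index 0 of B.
B = np.arange(12.0).reshape(3, 4)
C = np.tensordot(T, B, axes=([1], [0]))              # shape (2, 4, 4)

# Concatenation: stack adds a new index, concatenate extends an existing one.
A0 = np.zeros((2, 3))
stacked = np.stack([A0, A0 + 1.0], axis=1)           # shape (2, 2, 3)
joined = np.concatenate([A0, A0 + 1.0], axis=1)      # shape (2, 6)
```

Reshaping in row-major order reproduces the index function $Rs(i,j)=iR(j)+j$ exactly, which is why merging and splitting neighbouring indices costs nothing beyond reindexing.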

2.2 (Hypercomplex) Algebras

In this part, we introduce the mathematical concepts related to hypercomplex and general algebras, starting with the following definition.

Definition 2.

The algebra over a field $\mathbb{F}$ is a vector space $V$ equipped with a product, i.e., a binary operation $\_\cdot\_:V\times V\rightarrow V$, with the following properties:

  • $(x+y)\cdot z=x\cdot z+y\cdot z$,

  • $z\cdot(x+y)=z\cdot x+z\cdot y$,

  • $(\alpha x)\cdot(\beta y)=\alpha\beta(x\cdot y)$,

for $x,y,z\in V$ and $\alpha,\beta\in\mathbb{F}$.

Moreover, the algebra is commutative if $x\cdot y=y\cdot x$ for all $x,y\in V$.

Example 3.

For the real numbers $\mathbb{R}$, we have $V=\operatorname{span}\{e_{0}\}$, $\mathbb{F}=\mathbb{R}$, with $e_{0}\cdot e_{0}=1$.

Complex numbers $\mathbb{C}$ can be obtained from the commutative algebra with $V=\operatorname{span}\{e_{0},e_{1}\}$, $\mathbb{F}=\mathbb{R}$, and $e_{0}\cdot e_{0}=e_{0}$, $e_{0}\cdot e_{1}=e_{1}$, and $e_{1}\cdot e_{1}=-e_{0}$.

By convention, we will always assume that the neutral element of the algebra multiplication is $e_{0}$.

Efficient use of algebras in computations based on tensors (TensorFlow, PyTorch) relies on converting the product within the algebra into a tensor operation.

Treating an algebra as a vector space with the additional structure of vector multiplication (Footnote 2: It can be defined using a functor that casts the category of algebras into the category of linear spaces.), we have the following definition, which is essential to the rest of the paper.

Definition 3.

For an algebra $V$ over $\mathbb{F}$, the product can be defined as a tensor $A\in V^{*}\otimes V^{*}\otimes V$. Selecting the basis $V=\operatorname{span}\{e_{i}\}_{i=0}^{n-1}$ and the dual basis $V^{*}=\operatorname{span}\{e^{i}\}_{i=0}^{n-1}$ with $e^{i}(e_{j})=\delta_{j}^{i}$, the product has the form

$$A=A_{ij}^{k}e^{i}\otimes e^{j}\otimes e_{k}. \qquad (4)$$

Then the multiplication table entry is presented in (5).

$$\begin{array}{c|c}&e_{j}\\ \hline e_{i}&A_{ij}^{k}e_{k}\end{array} \qquad (5)$$

The tensor coefficients (abstract index notation), $A_{ij}^{k}$, play the same role as structure constants for a group [8]. These coefficients in a fixed basis can be represented as a multidimensional array (called a tensor in TensorFlow and PyTorch). When the algebra is commutative, then $A(x,y)=A(y,x)$ or $A_{ij}^{k}=A_{ji}^{k}$.
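As an illustration of the definition above, the structure constants $A_{ij}^{k}$ can be filled in directly from a multiplication table. The sketch below is not the paper's code: the helper names `from_table` and `multiply` are hypothetical, the table is encoded as a sign array and a basis-index array (valid when each product $e_{i}\cdot e_{j}$ is plus or minus a single basis element), and the quaternion table is added only as a familiar non-commutative example.

```python
import numpy as np

def from_table(sign, index):
    """Build A[i, j, k] from a table where e_i * e_j = sign[i, j] * e_{index[i, j]}."""
    n = index.shape[0]
    A = np.zeros((n, n, n))
    for i in range(n):
        for j in range(n):
            A[i, j, index[i, j]] = sign[i, j]
    return A

def multiply(A, x, y):
    """Algebra product in coordinates: (x * y)_k = sum_ij A[i, j, k] x_i y_j."""
    return np.einsum("ijk,i,j->k", A, x, y)

# Complex numbers: e0*e0 = e0, e0*e1 = e1*e0 = e1, e1*e1 = -e0.
A_c = from_table(np.array([[1, 1], [1, -1]]),
                 np.array([[0, 1], [1, 0]]))

# Quaternions with e1 = i, e2 = j, e3 = k (illustrative example).
A_h = from_table(
    np.array([[1, 1, 1, 1], [1, -1, 1, -1], [1, -1, -1, 1], [1, 1, -1, -1]]),
    np.array([[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]))

z = multiply(A_c, np.array([1.0, 2.0]), np.array([3.0, 4.0]))  # (1+2i)(3+4i)
e = np.eye(4)
ij = multiply(A_h, e[1], e[2])   # i * j
ji = multiply(A_h, e[2], e[1])   # j * i
```

Commutativity shows up as the symmetry $A_{ij}^{k}=A_{ji}^{k}$: it holds for `A_c` and fails for `A_h`.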

3 Results

In this section, we provide mathematical details of the implementation of hypercomplex dense and convolutional neural networks.

We do not distinguish co- and contravariant indices in the description and write them at the bottom level. Moreover, the tensor is treated as a multidimensional array, with indices starting from 0. This is a standard convention in the tensor libraries such as TensorFlow and PyTorch.

We assume that the algebra multiplication structure constants are fixed and stored in the tensor (in abstract index notation) $A=A_{i_{a}j_{a}k_{a}}$, see Subsection 2.2.

3.1 Hypercomplex Dense layer

We start with a description of the dense layer. It is a general-purpose layer that operates on data whose dimensionality is a multiple of the algebra dimension. We assume that the input data are of dimension $b\times\underbrace{al\times in}_{n}$, where $b$ is the batch size, $al$ is the algebra size, and $in$ is a positive integer multiplier. The last two numbers determine the input data size. The input tensor is $X=X_{i_{b}Rs(i_{al},i_{in})}$, where $i_{b}$ is the batch index, $i_{al}$ is the algebra dimension index, and $i_{in}$ is the multiplicity index of the algebra dimension. Moreover, we use learning parameters (weights/kernel) $K=K_{i_{al}i_{in}i_{u}}$, where $i_{u}$ is the index over units/neurons. The bias $b=b_{Rs(i_{al},i_{u})}$ is used if needed. Kernel and bias are usually initialized with numbers taken from specific distributions [9].

We now present the hypercomplex dense network in Algorithm 1. We offer both tensorial and abstract index notations (AIN). We need two flags: bias, which indicates whether the bias is included, and activation, which indicates whether the activation function $\sigma$ is used.

Algorithm 1 Hypercomplex dense NN
$X$, $A$, $\sigma$, bias, activation
$K$, $b$ - initialized
$W\leftarrow C_{1,0}(A,K)$ [AIN: $W_{i_{a}k_{a}i_{in}i_{u}}\leftarrow\sum_{j}A_{i_{a}jk_{a}}K_{ji_{in}i_{u}}$]
$W\leftarrow\tau_{p=(0\rightarrow 0,1\rightarrow 2,2\rightarrow 1,3\rightarrow 3)}(W)$ [AIN: $W_{i_{a}i_{in}k_{a}i_{u}}\leftarrow W_{p(i_{a}k_{a}i_{in}i_{u})}$]
$W\leftarrow Rs_{1}\circ Rs_{0}(W)$ [AIN: $W_{i_{1}i_{2}}\leftarrow W_{Rs(i_{a},i_{in})Rs(k_{a},i_{u})}$]
$Output\leftarrow C_{1,0}(X,W)$ [AIN: $Output_{i_{b}i_{2}}\leftarrow\sum_{k}X_{i_{b}k=Rs(i_{al},i_{in})}W_{ki_{2}}$]
if bias is True then
     $Output\leftarrow Output+b$
end if
if activation is True then
     $Output\leftarrow b(\sigma)(Output)$
end if

The Keras (with TensorFlow) and PyTorch implementations are given in [13].
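
The steps of Algorithm 1 can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the implementation of [13]: the function name hypercomplex_dense is hypothetical, the bias is stored as a flat vector with $al\cdot u$ entries, reshapes are row-major, and the structure constants follow the assumed convention $e_{i}e_{j}=\sum_{k}A_{ijk}e_{k}$, shown here for the complex numbers.

```python
import numpy as np


def hypercomplex_dense(X, A, K, b=None, sigma=None):
    """Forward pass following the steps of Algorithm 1.

    X: (batch, al * n_in), A: (al, al, al), K: (al, n_in, units),
    b: (al * units,) or None, sigma: elementwise function or None.
    """
    al, n_in, units = K.shape
    # Contract the middle index of A with the algebra index of K:
    # W[i_a, k_a, i_in, i_u] = sum_j A[i_a, j, k_a] * K[j, i_in, i_u]
    W = np.einsum("ijk,jnu->iknu", A, K)
    # Swap axes 1 and 2 to obtain W[i_a, i_in, k_a, i_u].
    W = np.transpose(W, (0, 2, 1, 3))
    # Merge (i_a, i_in) and (k_a, i_u) into an ordinary weight matrix.
    W = W.reshape(al * n_in, al * units)
    out = X @ W
    if b is not None:
        out = out + b
    if sigma is not None:
        out = sigma(out)
    return out


# Complex numbers as the algebra (assumed convention, see the lead-in).
A = np.zeros((2, 2, 2))
A[0, 0, 0] = A[0, 1, 1] = A[1, 0, 1] = 1.0
A[1, 1, 0] = -1.0

# One complex input (1 + 2i) and one complex weight (3 - i), no bias.
X = np.array([[1.0, 2.0]])
K = np.array([[[3.0]], [[-1.0]]])
single = hypercomplex_dense(X, A, K)

# A larger layer: batch 5, multiplicity 3, 4 units, zero bias, tanh activation.
rng = np.random.default_rng(0)
X_big = rng.normal(size=(5, 2 * 3))
K_big = rng.normal(size=(2, 3, 4))
b_big = np.zeros(2 * 4)
out_big = hypercomplex_dense(X_big, A, K_big, b_big, np.tanh)
```

With a single input and a single unit the layer reduces to one algebra product, which gives a quick sanity check of the index bookkeeping. In a trainable layer only $K$ and $b$ would be variables, while $A$ stays constant.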

3.2 Hypercomplex Convolutional layer

In this part, the hypercomplex convolutional neural network is described. We present general $k$-dimensional ($k=1,2,3$) layers. They differ in the shape of the input data and the kernel size.

The additional ’image channels’ of the data can be packed as an element of algebra. For instance, the two-dimensional image can be decomposed as a matrix of four-dimensional algebra elements, i.e., color channels plus alpha, making a single pixel an element of four-dimensional algebra.
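
A short NumPy sketch of this packing for a batch of RGBA images is given below. It assumes the channels-last layout used for the layers in this section and a row-major reshape in which the algebra index varies slowest, matching the ordering in $Rs(i_{al},i_{in})$; the array sizes are arbitrary illustrative values.

```python
import numpy as np

al, n_in = 4, 1              # four-dimensional algebra, multiplicity one
batch, height, width = 2, 3, 3

# A batch of RGBA images in channels-last layout: (b, n_1, n_2, al * in).
images = np.arange(batch * height * width * al * n_in, dtype=float)
images = images.reshape(batch, height, width, al * n_in)

# Split the flattened channel axis back into (algebra index, multiplicity).
# Assumption: the algebra index varies slowest, as in Rs(i_al, i_in).
elements = images.reshape(batch, height, width, al, n_in)

# The pixel at row 0, column 1 of the first image as one algebra element.
pixel = elements[0, 0, 1, :, 0]
```

For $in>1$ the same reshape yields $in$ algebra elements per pixel, one for each value of the multiplicity index.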

The general idea of the NN action, as in [18], is to use the tensor $A$ of algebra structure constants to separate the coefficients of the algebra basis components and then apply the traditional convolution to each algebra component.

The input data $X=X_{i_{b}i_{1}\ldots i_{k}Rs(i_{al},i_{in})}$ are of dimension $b\times n_{1}\times\ldots\times n_{k}\times\underbrace{al\times in}_{n}$, where $b$ is the batch size, $n_{i}$, $i\in\{1,\ldots,k\}$, is the size of the data sample in each dimension, $al$ is the algebra dimension, and $in$ is a positive multiplier. (The sizes of the data sample in each dimension (axes) stand right after the batch size, which is the TensorFlow convention. For PyTorch, this multiindex is placed at the end. We will not provide an implementation for the PyTorch convention since it differs by a few permutations.)
The kernel size is $al\times L_{1}\times\ldots\times L_{k}\times in\times F$, where $L_{i}$, $i\in\{1,\ldots,k\}$, is the kernel size in each dimension and $F$ is the number of filters. We therefore have the kernel $K=K_{i_{al}i_{l1}\ldots i_{lk}i_{in}i_{f}}$. The flag $bias$ indicates whether the bias $b$ is used. If it is used, it has dimension $al\times F$. Kernel and bias can be initialized from arbitrary distributions [9]. To convolve the algebra components, we use the standard $k$-dimensional convolution $convkD(X,K,strides,padding)$, for which optimized implementations exist [12]. The algorithm is presented in Algorithm 2.

Algorithm 2 Hypercomplex $k$-dimensional convolutional NN
$X$, $A$, $\sigma$, bias, activation
$K$, $b$ - initialized
$W\leftarrow C_{1,0}(A,K)$ [AIN: $W_{i_{a}k_{a}i_{l1}\ldots i_{lk}i_{in}i_{f}}\leftarrow\sum_{j}A_{i_{a}jk_{a}}K_{ji_{l1}\ldots i_{lk}i_{in}i_{f}}$]
$W\leftarrow\tau_{p=(0\rightarrow k+2)}(W)$ [AIN: $W_{k_{a}i_{l1}\ldots i_{lk}i_{a}i_{in}i_{f}}\leftarrow W_{p(i_{a}k_{a}i_{l1}\ldots i_{lk}i_{in}i_{f})}$]
$W\leftarrow Rs_{k+2}(W)$ [AIN: $W_{k_{a}i_{l1}\ldots i_{lk}ji_{f}}\leftarrow W_{k_{a}i_{l1}\ldots i_{lk}\underbrace{Rs(i_{a},i_{in})}_{=j}i_{f}}$]
temp = []
for $i\in\{0,\ldots,al-1\}$ do
     temp = temp + [$convkD(X,W_{i\ldots})$]
end for
$Output_{i_{b}i_{l1}\ldots i_{lk}h}\leftarrow K_{k+1}(temp)$
if bias is True then
     $Output\leftarrow Output+b$
end if
if activation is True then
     $Output\leftarrow b(\sigma)(Output)$
end if
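
The steps of Algorithm 2 can be sketched in NumPy for $k=1$. This is an illustration under stated assumptions rather than the implementation of [13]: the names conv1d_valid and hypercomplex_conv1d are hypothetical, the convolution is a plain cross-correlation with stride 1 and no padding, the bias is stored as a flat vector with $al\cdot F$ entries, and the structure constants follow the assumed convention $e_{i}e_{j}=\sum_{k}A_{ijk}e_{k}$, shown here for the complex numbers.

```python
import numpy as np


def conv1d_valid(X, W):
    """Plain 1D cross-correlation, stride 1, no padding, channels last.

    X: (batch, n, channels), W: (L, channels, filters)
    returns: (batch, n - L + 1, filters)
    """
    L = W.shape[0]
    n_out = X.shape[1] - L + 1
    # Stack the L shifted views of X: windows[b, t, l, c] = X[b, t + l, c].
    windows = np.stack([X[:, l:l + n_out, :] for l in range(L)], axis=2)
    return np.einsum("btlc,lcf->btf", windows, W)


def hypercomplex_conv1d(X, A, K, b=None, sigma=None):
    """Forward pass following the steps of Algorithm 2 for k = 1.

    X: (batch, n, al * n_in), A: (al, al, al), K: (al, L, n_in, F),
    b: (al * F,) or None, sigma: elementwise function or None.
    """
    al, L, n_in, F = K.shape
    # W[i_a, k_a, l, i_in, f] = sum_j A[i_a, j, k_a] * K[j, l, i_in, f]
    W = np.einsum("ijk,jlnf->iklnf", A, K)
    # Move the input algebra index next to the multiplicity index:
    # W[k_a, l, i_a, i_in, f]
    W = np.transpose(W, (1, 2, 0, 3, 4))
    # Merge (i_a, i_in) into a single input-channel axis.
    W = W.reshape(al, L, al * n_in, F)
    # One ordinary convolution per output algebra component.
    temp = [conv1d_valid(X, W[i]) for i in range(al)]
    # Concatenate the components along the channel axis.
    out = np.concatenate(temp, axis=-1)
    if b is not None:
        out = out + b
    if sigma is not None:
        out = sigma(out)
    return out


# Complex numbers as the algebra (assumed convention, see the lead-in).
A = np.zeros((2, 2, 2))
A[0, 0, 0] = A[0, 1, 1] = A[1, 0, 1] = 1.0
A[1, 1, 0] = -1.0

# Batch 2, sample length 6, multiplicity 3, kernel length 3, 4 filters.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 6, 2 * 3))
K = rng.normal(size=(2, 3, 3, 4))
out = hypercomplex_conv1d(X, A, K)
```

The loop runs over the $al$ output algebra components, so the cost is $al$ standard convolutions, each with $al\cdot in$ input channels and $F$ filters. In a framework implementation, conv1d_valid would be replaced by the library convolution with the desired strides and padding.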

4 Conclusion

In this paper, we introduced the mathematical details of the implementation of dense and convolutional NN based on hypercomplex and general algebras. The critical point in this presentation is to associate the algebra multiplication with a rank-three tensor. Thanks to this observation, all the NN processing steps can be represented as tensorial operations.

Expressing the algebra operations in fully tensorial form simplifies neural network operations and makes it possible to use the fast tensorial operations of modern packages such as TensorFlow and PyTorch. The implementation of the above generalized hypercomplex NN is described elsewhere [13].

Acknowledgements

This paper has been supported by the Polish National Agency for Academic Exchange Strategic Partnership Programme under Grant No. BPI/PST/2021/1/00031 (nawa.gov.pl).

Declarations

  • Funding - This paper has been supported by the Polish National Agency for Academic Exchange Strategic Partnership Programme under Grant No. BPI/PST/2021/1/00031 (nawa.gov.pl).

  • Conflict of interest/Competing interests - Not applicable

  • Ethics approval and consent to participate - Not applicable

  • Consent for publication - All authors agree on publication

  • Data availability - Not applicable

  • Materials availability - Not applicable

  • Code availability - Not applicable

  • Author contribution - A.N., R.K.: Conceptualization, Methodology, Investigation, Writing - Original Draft, Review, Editing. A.N.: Funding acquisition, Supervision.

References

  • [1] M. Abadi, P. Barham, J. Chen, Z. Chen et al. TensorFlow: a system for large-scale machine learning, In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI’16). USENIX Association, USA, 265–283, (2016).
  • [2] P. Aluffi, Algebra: Chapter 0, American Mathematical Society, 2009
  • [3] P. Arena, L. Fortuna, G. Muscato, M.G. Xibilia, Neural Networks in Multidimensional Domains, Lecture Notes in Control and Information Sciences, Springer London, 1998; doi: 10.1007/BFb0047683
  • [4] S. Buchholz, G. Sommer, On Clifford neurons and Clifford multi-layer perceptrons, Neural Networks, 21, 7, 925-935 (2008); doi: 10.1016/j.neunet.2008.03.004.
  • [5] W. Cao, C. Zheng, Z. Yan, et al. Geometric machine learning: research and applications. Multimed Tools Appl 81, 30545–30597 (2022). https://doi.org/10.1007/s11042-022-12683-9
  • [6] W. Cao, Z. Yan, Z. He and Z. He, A Comprehensive Survey on Geometric Deep Learning, IEEE Access, vol. 8, pp. 35929-35949, 2020, doi: 10.1109/ACCESS.2020.2975067.
  • [7] G. Carlsson, Topology and Data, Bull. Amer. Math. Soc. 46 (2), 255–308. (2009) doi:10.1090/s0273-0979-09-01249-x
  • [8] W. Fulton, J. Harris, Representation theory. A first course, Springer-Verlag, 1991
  • [9] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249-256 (2010)
  • [10] E. Grassucci, A. Zhang, D. Comminiello, PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions, IEEE Transactions on Neural Networks and Learning Systems, 1-13 (2022); doi: 10.1109/TNNLS.2022.3226772
  • [11] R. Kycia, A. Niemczynowicz, Hypercomplex neural network in time series forecasting of stock data, Submitted, arXiv:2401.04632 [cs.NE]
  • [12] A. Lavin, S. Gray, Fast Algorithms for Convolutional Neural Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 4013-4021, doi: 10.1109/CVPR.2016.435.
  • [13] A. Niemczynowicz, R. Kycia, KHNNs: hypercomplex neural networks computations via Keras using TensorFlow and PyTorch, in preparation
  • [14] A. Parada-Mayorga, A. Ribeiro, Algebraic Neural Networks: Stability to Deformations, IEEE Transactions on Signal Processing, vol. 69, pp. 3351-3366, (2021); doi: 10.1109/TSP.2021.3084537.
  • [15] A. Paszke, S. Gross, F. Massa, A. Lerer et al., PyTorch: an imperative style, high-performance deep learning library, Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, Article 721, 8026–8037 (2019)
  • [16] R. Penrose, W. Rindler, Spinors and Space-Time, Volume 1: Two-Spinor Calculus and Relativistic Fields, Cambridge University Press, 1984
  • [17] D. Ruhe, J.K. Gupta, S. De Keninck, M. Welling, J. Brandstetter, Geometric Clifford algebra networks. In Proceedings of the 40th International Conference on Machine Learning (ICML’23), Vol. 202. JMLR.org, Article 1219, 29306–29337, (2023)
  • [18] G. Vieira, M.E. Valle, W. Lopes, Clifford Convolutional Neural Networks for Lymphoblast Image Classification, Silva, D.W., Hitzer, E., Hildenbrand, D. (eds) Advanced Computational Applications of Geometric Algebra. ICACGA 2022. Lecture Notes in Computer Science, vol 13771. Springer, Cham. (2024); doi: 10.1007/978-3-031-34031-4_7
  • [19] J. Wood, J. Shawe-Taylor, Representation theory and invariant neural networks, Discrete Applied Mathematics, 69, 1-2, 33–60 (1996)