Multiple Kronecker RLS fusion-based link propagation for drug-side effect prediction

Yuqing Qian
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, P.R.China
Ziyu Zheng
Department of Mathematical Sciences, University of Nottingham Ningbo, P.R.China
Prayag Tiwari*
School of Information Technology, Halmstad University, Sweden
Yijie Ding*
Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, P.R.China
Quan Zou
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, P.R.China

*Corresponding author.
Abstract

Drug-side effect prediction has become an essential area of research in the field of pharmacology. As the use of medications continues to rise, so does the importance of understanding and mitigating the potential risks associated with them. At present, researchers have turned to data-driven methods to predict drug-side effects. Drug-side effect prediction is a link prediction problem, and the related data can be described from various perspectives. To process these kinds of data, a multi-view method, called Multiple Kronecker RLS fusion-based link propagation (MKronRLSF-LP), is proposed. MKronRLSF-LP extends the Kron-RLS by finding the consensus partitions and multiple graph Laplacian constraints in the multi-view setting. Both of these multi-view settings contribute to a higher quality result. Extensive experiments have been conducted on drug-side effect datasets, and our empirical results provide evidence that our approach is effective and robust.

1 Introduction

Pharmacovigilance is critical to drug safety and surveillance. The field of pharmacovigilance plays a crucial role in public health by continuously monitoring and evaluating the safety profile of drugs. Pharmacovigilance involves collecting and analyzing data from various sources, including health care professionals (Yang et al., 2016), patients, regulatory authorities, and pharmaceutical companies. These data are then used to identify possible side effects and assess their severity and frequency (Da Silva & Krishnamurthy, 2016; Galeano et al., 2020). Traditionally, drug-side effects were primarily identified through spontaneous reporting systems, where health care professionals and patients reported adverse events to regulatory authorities. However, this approach has limitations, such as underreporting and delayed detection.

To overcome these limitations, researchers have turned to data-driven methods to find drug-side effects. With the advent of electronic health records, large-scale databases containing valuable information on medication usage and patient outcomes have become available. These databases have allowed researchers to analyze vast amounts of data to identify patterns between drugs and side effects.

One of the most commonly used approaches to drug-side effect prediction is the model-based approach. Model-based methods involve the use of advanced statistical and machine learning techniques to extract knowledge from large datasets. By analyzing patterns in the data, researchers can identify potential drug-side effects and their associated risk factors. Pauwels et al. (2011) predicted the side effects of drugs (Pau’s method) from drug chemical substructures by applying K-nearest neighbor (KNN), support vector machine (SVM), ordinary canonical correlation analysis (OCCA) and sparse canonical correlation analysis (SCCA); their experimental results suggest that SCCA performs best. Sayaka et al. (2012) utilized SCCA to associate targeted proteins with side effects (Miz’s method). Liu et al. (2012) predicted drug side effects (Liu’s method) using SVM and multivariate information, such as the phenotypic characteristics, chemical structures, and biological properties of the drug. Cheng et al. (2013) proposed a phenotypic network inference classifier to associate drugs with side effects (Cheng’s method). NDDSA (Shabani-Mashcool et al., 2020) models the drug-side effect prediction problem using a bipartite graph and applies a resource allocation method to find new links. MKL-LGC (Ding et al., 2018) integrates multiple kernels to describe the diversified information of drugs and side effects. These kernels are then combined using an optimized linear weighting algorithm. The Local and Global Consistency algorithm (LGC) is used to estimate new potential associations based on the integrated kernel information.

Deep learning techniques (Xu et al., 2022) have been increasingly used to predict drug side effects in recent years. These methods leverage the power of neural networks to analyze complex relationships between drugs, genes, and proteins. In SDPred (Zhao et al., 2022), chemical-chemical associations, chemical substructure, drug target information, word representations of drug molecular substructures, semantic similarity of side effects, and drug side effect associations are integrated. To learn drug-side effect pair representation vectors from different interaction maps, SDPred uses a CNN module. Drug interaction profile similarity (DIPA) provided the largest contribution. GCRS (Xuan et al., 2022) builds a complex deep-learning structure to fuse and learn the specific topologies, common topologies and pairwise attributes from multiple drug-side effect heterogeneous graphs. Drug-side effect heterogeneous graphs are constructed using drug-side effect associations, drug-disease associations and drug chemical substructures. Based on a graph attention network, Zhao et al. (2021) developed a prediction model for drug-side effect frequencies that integrated information on similarity, known drug-side effect frequencies, and word embeddings. The deep learning-based methods above are forms of pairwise learning. To keep the samples balanced, methods in this group select positive samples from trusted databases and negative samples by random sampling. Such a treatment results in a certain loss of information and introduces noise into the labels.

Drug-side effect prediction is a classic link prediction problem (Yuan et al., 2019). To solve this kind of problem, many multi-view methods have been proposed in recent years (Ding et al., 2021; 2016; Cichonska et al., 2018). Based on the information fusion at different stages of the training process, multi-view methods can roughly be divided into three categories: early fusion, late fusion and fusion during the training phase. Fig. 1 illustrates our taxonomy of multi-view learning method literature.

In early fusion techniques, the views are combined before the training process is performed. Multiple kernel learning (MKL) (Wang et al., 2023b; Cichonska et al., 2018; Nascimento et al., 2016) is a typical early fusion technique. For each view, it computes one or more kernels, and then learns the optimal kernel from the base kernels. For example, MKL-KroneckerRLS (Ding et al., 2019) combines diversified information using Centered Kernel Alignment-based Multiple Kernel Learning (CKA-MKL). Based on the optimal kernel, Kronecker regularized least squares (Kron-RLS) is used to classify drug-side effect pairs. It must be noted that the performance of these methods relies heavily on the optimal view, which may be redundant or miss some key information. In late fusion techniques, a separate model is trained for each view and a weighted combination is later taken as the final model. For instance, in Zhang et al. (2016), an ensemble model was constructed by integrating multiple methods, each providing a unique view. The model incorporates the methods of Liu et al. (2012) and Cheng et al. (2013), an Integrated Neighbour-based Method (INBM), and a Restricted Boltzmann Machine-based Method (RBMBM). Each model is trained independently, and the final partition is the weighted average of the base partitions. Late fusion allows for individual modeling of inherently different views, providing flexibility when dealing with diverse data. However, its drawback is the delayed coupling of information, which limits the extent to which each model can benefit from the information provided by other views.

Figure 1: Taxonomy of multi-view learning framework literature. Note: "Partition" commonly refers to the learned result. This concept is more commonly found in classification and clustering tasks (Liu et al., 2023; Bruno & Marchand-Maillet, 2009; Wang et al., 2019). (a) Early fusion: the views are combined before the training process is performed; (b) Late fusion: a different model for each view is separately trained and then a combination is taken as the final partition; (c) Fusion during the training phase: it has some degree of freedom to model the views differently but to also ensure that information from other views is exploited during the training phase.

A third category is fusion during the training phase, which combines the benefits of both fusion types. It fuses multiple views at the partition level and enables the model to explore all views while still being allowed to model each view differently. This framework has been applied to classification models (Houthuys & Suykens, 2021; Houthuys et al., 2018; Qian et al., 2022b; Xie & Sun, 2020) and clustering models (Lv et al., 2021; Houthuys et al., 2018; Wang et al., 2023a). By exploring consensus or complementary information from multiple views, multi-view methods can achieve better performance than single-view methods. The consensus principle seeks agreement among views. For instance, Wang et al. (2019) maximized the alignment between the consensus partition (clustering matrix) and the weighted combination of base partitions.

In this work, we apply this technique to the Kron-RLS algorithm because of its fast and scalable nature. The proposed method is named Multiple Kronecker RLS fusion-based link propagation (MKronRLSF-LP). Our main contributions are as follows:

  • (1) We extend Kron-RLS to the multiple information fusion setting by finding the consensus partition and imposing multiple graph Laplacian constraints. Specifically, we generate multiple partitions with standard Kron-RLS and adaptively learn a weight for each partition to control its contribution to the shared partition. The aim is to fuse the partitions while still allowing some flexibility in modeling the information of each single view. Furthermore, multiple graph Laplacian regularization is adopted to boost the performance of semi-supervised learning. Both settings co-evolve toward better performance.

  • (2) To fuse the features of multiple information sources more reasonably, we design an iterative optimization algorithm that fuses multiple Kron-RLS submodels and yields the final predictive model of drug-side effects. Throughout the optimization, we avoid explicit computation of any pairwise matrices, which makes our method suitable for problems in large pairwise spaces.

  • (3) The proposed method can address the general link prediction problem; it is empirically tested on four real drug-side effect datasets, which are comparatively sparse. The results show that MKronRLSF-LP achieves excellent classification results and outperforms other competitive methods.

The rest of this paper is organized as follows. Section 2 provides a description of the drug-side effect prediction problem. Section 3 reviews related work about MKronRLSF-LP. Section 4 comprehensively presents the proposed MKronRLSF-LP. After reporting the experimental results in Section 5, we conclude this paper and mention future work in Section 6.

2 Problem description

Identification of drug-side effects is an example of the link prediction problem, which has the aim of predicting how likely it is that there is a link between two arbitrary nodes in a network. This problem can also be seen as a recommendation system (Jiang et al., 2019; Fan et al., 2021) task.

Let the drug nodes and side effect nodes of a network be $\mathbb{D}=\left\{d_{1},d_{2},\ldots,d_{N}\right\}$ and $\mathbb{S}=\left\{s_{1},s_{2},\ldots,s_{M}\right\}$, respectively. We denote the number of drug and side effect nodes by $N$ and $M$, respectively.

We define an adjacency matrix $\bm{F}\in\mathbb{R}^{N\times M}$ to represent the associations between drugs and side effects. Each element of $\bm{F}$ is defined as $\bm{F}_{i,j}=1$ if the node pair $(d_{i},s_{j})$ is linked and $\bm{F}_{i,j}=0$ otherwise.

Link prediction has the aim of predicting whether a link exists for a node pair $\left(d_{i},s_{j}\right)\in\mathbb{D}\times\mathbb{S}$ whose state is unknown. Thus, it is a classification problem. Most methods use regression algorithms to predict a score (ranging from 0 to 1), which we call the link confidence. Then, a class of 0 or 1 is assigned to the predicted score by thresholding. Higher link confidence indicates a greater probability of the link existing, while lower values indicate the opposite. We define a new matrix $\hat{\bm{F}}$, which is estimated by the prediction model. Each element $\hat{\bm{F}}_{i,j}$ represents the predicted link confidence for the node pair $(d_{i},s_{j})$. Figure 5 summarizes the link prediction problem discussed in this paper.
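The notation above can be illustrated with a small sketch. The toy matrices, the helper name `assign_links`, and the threshold of 0.5 are all assumptions made for illustration; the text does not specify a threshold value.

```python
import numpy as np

# Hypothetical toy data: N = 3 drugs, M = 4 side effects.
# F[i, j] = 1 if drug i is known to be linked to side effect j, else 0.
F = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)

# Hypothetical link confidences, standing in for the output of a model.
F_hat = np.array([[0.9, 0.2, 0.1, 0.8],
                  [0.3, 0.7, 0.6, 0.1],
                  [0.2, 0.1, 0.4, 0.9]])


def assign_links(scores, threshold=0.5):
    """Assign class 1 to pairs whose confidence reaches the threshold, else 0."""
    return (scores >= threshold).astype(int)


predicted = assign_links(F_hat)

# Pairs predicted as linked that are absent from the known adjacency matrix;
# these are the candidate new drug-side effect associations.
candidates = np.argwhere((predicted == 1) & (F == 0))
```

Each row of `candidates` is an index pair (drug, side effect); in practice the threshold would be tuned on validation data rather than fixed in advance.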

3 Related work

3.1 Regularized Least Squares

The objective function of Regularized Least Squares (RLS) regression is:

$$\mathop{\arg\min}\limits_{f}\ \frac{1}{2}\left\|\bm{F}-f\left(\bm{K}\right)\right\|_{F}^{2}+\frac{\lambda}{2}\left\|f\right\|_{K}^{2}, \tag{1}$$

where $\lambda$ is a regularization parameter and $\left\|f\right\|_{K}$ denotes the RKHS norm (Kailath, 1971) of $f\left(\cdot\right)$. The prediction function $f\left(\cdot\right)$ is defined as:

$$f\left(\bm{K}\right)=\bm{K}\bm{a}, \tag{2}$$

where $\bm{a}$ is the solution of the model and $\bm{K}$ is a kernel matrix with elements

$$\bm{K}_{i,j}=k\left(d_{i},d_{j}\right)\quad\left(i,j=1,\ldots,N\right), \tag{3}$$

and $k$ represents the kernel function.

By formulating the stationary points of Equation 1 and eliminating the unknown parameters $\bm{a}$, the following solution is obtained:

$$\hat{\bm{F}}=\bm{K}\left(\bm{K}+\lambda\bm{I}_{N}\right)^{-1}\bm{F}. \tag{4}$$

There is only one kind of feature space considered in this model. In the drug-side effect identification problem, there are two feature spaces: the drug space and the side effect space.
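As a minimal sketch of the closed-form RLS solution in Equation 4, the snippet below uses NumPy. The function name `rls_predict`, the linear kernel, and the random toy data are assumptions for illustration only.

```python
import numpy as np


def rls_predict(K, F, lam=1.0):
    """Closed-form RLS prediction F_hat = K (K + lam * I)^{-1} F."""
    n = K.shape[0]
    # Solve the linear system instead of forming the inverse explicitly.
    A = np.linalg.solve(K + lam * np.eye(n), F)
    return K @ A


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                        # hypothetical drug features
K = X @ X.T                                        # linear kernel (PSD)
F = rng.integers(0, 2, size=(5, 4)).astype(float)  # toy adjacency matrix
F_hat = rls_predict(K, F, lam=0.5)
```

Because only the drug kernel enters the computation, each column of `F` (one side effect) is fitted independently, which is the single-feature-space limitation noted above.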

3.2 Kronecker Regularized Least Squares

Combining the kernels of the two spaces into a single large kernel that directly relates drug-side effect pairs is a better option. The Kronecker product kernel (Hue & Vert, 2010) is used for this purpose. Given the drug kernel $\bm{K}_{D}$ and the side effect kernel $\bm{K}_{S}$, the Kronecker product kernel is

$$\bm{K}=\bm{K}_{S}\otimes\bm{K}_{D}, \tag{5}$$

where $\otimes$ denotes the Kronecker product (Laub, 2004). By applying the Kronecker product kernel to RLS, the objective function of Kronecker Regularized Least Squares (Kron-RLS) is obtained:

$$\mathop{\arg\min}\limits_{f}\ \frac{1}{2}\left\|\text{vec}\left(\bm{F}\right)-f\left(\bm{K}\right)\right\|_{F}^{2}+\frac{\lambda}{2}\left\|f\right\|_{K}^{2}, \tag{6}$$

where $\text{vec}\left(\cdot\right)$ is the vectorization operator. By setting the derivative of Equation 6 with respect to $\bm{a}$ to zero, we obtain:

$$\bm{a}=\left(\bm{K}+\lambda\bm{I}_{NM}\right)^{-1}\text{vec}\left(\bm{F}\right). \tag{7}$$

Computing this solution directly requires inverting $\left(\bm{K}+\lambda\bm{I}_{NM}\right)$, which has size $NM\times NM$ and a time complexity of $O\left(N^{3}M^{3}\right)$. Thus, a well-known theorem (Raymond & Kashima, 2010; Laub, 2004) is used to compute the inverse efficiently.

It is well known that kernel matrices (Liu et al., 2023; Pekalska & Haasdonk, 2008) are positive semi-definite, so they can be eigendecomposed as $\bm{K}_{D}=\bm{V}_{D}\bm{\Lambda}_{D}\bm{V}_{D}^{T}$ and $\bm{K}_{S}=\bm{V}_{S}\bm{\Lambda}_{S}\bm{V}_{S}^{T}$. According to the theorem (Raymond & Kashima, 2010; Laub, 2004), the eigenvectors of the Kronecker product kernel $\bm{K}$ are given by $\bm{V}=\bm{V}_{S}\otimes\bm{V}_{D}$. Define the matrix $\bm{\Lambda}$ by $\bm{\Lambda}_{i,j}=\left[\bm{\Lambda}_{S}\right]_{i,i}\times\left[\bm{\Lambda}_{D}\right]_{j,j}$. The eigenvalues of $\bm{K}$ are $\text{diag}\left(\text{vec}\left(\bm{\Lambda}\right)\right)$. The matrix $\bm{K}+\lambda\bm{I}_{NM}$ has the same eigenvectors $\bm{V}$ and eigenvalues $\text{diag}\left(\text{vec}\left(\bm{\Lambda}+\lambda\bm{1}\right)\right)$. Then, using Equation 7, the prediction $\bm{K}\bm{a}$ can be written as:

$$\bm{K}\left(\bm{K}+\lambda\bm{I}_{NM}\right)^{-1}\text{vec}\left(\bm{F}\right)=\bm{V}\,\text{diag}\left(\text{vec}\left(\bm{\Lambda}\right)\right)\bm{V}^{T}\bm{V}\,\text{diag}\left(\text{vec}\left(\bm{\Lambda}+\lambda\bm{1}\right)\right)^{-1}\bm{V}^{T}\text{vec}\left(\bm{F}\right). \tag{8}$$

Since $\bm{V}^{T}\bm{V}=\bm{I}_{NM}$ and $\text{diag}\left(\text{vec}\left(\bm{\Lambda}\right)\right)\text{diag}\left(\text{vec}\left(\bm{\Lambda}+\lambda\bm{1}\right)\right)^{-1}$ is also a diagonal matrix, we further simplify Equation 8 and get

$$\bm{K}\left(\bm{K}+\lambda\bm{I}_{NM}\right)^{-1}\text{vec}\left(\bm{F}\right)=\bm{V}\,\text{diag}\left(\text{vec}\left(\bm{J}\right)\right)\bm{V}^{T}\text{vec}\left(\bm{F}\right), \tag{9}$$

where the matrix $\bm{J}$ is defined by

$$\bm{J}_{i,j}=\frac{\bm{\Lambda}_{i,j}}{\bm{\Lambda}_{i,j}+\lambda}. \tag{10}$$

Using the vec-trick identity $\left(\bm{A}\otimes\bm{B}\right)\text{vec}\left(\bm{C}\right)=\text{vec}\left(\bm{B}\bm{C}\bm{A}^{T}\right)$, we further simplify Equation 9. Then, we get

$$\hat{\bm{F}}=\bm{V}_{D}\left(\bm{J}\odot\left(\bm{V}_{D}^{T}\bm{F}\bm{V}_{S}\right)\right)^{T}\bm{V}_{S}^{T}, \tag{11}$$

where $\odot$ represents the Hadamard product. The computational time of this optimization method is $O\left(N^{3}+M^{3}\right)$, which is much less than $O\left(N^{3}M^{3}\right)$.
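The eigendecomposition shortcut can be checked numerically on a small problem. The following is a sketch under stated assumptions, not the authors' implementation: the function names and toy data are hypothetical, vec is taken to be column-stacking, and the eigenvalue array is stored as an $N\times M$ array with entry $[i,j]$ equal to the product of the $i$-th drug eigenvalue and the $j$-th side effect eigenvalue. This indexing is chosen so that the element-wise product has matching shapes and no extra transposes are needed.

```python
import numpy as np


def kron_rls_predict(K_D, K_S, F, lam=1.0):
    """Kron-RLS prediction via eigendecompositions of the two small kernels."""
    lam_D, V_D = np.linalg.eigh(K_D)   # K_D = V_D diag(lam_D) V_D^T
    lam_S, V_S = np.linalg.eigh(K_S)   # K_S = V_S diag(lam_S) V_S^T
    # Eigenvalues of the Kronecker kernel as an N x M array
    # (assumed layout: Lam[i, j] = lam_D[i] * lam_S[j]).
    Lam = np.outer(lam_D, lam_S)
    J = Lam / (Lam + lam)              # spectral filter, cf. Equation 10
    # Project F onto the eigenbases, filter element-wise, project back.
    return V_D @ (J * (V_D.T @ F @ V_S)) @ V_S.T


def kron_rls_naive(K_D, K_S, F, lam=1.0):
    """Reference: form the NM x NM kernel explicitly (small inputs only)."""
    N, M = F.shape
    K = np.kron(K_S, K_D)
    f = F.reshape(-1, order="F")       # column-stacking vec(F)
    f_hat = K @ np.linalg.solve(K + lam * np.eye(N * M), f)
    return f_hat.reshape(N, M, order="F")


rng = np.random.default_rng(0)
A = rng.normal(size=(4, 6))
B = rng.normal(size=(3, 5))
K_D, K_S = A @ A.T, B @ B.T            # toy PSD kernels, N = 4 and M = 3
F = rng.integers(0, 2, size=(4, 3)).astype(float)

F_fast = kron_rls_predict(K_D, K_S, F, lam=0.5)
F_naive = kron_rls_naive(K_D, K_S, F, lam=0.5)
```

On this toy input the two results should agree up to floating-point error, while the fast version only ever decomposes the $N\times N$ and $M\times M$ kernels.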

3.3 Kronecker Regularized Least Squares with Multiple Kernel Learning

Kron-RLS is a kind of kernel method. It can be difficult for nonexpert users to choose an appropriate kernel. To address this limitation, Multiple Kernel Learning (MKL) (Gönen & Alpaydın, 2011) was proposed. Since kernels in MKL can naturally correspond to different views, MKL has been applied with great success to multi-view data (Wang et al., 2021; Xu et al., 2021; Guo et al., 2021; Qian et al., 2022a; Wang et al., 2023b) by combining kernels appropriately.

Suppose we are given predefined base kernels $\left\{\bm{K}_{D}^{i}\right\}_{i=1}^{P}$ and $\left\{\bm{K}_{S}^{j}\right\}_{j=1}^{Q}$ from the drug feature space and the side effect feature space, respectively. These kernels can be built from different types or views. The optimal kernel is formed as a linear combination of the base kernels:

$$\bm{K}_{D}^{opt}=\sum\limits_{i=1}^{P}w^{i}\bm{K}_{D}^{i}. \tag{12}$$

Usually, an additional constraint is imposed on the combination coefficients $w$ to control their structure:

$$\sum\limits_{i=1}^{P}w^{i}=1,\quad w^{i}\geq 0,\quad i=1,\ldots,P. \tag{13}$$

The optimal side effect kernel $\bm{K}_{S}^{opt}$ is defined analogously and is omitted here.
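The linear combination and its constraint can be sketched as follows. The helper `combine_kernels` and the uniform weights are assumptions for illustration; an MKL method would learn the weights from data rather than fix them.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Linear combination of base kernels with weights on the simplex."""
    w = np.asarray(weights, dtype=float)
    # Enforce the constraint: non-negative weights that sum to one.
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w_i * K_i for w_i, K_i in zip(w, kernels))


rng = np.random.default_rng(0)
X1 = rng.normal(size=(4, 3))
X2 = rng.normal(size=(4, 5))
K1, K2 = X1 @ X1.T, X2 @ X2.T          # two toy PSD base kernels (two views)

# Uniform weights as a placeholder for learned weights.
K_opt = combine_kernels([K1, K2], [0.5, 0.5])
```

Since a non-negative combination of positive semi-definite matrices is itself positive semi-definite, `K_opt` remains a valid kernel for any weights satisfying the constraint.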

Based on the MKL method, Ding et al. (2019) and Nascimento et al. (2016) developed Kron-RLS-based MKL methods, called Kron-RLS with CKA-MKL and Kron-RLS with selfMKL, respectively. Kron-RLS with CKA-MKL combines diversified information using Centered Kernel Alignment-based Multiple Kernel Learning (CKA-MKL). In Kron-RLS with selfMKL, the weights indicating the importance of individual kernels are calculated automatically to select the more relevant kernels. The final decision function of both methods is given by:

$$\text{vec}\left(\hat{\bm{F}}\right)=\left(\bm{K}_{S}^{opt}\otimes\bm{K}_{D}^{opt}\right)\left(\bm{K}_{S}^{opt}\otimes\bm{K}_{D}^{opt}+\lambda\bm{I}_{NM}\right)^{-1}\text{vec}\left(\bm{F}\right). \tag{14}$$

4 Proposed method

Figure 2: Framework diagram of MKronRLSF-LP. MKronRLSF-LP allows the multiple partitions a degree of freedom to model the information of each single view and introduces multiple graph Laplacian regularization into the consensus partition.

Existing multi-view fusion methods based on Kron-RLS all follow the MKL framework. These methods learn the optimal pairwise kernel as a linear combination of a set of base kernels. All views are fused prior to training, and information is not shared during the training phase. This is a typical early fusion technique. Our proposal addresses this limitation by fusing multi-view information in a consensus partition. Compared with the MKL framework, the advantage of the proposed method is that it allows the sub-partitions a certain degree of freedom to model the information of each single view. Further, multiple graph Laplacian regularization is introduced into the consensus partition to boost performance. Fig. 2 illustrates the main procedure of MKronRLSF-LP.

4.1 The construction of kernel matrix

Kron-RLS is a kind of kernel method. We construct drug kernels using five different kinds of functions.

Gaussian Interaction Profile (GIP):

$$\left[\bm{K}_{GIP,D}\right]_{i,j} = \exp\left(-\gamma \left\| d_i - d_j \right\|^{2}\right), \tag{15}$$

where $\gamma$ is the Gaussian kernel bandwidth, set to $\gamma = 1$.

Cosine Similarity (COS):

$$\left[\bm{K}_{COS,D}\right]_{i,j} = \frac{d_i^{T} d_j}{\left| d_i \right| \left| d_j \right|}. \tag{16}$$

Correlation coefficient (Corr):

$$\left[\bm{K}_{Corr,D}\right]_{i,j} = \frac{\text{Cov}\left(d_i, d_j\right)}{\sqrt{\text{Var}\left(d_i\right)\text{Var}\left(d_j\right)}}. \tag{17}$$

Normalized Mutual Information (NMI):

$$\left[\bm{K}_{NMI,D}\right]_{i,j} = \frac{\text{Q}\left(d_i, d_j\right)}{\sqrt{\text{H}\left(d_i\right)\text{H}\left(d_j\right)}}, \tag{18}$$

where $\text{Q}\left(d_i, d_j\right)$ is the mutual information of $d_i$ and $d_j$, and $\text{H}\left(d_i\right)$ and $\text{H}\left(d_j\right)$ are the entropies of $d_i$ and $d_j$, respectively.

Neural Tangent Kernel (NTK):

$$\left[\bm{K}_{NTK,D}\right]_{i,j} = \mathbb{E}_{\theta \sim w}\left[ f_{NTK}\left(\theta, d_i\right), f_{NTK}\left(\theta, d_j\right) \right], \tag{19}$$

where $f_{NTK}$ is a fully connected neural network and $\theta$ is the collection of parameters in this network.

Similarly, we construct the side effect kernels ($\bm{K}_{GIP,S}$, $\bm{K}_{COS,S}$, $\bm{K}_{Corr,S}$, $\bm{K}_{NMI,S}$, $\bm{K}_{NTK,S}$) in the side effect space.
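As an illustration of Equations 15 to 17, the following NumPy sketch builds the GIP, cosine and correlation kernels from a small profile matrix. It assumes that each row of `X` is one profile $d_i$ and that no row is all-zero or constant (otherwise Equations 16 and 17 divide by zero). The function names and the toy matrix are hypothetical; the NMI and NTK kernels are omitted.

```python
import numpy as np

def gip_kernel(X, gamma=1.0):
    """Eq. 15: exp(-gamma * ||d_i - d_j||^2) for all row pairs."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip round-off negatives

def cosine_kernel(X):
    """Eq. 16: d_i^T d_j / (|d_i| |d_j|)."""
    norms = np.linalg.norm(X, axis=1)
    return (X @ X.T) / np.outer(norms, norms)

def corr_kernel(X):
    """Eq. 17: Pearson correlation between rows."""
    return np.corrcoef(X)

# Toy profiles (assumed): 3 drugs, 4 side effects.
X = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1]], dtype=float)

K_gip = gip_kernel(X)        # gamma = 1 as in the text
K_cos = cosine_kernel(X)
K_corr = corr_kernel(X)
```

Applying the same functions to the transposed matrix would give the corresponding side effect kernels under the same assumption about the profiles.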

4.2 The MKronRLSF-LP model

Let us define two sets of base kernels:

$$\mathbb{K}_{D} = \left\{\bm{K}_{D}^{1}, \ldots, \bm{K}_{D}^{P}\right\}, \tag{20a}$$
$$\mathbb{K}_{S} = \left\{\bm{K}_{S}^{1}, \ldots, \bm{K}_{S}^{Q}\right\}, \tag{20b}$$

where $P$ and $Q$ represent the numbers of drug and side effect kernels, respectively. Based on $\mathbb{K}_{D}$ and $\mathbb{K}_{S}$, we can obtain a set of pairwise kernels:

$$\mathbb{K} = \left\{\bm{K}^{1} = \bm{K}_{S}^{1} \otimes \bm{K}_{D}^{1}, \ldots, \bm{K}^{V} = \bm{K}_{S}^{Q} \otimes \bm{K}_{D}^{P}\right\}, \tag{21}$$

where $V$ denotes the number of base pairwise kernels. Obviously, $V$ is equal to $P \times Q$.
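The construction of the pairwise kernel set can be sketched as follows. The enumeration order of the views is an assumption (the text does not specify it), the helper name `pairwise_kernels` is hypothetical, and explicit Kronecker products are only practical for toy sizes.

```python
import itertools
import numpy as np

def pairwise_kernels(drug_kernels, side_kernels):
    """Return all V = P * Q pairwise kernels K_S^q (x) K_D^p."""
    return [np.kron(K_s, K_d)
            for K_s, K_d in itertools.product(side_kernels, drug_kernels)]

# Toy sizes (assumed): N = 2 drugs, M = 3 side effects, P = 2, Q = 3.
rng = np.random.default_rng(0)
N, M, P, Q = 2, 3, 2, 3
drug_kernels = [np.eye(N) * (p + 1) for p in range(P)]
side_kernels = [np.eye(M) * (q + 1) for q in range(Q)]

views = pairwise_kernels(drug_kernels, side_kernels)
```

Each element of `views` is an $NM \times NM$ matrix, and the list has $P \times Q$ entries.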

By using multiple partitions, we can manipulate multiple views in a partition space, which enhances the robustness of the model. The following ensemble Kron-RLS model is obtained:

$$\mathop{\arg\min}\limits_{\bm{a}^{v}}\ \sum\limits_{v=1}^{V}\left(\frac{1}{2}\left\|\text{vec}\left(\bm{F}\right) - \bm{K}^{v}\bm{a}^{v}\right\|_{2}^{2} + \frac{\lambda^{v}}{2}\left(\bm{a}^{v}\right)^{T}\bm{K}^{v}\bm{a}^{v}\right). \tag{22}$$
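Because the sum in Equation 22 couples no variables across views, each $\bm{a}^{v}$ can be solved for separately. As a brief worked step, assuming $\bm{K}^{v}$ is symmetric and invertible, setting the gradient with respect to $\bm{a}^{v}$ to zero gives:

```latex
-\bm{K}^{v}\left(\text{vec}(\bm{F}) - \bm{K}^{v}\bm{a}^{v}\right) + \lambda^{v}\bm{K}^{v}\bm{a}^{v} = \bm{0}
\;\Longrightarrow\;
\bm{K}^{v}\left[\left(\bm{K}^{v} + \lambda^{v}\bm{I}_{NM}\right)\bm{a}^{v} - \text{vec}(\bm{F})\right] = \bm{0}
\;\Longrightarrow\;
\bm{a}^{v} = \left(\bm{K}^{v} + \lambda^{v}\bm{I}_{NM}\right)^{-1}\text{vec}(\bm{F}).
```

The resulting per-view partition $\bm{K}^{v}\bm{a}^{v}$ therefore has the same form as the single-kernel prediction in Equation 14.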

In multi-view methods, the consensus principle establishes consistency between partitions from different views. However, these partitions contribute to the final prediction with varying degrees of importance, so they should not be fused indiscriminately. To account for this, we introduce a consensus partition, denoted by $\hat{\bm{F}}$, which is a weighted linear combination of the partitions $\hat{\bm{F}}_{v}$ from multiple distinct views. A variable $\bm{w}_{v}$ is introduced for view $v$ to characterize its importance; it is calculated based on the training error. To prevent sparse weight solutions, we employ $\left\|\cdot\right\|_{2}^{2}$ to smooth the weights. We then have the following optimization problem:

$$
\begin{aligned}
\mathop{\arg\min}\limits_{\hat{\bm{F}}, \bm{a}^{v}, \bm{w}}\ & \frac{1}{2}\left\|\text{vec}\left(\hat{\bm{F}}\right) - \sum\limits_{v=1}^{V}\bm{w}_{v}\bm{K}^{v}\bm{a}^{v}\right\|_{2}^{2} + \mu\sum\limits_{v=1}^{V}\left(\frac{\bm{w}_{v}}{2}\left\|\text{vec}\left(\bm{F}\right) - \bm{K}^{v}\bm{a}^{v}\right\|_{2}^{2} + \frac{\lambda^{v}}{2}\left(\bm{a}^{v}\right)^{T}\bm{K}^{v}\bm{a}^{v}\right) + \frac{1}{2}\beta\left\|\bm{w}\right\|_{2}^{2} \\
s.t.\ & \sum\limits_{v=1}^{V}\bm{w}_{v} = 1,\ \bm{w}_{v} \geq 0,\ v = 1, \ldots, V.
\end{aligned} \tag{23}
$$
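In Equation 23 the consensus partition appears only in the first term, so for fixed $\bm{a}^{v}$ and $\bm{w}$ it equals the weighted sum of the per-view partitions. The sketch below illustrates only this fusion step. It assumes small dense pairwise kernels, uses the per-view ridge solution of Equation 22 as a stand-in for the jointly optimized $\bm{a}^{v}$, and uses uniform weights purely for illustration. The names `view_partition` and `consensus_partition` are hypothetical, and the actual update rules are those given in the appendix.

```python
import numpy as np

def view_partition(K, vec_F, lam):
    """Per-view partition K a with a = (K + lam I)^{-1} vec(F)."""
    a = np.linalg.solve(K + lam * np.eye(K.shape[0]), vec_F)
    return K @ a

def consensus_partition(partitions, w):
    """Weighted sum of per-view partitions; w must lie on the simplex."""
    w = np.asarray(w, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w_v * p for w_v, p in zip(w, partitions))

# Toy problem (assumed): 3 views over a vectorised 6-entry label matrix.
rng = np.random.default_rng(1)
n = 6
kernels = []
for _ in range(3):
    A = rng.standard_normal((n, n))
    kernels.append(A @ A.T)          # symmetric positive semidefinite
vec_F = (rng.random(n) < 0.4).astype(float)
lams = [0.5, 1.0, 2.0]               # illustrative per-view lambda values

parts = [view_partition(K, vec_F, lam) for K, lam in zip(kernels, lams)]
w = np.full(3, 1.0 / 3.0)            # uniform weights, for illustration
vec_F_hat = consensus_partition(parts, w)
```

Putting all weight on one view recovers that view's partition, which is the degenerate case that the $\beta\left\|\bm{w}\right\|_{2}^{2}$ term discourages.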

In Equation 23, we observe that the consensus partition $\hat{\bm{F}}$ fits the adjacency matrix $\bm{F}$ only through an indirect path. As described in Section 2, false zeros represent unobserved links in the network. Hence, we must avoid overfitting the observed matrix $\bm{F}$. Inspired by manifold learning, the Laplacian operator mitigates overfitting and noise while preserving the original data structure and keeping nodes with common labels closely associated. This approach is simple, and empirical evidence confirms its effective performance (Pang & Cheung, 2017; Chao & Sun, 2019; Jiang et al., 2023). Here, we apply multiple graph Laplacian regularization to Equation 23, which can effectively explore multiple different views and boost the performance of $\hat{\bm{F}}$. Specifically, the Kronecker product Laplacian matrix is calculated from the optimal drug and side effect similarity matrices, each of which is a weighted linear combination of multiple related kernel matrices. The weight of each kernel is adaptively optimized during training, which reduces the impact of noisy or less relevant graphs. The optimization problem for MKronRLSF-LP can be formulated as:

$$
\begin{aligned}
\mathop{\arg\min}\limits_{\hat{\bm{F}}, \bm{a}^{v}, \bm{w}, \bm{\theta}_{D}, \bm{\theta}_{S}}\ & \frac{1}{2}\left\|\text{vec}\left(\hat{\bm{F}}\right) - \sum\limits_{v=1}^{V}\bm{w}_{v}\bm{K}^{v}\bm{a}^{v}\right\|_{2}^{2} + \mu\sum\limits_{v=1}^{V}\left(\frac{\bm{w}_{v}}{2}\left\|\text{vec}\left(\bm{F}\right) - \bm{K}^{v}\bm{a}^{v}\right\|_{2}^{2} + \frac{\lambda^{v}}{2}\left(\bm{a}^{v}\right)^{T}\bm{K}^{v}\bm{a}^{v}\right) + \frac{1}{2}\beta\left\|\bm{w}\right\|_{2}^{2} \\
& + \frac{1}{2}\sigma\,\text{vec}\left(\hat{\bm{F}}\right)^{T}\bm{L}\,\text{vec}\left(\hat{\bm{F}}\right) \\
s.t.\ & \sum\limits_{v=1}^{V}\bm{w}_{v} = 1,\ \bm{w}_{v} \geq 0,\ v = 1, \ldots, V, \\
& \bm{L} = \bm{I}_{NM} - \left(\bm{H}_{S}^{-0.5}\bm{K}_{S}^{*}\bm{H}_{S}^{-0.5}\right) \otimes \left(\bm{H}_{D}^{-0.5}\bm{K}_{D}^{*}\bm{H}_{D}^{-0.5}\right), \\
& \bm{K}_{S}^{*} = \sum\limits_{i=1}^{Q}\left[\bm{\theta}_{S}\right]_{i}^{\varepsilon}\bm{K}_{S}^{i},\quad \bm{K}_{D}^{*} = \sum\limits_{i=1}^{P}\left[\bm{\theta}_{D}\right]_{i}^{\varepsilon}\bm{K}_{D}^{i}, \\
& \sum\limits_{i=1}^{Q}\left[\bm{\theta}_{S}\right]_{i} = 1,\ \left[\bm{\theta}_{S}\right]_{i} \geq 0,\ i = 1, \ldots, Q,\quad \sum\limits_{i=1}^{P}\left[\bm{\theta}_{D}\right]_{i} = 1,\ \left[\bm{\theta}_{D}\right]_{i} \geq 0,\ i = 1, \ldots, P,
\end{aligned} \tag{24}
$$

where $\bm{L}$ is a normalized Laplacian matrix, and $\bm{H}_{S}$ and $\bm{H}_{D}$ are diagonal matrices whose $j$th diagonal elements are $\sum\nolimits_{k}\left[\bm{K}_{S}^{*}\right]_{j,k}$ and $\sum\nolimits_{k}\left[\bm{K}_{D}^{*}\right]_{j,k}$, respectively. In addition, $\varepsilon > 1$ guarantees that each graph makes a particular contribution to the Laplacian matrix.
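The Laplacian term of Equation 24 can be illustrated with a short NumPy sketch. It assumes entrywise non-negative, symmetric kernels with positive row sums (so that $\bm{H}^{-0.5}$ exists), fixed illustrative values of $\bm{\theta}_{D}$ and $\bm{\theta}_{S}$, and $\varepsilon = 2$. The helper names are hypothetical. The penalty is evaluated both without forming $\bm{L}$ and, for checking, with the explicit $NM \times NM$ matrix.

```python
import numpy as np

def fused_kernel(kernels, theta, eps=2.0):
    """K* = sum_i theta_i^eps K^i."""
    return sum((t ** eps) * K for t, K in zip(theta, kernels))

def sym_normalize(K):
    """H^{-1/2} K H^{-1/2}, with H the diagonal matrix of row sums of K."""
    s = 1.0 / np.sqrt(K.sum(axis=1))
    return K * np.outer(s, s)

def laplacian_penalty(F_hat, S_d, S_s):
    """vec(F_hat)^T L vec(F_hat) with L = I - S_s (x) S_d, L never formed."""
    return np.sum(F_hat * F_hat) - np.sum(F_hat * (S_d @ F_hat @ S_s.T))

rng = np.random.default_rng(2)

def random_similarity(n):
    A = rng.random((n, n))
    return (A + A.T) / 2.0            # symmetric, entries in [0, 1)

# Toy sizes (assumed): N = 4 drugs, M = 3 side effects, P = Q = 2 kernels.
N, M = 4, 3
drug_kernels = [random_similarity(N) for _ in range(2)]
side_kernels = [random_similarity(M) for _ in range(2)]
theta_d = np.array([0.5, 0.5])        # illustrative, on the simplex
theta_s = np.array([0.3, 0.7])

S_d = sym_normalize(fused_kernel(drug_kernels, theta_d))
S_s = sym_normalize(fused_kernel(side_kernels, theta_s))
F_hat = rng.random((N, M))

penalty = laplacian_penalty(F_hat, S_d, S_s)

# Explicit check with column-stacking vec.
L = np.eye(N * M) - np.kron(S_s, S_d)
v = F_hat.flatten(order="F")
penalty_ref = v @ L @ v
```

Under these assumptions the eigenvalues of $\bm{L}$ lie in $[0, 2]$, so the penalty is non-negative.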

Due to the lack of space, we present the optimization algorithm for Equation 24 in Appendix Section A.1.

5 Experiments

In this section, we report the performance of MKronRLSF-LP and compare it with baseline methods and other drug-side effect predictors.

5.1 Dataset

Table 1: Summary of the real drug-side effect datasets.
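| Name | Drugs | Side effects | Associations | Sparsity | Reference |
|------|-------|--------------|--------------|----------|-----------|
| Liu | 832 | 1385 | 59205 | 94.86% | (Cheng et al., 2013) |
| Pau | 888 | 1385 | 61102 | 95.03% | (Pauwels et al., 2011) |
| Miz | 658 | 1339 | 49051 | 94.43% | (Sayaka et al., 2012) |
| Luo | 708 | 4192 | 80164 | 97.30% | (Luo et al., 2017) |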

Four real drug-side effect datasets are used to assess the effectiveness of our proposed method. The Pau dataset is derived from the SIDER database (Kuhn et al., 2010), which contains information about drugs and their recorded side effects. The Miz dataset includes information about drug-protein interactions and drug-side effect interactions, obtained from the DrugBank (Wishart et al., 2006) and SIDER databases, respectively; it contains 658 drugs with both targeted protein and side effect information. For the Liu dataset, Liu et al. mapped drugs in SIDER to DrugBank 3.0 (Knox et al., 2010), resulting in a final dataset of 832 drugs and 1385 side effects. The Luo dataset has a large number of side effects and was extracted from SIDER 2.0. Table 1 summarizes the datasets. All four datasets are sparse; in other words, there are far fewer positive samples than negative samples. Thus, drug-side effect prediction can be viewed as a classification problem with extremely imbalanced data.
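The sparsity values in Table 1 follow directly from the other columns: sparsity is the fraction of drug-side effect pairs with no recorded association. A minimal check, using only the counts listed in Table 1 (the variable and function names are illustrative):

```python
# (drugs, side effects, associations) as listed in Table 1
datasets = {
    "Liu": (832, 1385, 59205),
    "Pau": (888, 1385, 61102),
    "Miz": (658, 1339, 49051),
    "Luo": (708, 4192, 80164),
}

def sparsity(n_drugs, n_side_effects, n_assoc):
    """Percentage of drug-side effect pairs without a recorded association."""
    return 100.0 * (1.0 - n_assoc / (n_drugs * n_side_effects))

sparsity_pct = {name: sparsity(*counts) for name, counts in datasets.items()}

for name, value in sparsity_pct.items():
    print(f"{name}: {value:.2f}%")
```

In every dataset roughly one pair in twenty or fewer is a positive sample, which is the imbalance referred to above.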

5.2 Parameter setting

In this paper, the objective function in Equation 24 contains the following regularization parameters: $\mu$, $\beta$, $\sigma$, $\varepsilon$ and $\lambda^{v}, v = 1, \ldots, V$. To find the combination of regularization parameters that gives MKronRLSF-LP its best performance, a grid search is performed on the Pau dataset, and the parameters with the best AUPR are selected.

We first select $\lambda^{v}, v = 1, \ldots, V$ using a single-view Kron-RLS model with the corresponding pairwise kernel. Each parameter $\lambda^{v}$ is selected in the range from $2^{-5}$ to $2^{5}$ with step $2^{1}$. The optimal parameters $\lambda^{v}$ are shown in Table 4. According to a previous study (Shi et al., 2019), the performance is not affected by the parameter $\varepsilon$, so it is set to 2. Then, we fix $\lambda^{v}, v = 1, \ldots, V$ at their best values and tune $\mu$, $\beta$ and $\sigma$ within the range $2^{-10}$ to $2^{0}$ with step $2^{1}$. The optimal regularization parameters are $\mu = 2^{-7}$, $\beta = 2^{0}$ and $\sigma = 2^{-8}$.
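The two search grids can be written down explicitly. This sketch assumes that "step $2^{1}$" means consecutive grid values differ by a factor of 2; the variable names are illustrative, and the scoring of each candidate (AUPR under cross-validation) is not shown.

```python
import itertools

# Per-view lambda^v: 2^-5, 2^-4, ..., 2^5 (assumed multiplicative step of 2)
lambda_grid = [2.0 ** e for e in range(-5, 6)]

# mu, beta, sigma: 2^-10, 2^-9, ..., 2^0
reg_grid = [2.0 ** e for e in range(-10, 1)]

# Every (mu, beta, sigma) candidate of the second-stage search
combos = list(itertools.product(reg_grid, repeat=3))

# The setting reported as optimal in the text
reported = (2.0 ** -7, 2.0 ** 0, 2.0 ** -8)
```

Tuning each $\lambda^{v}$ first with a single-view model keeps the joint search over $\mu$, $\beta$ and $\sigma$ at $11^{3}$ candidates instead of growing with the number of views.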

5.3 Baseline methods

In this work, we compare MKronRLSF-LP with the following baseline methods: BSV, Comm Kron-RLS (Perrone & Cooper, 1995), Kron-RLS+CKA-MKL (Ding et al., 2019), Kron-RLS+pairwiseMKL (Cichonska et al., 2018), Kron-RLS+self-MKL (Nascimento et al., 2016), MvGRLP (Ding et al., 2021) and MvGCN (Fu et al., 2022). Due to the lack of space, we present details of these baseline methods in Appendix Section A.3. For a fair comparison, the baseline methods are given the same input as our method. To achieve their best performance, we also tune their parameters by 5-fold CV on the Pau dataset.

5.4 Threshold finding

Because MKronRLSF-LP and the baseline methods output only real-valued regression scores, we apply a threshold-finding operation. For a given validation set in the five-fold cross-validation (5-fold CV) procedure, we collect the labels and their corresponding predicted scores. We then obtain the optimal threshold by maximizing the $F_{score}$ on the predicted scores and labels from this validation set. The trends of $F_{score}$, $Recall$ and $Precision$ with different thresholds over the four datasets are shown in Fig. 6. As the prediction threshold rises, $Recall$ falls, whereas $Precision$ tends to rise. The $F_{score}$ is the harmonic mean of $Recall$ and $Precision$, so it represents both symmetrically in one metric. Table 5 summarizes the thresholds of the different baseline methods on the different datasets.
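The threshold-finding step can be sketched as an exhaustive scan over the observed scores. This is an illustration under stated assumptions, not the authors' code: every distinct score is tried as a threshold, a pair is predicted positive when its score is at least the threshold, and the function name and toy data are hypothetical.

```python
import numpy as np

def best_f_threshold(labels, scores):
    """Return (threshold, F-score) maximising F on one validation set."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best_t, best_f = None, -1.0
    for t in np.unique(scores):            # candidate thresholds, ascending
        pred = scores >= t
        tp = np.sum(pred & labels)
        if tp == 0:
            continue                        # F-score is 0 or undefined here
        precision = tp / pred.sum()
        recall = tp / labels.sum()
        f = 2.0 * precision * recall / (precision + recall)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

# Toy validation set (assumed values).
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
threshold, f_score = best_f_threshold(labels, scores)
```

On this toy input, raising the threshold from 0.35 to 0.7 increases precision from 0.75 to 1 but lowers recall from 1 to 2/3, and the F-score is highest at 0.35.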

5.5 Comparison with baseline methods

We conduct 5-fold CV to evaluate the performance of our method against the baseline methods. To provide a fair and comprehensive comparison, each algorithm is run 10 times with different cross-validation splits, and the mean values and standard deviations are reported in Table 3. The best single view is $K_{GIP,D} \otimes K_{NTK,S}$, which is selected by 5-fold CV on the Pau dataset.

First, we observe that the proposed method has the best AUPR and $F_{score}$ on all datasets. In particular, it has a higher AUPR and $F_{score}$ than BSV on every dataset, which indicates the benefit of using multiple views. The simple coupling frameworks BSV and Comm perform well on the Pau dataset. However, they do not perform as well on the other datasets, which indicates that simple fusion schemes are sensitive to the dataset and not robust. Furthermore, Kron-RLS+pairwiseMKL achieves the highest AUC of 95.01%, 95.02% and 94.70% on the Liu, Pau and Miz datasets, respectively. These are slight improvements of 0.23%, 0.21% and 0.23% over our method, respectively. As discussed in Section 5.1, drug-side effect prediction is an extremely imbalanced classification problem. The AUC can be interpreted as the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative instance. Because negative instances vastly outnumber positive ones, the AUC can remain high even when many highly ranked predictions are false positives. Therefore, the AUC is a less informative metric than AUPR for predicting drug side effects.
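The ranking interpretation of the AUC can be made concrete by counting positive-negative pairs directly. The sketch below assumes ties count as one half and uses hypothetical toy data; its quadratic memory use makes it suitable for small examples only.

```python
import numpy as np

def auc_by_ranking(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]      # all positive-negative score gaps
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Toy data (assumed values): 3 positives, 3 negatives.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
auc = auc_by_ranking(labels, scores)
```

Here 8 of the 9 positive-negative pairs are ordered correctly. The value depends only on the relative order of positives and negatives, not on how many negatives sit above a given decision threshold.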

Another interesting observation is that MKronRLSF-LP outperforms the other MKL-based methods in the comparison. For example, it exceeds the best MKL method (CKA-MKL) by 2.1%, 2.32%, 1.43% and 2.51% in terms of AUPR on the Liu, Pau, Miz and Luo datasets, respectively. These results verify the effectiveness of the consensus partition and the multiple graph Laplacian constraint.

For a more thorough analysis and more reliable conclusions, we apply post-hoc statistical tests to the metrics reported in Table 3. Fig. 3 visualizes the results of these tests as critical difference diagrams. These results show that MKronRLSF-LP is ranked significantly better than all other methods in terms of AUPR, $Recall$ and $F_{score}$. In addition, MKronRLSF-LP is inferior only to Kron-RLS+pairwiseMKL and Kron-RLS+CKA-MKL in terms of AUC and $Precision$, respectively. MvGCN is also ranked worse than our method. It is worth mentioning that there is insufficient statistical evidence that MvGCN performs better than the model-based methods. MvGCN uses a shallow GCN to avoid over-smoothing. A shallow GCN (Miao et al., 2021) can capture only the local neighbourhood information of nodes, so the global features of the network are not fully explored. This results in inaccurate embedding vectors.

In summary, the above experimental results demonstrate the superior prediction performance of MKronRLSF-LP over the baseline methods. We attribute this superiority to three aspects: (1) the consensus partition is derived through joint fusion of weighted multiple partitions; (2) MKronRLSF-LP uses multiple graph Laplacian regularization to constrain the consensus predicted value $\hat{\bm{F}}$, which makes the consensus partition robust; (3) unlike existing MKL methods, MKronRLSF-LP fuses multiple pairwise kernels at the partition level. Together, these three factors contribute to the improvement in prediction performance.

Figure 3: Critical difference diagram of average score ranks. A crossbar is over each group of methods that do not show a statistically significant difference among themselves.

5.6 Ablation study

To validate the benefits of jointly applying the consensus partition and the multiple graph Laplacian constraint, we conduct an ablation study in which individual components are excluded. First, we construct a Kron-RLS model on each pairwise kernel separately. Each partition is learned independently, so the model can be regarded as an ensemble Kron-RLS, whose objective function is Equation 22. However, the predictions should be consistent across views, and heterogeneous views have varying degrees of importance in the final prediction. Therefore, we introduce a consensus partition $\hat{\bm{F}}$, which is a weighted linear combination of the base partitions (as shown in Equation 23). To further improve the performance and robustness of the model, we apply multiple graph Laplacian constraints to the consensus partition, which yields the objective function of MKronRLSF-LP (Equation 24). The results of the ablation study are shown in Fig. 4. It can be observed that both the consensus partition and the multiple graph Laplacian constraint help MKronRLSF-LP achieve the best results.

Figure 4: Ablation study of the consensus partition and multiple graph Laplacian constraint on four datasets.

5.7 Comparisons of computational speed

To demonstrate the efficiency of MKronRLSF-LP, we compare it with the baseline methods in terms of computational speed. All methods except MvGCN are run on a PC equipped with an Intel Core i7-13700 and 16GB RAM. Because MvGCN is a deep learning-based method, it is run on a workstation equipped with an NVIDIA GeForce RTX 3090 GPU. Each method is run 10 times, and the mean running time is reported in Table 2. The reported times do not include the kernel calculation time.

As expected, learning from multiple views takes more time than learning from a single view (BSV). Also, since MKronRLSF-LP fuses multiple views at the partition level, it requires more running time than Kron-RLS+CKA-MKL and Kron-RLS+self-MKL. Another observation is that MKronRLSF-LP is much faster than Kron-RLS+pairwiseMKL. This can be explained by the time complexity of the two methods, which in both cases is dominated by inverting the pairwise kernels. In our optimization algorithm, we use eigendecomposition to compute the approximate inverse, so the time complexity of our method is $O((P+I_{ter})N^{3}+(Q+I_{ter})M^{3})$. In contrast, Kron-RLS+pairwiseMKL solves the system with the conjugate gradient approach, which iteratively improves the result by performing matrix-vector products. Hence, Kron-RLS+pairwiseMKL runs in $O(I_{ter}PQ(N^{2}M+M^{2}N))$. When MvGCN is applied to the Luo dataset, its running time exceeds 2 hours. This is because MvGCN uses a self-supervised learning strategy based on deep graph infomax (DGI) to initialize node embeddings. When a bipartite network contains many nodes, DGI takes a very long time to run.
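To make the gap between the two complexity expressions concrete, the leading-order terms can be evaluated for some example sizes. The following is only a rough sketch: the values of N, M, P, Q and the iteration count are hypothetical and are not taken from the datasets, the same iteration count is assumed for both methods, and constant factors hidden by the big-O notation are ignored, so the ratio is indicative at best.

```python
def cost_eigen(N, M, P, Q, iters):
    """Leading-order term of O((P + I_ter) N^3 + (Q + I_ter) M^3)."""
    return (P + iters) * N**3 + (Q + iters) * M**3


def cost_conjugate_gradient(N, M, P, Q, iters):
    """Leading-order term of O(I_ter P Q (N^2 M + M^2 N))."""
    return iters * P * Q * (N**2 * M + M**2 * N)


# Hypothetical problem sizes, chosen only for illustration.
N, M, P, Q, iters = 800, 1000, 4, 4, 10

eigen_ops = cost_eigen(N, M, P, Q, iters)
cg_ops = cost_conjugate_gradient(N, M, P, Q, iters)
ratio = cg_ops / eigen_ops
print(f"eigendecomposition-based: {eigen_ops:.3e}")
print(f"conjugate gradient:       {cg_ops:.3e}")
print(f"ratio:                    {ratio:.1f}")
```

The conjugate gradient term carries the product P·Q, whereas the eigendecomposition term grows only with P + Q, so the gap widens as more kernels are combined.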

Table 2: Mean running time (in seconds) of baseline methods on four datasets.

| Methods | Pau | Liu | Miz | Luo |
| --- | --- | --- | --- | --- |
| BSV | 0.79 | 0.83 | 0.68 | 5.84 |
| Comm Kron-RLS | 19.38 | 20.95 | 18.39 | 148.60 |
| Kron-RLS+CKA-MKL | 2.69 | 2.18 | 2.36 | 13.13 |
| Kron-RLS+pairwiseMKL | 1583.67 | 1483.26 | 1364.21 | - |
| Kron-RLS+self-MKL | 12.21 | 13.05 | 12.09 | 155.85 |
| MvGRLP | 8.94 | 8.37 | 7.23 | 58.53 |
| MvGCN | 305.44 | 329.43 | 343.50 | - |
| MKronRLSF-LP | 50.55 | 43.9 | 35.83 | 280 |

  • A dash (-) indicates that the method took more than 2 hours to run.

5.8 Comparison with other drug-side effect predictors

We also compare the proposed method with state-of-the-art drug-side effect predictors. Tables 6, 7, 8 and 9 present the 5-fold CV results in terms of AUPR, AUC, $Recall$, $Precision$ and $F_{score}$ on the four datasets, respectively. The best results are highlighted in bold and the second-best results are underlined.

MKronRLSF-LP achieves the highest AUPR and $F_{score}$ on all datasets. For drug-side effect prediction, AUPR and $F_{score}$ are the more desirable metrics (Ezzat et al., 2017; Li et al., 2021). Therefore, we conclude that our method outperforms the other assessed methods. GCRS (Xuan et al., 2022) and SDPred (Zhao et al., 2022) are deep learning-based methods. GCRS constructs multiple heterogeneous graphs and multi-layer convolutional neural networks with attribute-level attention to predict drug-side effect pair nodes. SDPred fuses multiple kinds of side information (drug chemical structures, drug targets, drug word, side effect semantic similarity and side effect word) by feature concatenation and adopts a CNN and an MLP for the prediction task. However, GCRS and SDPred perform poorly on the Luo dataset, probably because they are pairwise learning methods that construct the training set by random negative sampling. Random negative sampling cannot guarantee the reliability and quality of the negative sample pairs, which results in a certain loss of information (Zhang et al., 2015; Ali & Aittokallio, 2019). The ensemble model (Zhang et al., 2016) combines Liu’s method (Liu et al., 2012), Cheng’s method (Cheng et al., 2013), INBM and RBM by the average scoring rule. The results of the ensemble model are significantly better than those of its sub-models on the four datasets.

6 Conclusion

This paper presents MKronRLSF-LP for drug-side effect prediction. MKronRLSF-LP addresses the general problem of multi-view fusion-based link prediction by using a consensus partition and a multiple graph Laplacian constraint. It allows some freedom to model the views differently and learns combination weights for each view to find the consensus partition. Each view’s weight is learned dynamically and plays a crucial role in exploring consensus information. Because Laplacian regularization enhances semi-supervised learning performance, a multiple graph Laplacian regularization term is added to the objective function. Finally, we present an efficient alternating optimization algorithm. Our experimental results indicate that the proposed method achieves better classification results than the baseline algorithms and current drug-side effect predictors.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (NSFC 62172076, 62250028 and U22A2038), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LY23F020003), and the Municipal Government of Quzhou (Grant No. 2023D036).

References

  • Ali & Aittokallio (2019) Mehreen Ali and Tero Aittokallio. Machine learning and feature selection for drug response prediction in precision oncology applications. Biophysical reviews, 11(1):31–39, 2019.
  • Bruno & Marchand-Maillet (2009) Eric Bruno and Stéphane Marchand-Maillet. Multiview clustering: a late fusion approach using latent models. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pp.  736–737, 2009.
  • Byrd et al. (1999) Richard H Byrd, Mary E Hribar, and Jorge Nocedal. An interior point algorithm for large-scale nonlinear programming. SIAM Journal on Optimization, 9(4):877–900, 1999.
  • Chao & Sun (2019) Guoqing Chao and Shiliang Sun. Semi-supervised multi-view maximum entropy discrimination with expectation laplacian regularization. Information Fusion, 45:296–306, 2019.
  • Cheng et al. (2013) F. Cheng, W. Li, X. Wang, Y. Zhou, Z. Wu, J. Shen, and Y. Tang. Adverse drug events: database construction and in silico prediction. Journal of Chemical Information & Modeling, 53(4):744–752, 2013.
  • Cichonska et al. (2018) Anna Cichonska, Tapio Pahikkala, Sandor Szedmak, Heli Julkunen, Antti Airola, Markus Heinonen, Tero Aittokallio, and Juho Rousu. Learning with multiple pairwise kernels for drug bioactivity prediction. Bioinformatics, 34(13):i509–i518, 2018.
  • Da Silva & Krishnamurthy (2016) Brianna A Da Silva and Mahesh Krishnamurthy. The alarming reality of medication error: a patient case and review of pennsylvania and national data. Journal of community hospital internal medicine perspectives, 6(4):31758, 2016.
  • Ding et al. (2016) Yijie Ding, Jijun Tang, and Fei Guo. Predicting protein-protein interactions via multivariate mutual information of protein sequences. BMC bioinformatics, 17(1):1–13, 2016.
  • Ding et al. (2018) Yijie Ding, Jijun Tang, and Fei Guo. Identification of drug-side effect association via semisupervised model and multiple kernel learning. IEEE journal of biomedical and health informatics, 23(6):2619–2632, 2018.
  • Ding et al. (2019) Yijie Ding, Jijun Tang, and Fei Guo. Identification of drug-side effect association via multiple information integration with centered kernel alignment. Neurocomputing, 325:211–224, 2019.
  • Ding et al. (2021) Yijie Ding, Jijun Tang, and Fei Guo. Identification of drug-target interactions via multi-view graph regularized link propagation model. Neurocomputing, 461:618–631, 2021.
  • Ezzat et al. (2017) Ali Ezzat, Peilin Zhao, Min Wu, Xiao-Li Li, and Chee-Keong Kwoh. Drug-target interaction prediction with graph regularized matrix factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(3):646–656, 2017. doi: 10.1109/TCBB.2016.2530062.
  • Fan et al. (2021) Haoyi Fan, Fengbin Zhang, Yuxuan Wei, Zuoyong Li, Changqing Zou, Yue Gao, and Qionghai Dai. Heterogeneous hypergraph variational autoencoder for link prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(8):4125–4138, 2021.
  • Fu et al. (2022) Haitao Fu, Feng Huang, Xuan Liu, Yang Qiu, and Wen Zhang. Mvgcn: data integration through multi-view graph convolutional network for predicting links in biomedical bipartite networks. Bioinformatics, 38(2):426–434, 2022.
  • Galeano et al. (2020) Diego Galeano, Shantao Li, Mark Gerstein, and Alberto Paccanaro. Predicting the frequencies of drug side effects. Nature communications, 11(1):1–14, 2020.
  • Gönen & Alpaydın (2011) Mehmet Gönen and Ethem Alpaydın. Multiple kernel learning algorithms. The Journal of Machine Learning Research, 12:2211–2268, 2011.
  • Guo et al. (2021) Xiaoyi Guo, Wei Zhou, Bin Shi, Xiaohua Wang, Aiyan Du, Yijie Ding, Jijun Tang, and Fei Guo. An efficient multiple kernel support vector regression model for assessing dry weight of hemodialysis patients. Current Bioinformatics, 16(2):284–293, 2021.
  • Houthuys & Suykens (2021) Lynn Houthuys and Johan AK Suykens. Tensor-based restricted kernel machines for multi-view classification. Information Fusion, 68:54–66, 2021.
  • Houthuys et al. (2018) Lynn Houthuys, Rocco Langone, and Johan AK Suykens. Multi-view kernel spectral clustering. Information Fusion, 44:46–56, 2018.
  • Hue & Vert (2010) Martial Hue and Jean-Philippe Vert. On learning with kernels for unordered pairs. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.  463–470, 2010.
  • Jiang et al. (2023) Bingbing Jiang, Chenglong Zhang, Yan Zhong, Yi Liu, Yingwei Zhang, Xingyu Wu, and Weiguo Sheng. Adaptive collaborative fusion for multi-view semi-supervised classification. Information Fusion, 96:37–50, 2023.
  • Jiang et al. (2019) Shuhui Jiang, Zhengming Ding, and Yun Fu. Heterogeneous recommendation via deep low-rank sparse collective factorization. IEEE transactions on pattern analysis and machine intelligence, 42(5):1097–1111, 2019.
  • Kailath (1971) Thomas Kailath. Rkhs approach to detection and estimation problems–i: Deterministic signals in gaussian noise. IEEE Transactions on Information Theory, 17(5):530–549, 1971.
  • Knox et al. (2010) Craig Knox, Vivian Law, Timothy Jewison, Philip Liu, Son Ly, Alex Frolkis, Allison Pon, Kelly Banco, Christine Mak, Vanessa Neveu, et al. Drugbank 3.0: a comprehensive resource for ‘omics’ research on drugs. Nucleic acids research, 39(suppl_1):D1035–D1041, 2010.
  • Kuhn et al. (2010) Michael Kuhn, Monica Campillos, Ivica Letunic, Lars Juhl Jensen, and Peer Bork. A side effect resource to capture phenotypic effects of drugs. Molecular systems biology, 6(1):343, 2010.
  • Laub (2004) Alan J Laub. Matrix analysis for scientists and engineers. SIAM, 2004.
  • Li et al. (2021) Tianjiao Li, Xing-Ming Zhao, and Limin Li. Co-vae: Drug-target binding affinity prediction by co-regularized variational autoencoders. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12):8861–8873, 2021.
  • Liu et al. (2023) Jiyuan Liu, Xinwang Liu, Yuexiang Yang, Qing Liao, and Yuanqing Xia. Contrastive multi-view kernel learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
  • Liu et al. (2012) Mei Liu, Yonghui Wu, Yukun Chen, Jingchun Sun, Zhongming Zhao, Xue-wen Chen, Michael Edwin Matheny, and Hua Xu. Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs. Journal of the American Medical Informatics Association, 19(e1):e28–e35, 2012.
  • Luo et al. (2017) Yunan Luo, Xinbin Zhao, Jingtian Zhou, Jinglin Yang, Yanqing Zhang, Wenhua Kuang, Jian Peng, Ligong Chen, and Jianyang Zeng. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nature communications, 8(1):1–13, 2017.
  • Lv et al. (2021) Juncheng Lv, Zhao Kang, Boyu Wang, Luping Ji, and Zenglin Xu. Multi-view subspace clustering via partition fusion. Information Sciences, 560:410–423, 2021.
  • Miao et al. (2021) Xupeng Miao, Wentao Zhang, Yingxia Shao, Bin Cui, Lei Chen, Ce Zhang, and Jiawei Jiang. Lasagne: A multi-layer graph convolutional network framework via node-aware deep architecture. IEEE Transactions on Knowledge and Data Engineering, 35(2):1721–1733, 2021.
  • Nascimento et al. (2016) André CA Nascimento, Ricardo BC Prudêncio, and Ivan G Costa. A multiple kernel learning algorithm for drug-target interaction prediction. BMC bioinformatics, 17:1–16, 2016.
  • Nocedal & Wright (2006) Jorge Nocedal and Stephen J Wright. Quadratic programming. Numerical optimization, pp.  448–492, 2006.
  • Pang & Cheung (2017) Jiahao Pang and Gene Cheung. Graph laplacian regularization for image denoising: Analysis in the continuous domain. IEEE Transactions on Image Processing, 26(4):1770–1785, 2017.
  • Pauwels et al. (2011) E. Pauwels, V. Stoven, and Y. Yamanishi. Predicting drug side-effect profiles: a chemical fragment-based approach. Bmc Bioinformatics, 12(1):169, 2011.
  • Pekalska & Haasdonk (2008) Elżbieta Pękalska and Bernard Haasdonk. Kernel discriminant analysis for positive definite and indefinite kernels. IEEE transactions on pattern analysis and machine intelligence, 31(6):1017–1032, 2008.
  • Perrone & Cooper (1995) Michael P Perrone and Leon N Cooper. When networks disagree: Ensemble methods for hybrid neural networks. In How We Learn; How We Remember: Toward An Understanding Of Brain And Neural Systems: Selected Papers of Leon N Cooper, pp.  342–358. World Scientific, 1995.
  • Qian et al. (2022a) Yuqing Qian, Yijie Ding, Quan Zou, and Fei Guo. Identification of drug-side effect association via restricted boltzmann machines with penalized term. Briefings in Bioinformatics, 23(6):bbac458, 2022a.
  • Qian et al. (2022b) Yuqing Qian, Yijie Ding, Quan Zou, and Fei Guo. Multi-view kernel sparse representation for identification of membrane protein types. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 20(2):1234–1245, 2022b.
  • Raymond & Kashima (2010) Rudy Raymond and Hisashi Kashima. Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs. In Joint european conference on machine learning and knowledge discovery in databases, pp.  131–147. Springer, 2010.
  • Sayaka et al. (2012) M. Sayaka, P. Edouard, S Véronique, G. Susumu, and Y. Yoshihiro. Relating drug–protein interaction network with drug side effects. Bioinformatics, 2012.
  • Shabani-Mashcool et al. (2020) S. Shabani-Mashcool, S. A. Marashi, and S. Gharaghani. Nddsa: A network- and domain-based method for predicting drug-side effect associations. Information Processing & Management, 57(6):102357, 2020.
  • Shi et al. (2019) Caijuan Shi, Changyu Duan, Zhibin Gu, Qi Tian, Gaoyun An, and Ruizhen Zhao. Semi-supervised feature selection analysis with structured multi-view sparse regularization. Neurocomputing, 330:412–424, 2019.
  • Wang et al. (2023a) Dexian Wang, Tianrui Li, Wei Huang, Zhipeng Luo, Ping Deng, Pengfei Zhang, and Minbo Ma. A multi-view clustering algorithm based on deep semi-nmf. Information Fusion, pp.  101884, 2023a.
  • Wang et al. (2019) Siwei Wang, Xinwang Liu, En Zhu, Chang Tang, Jiyuan Liu, Jingtao Hu, Jingyuan Xia, and Jianping Yin. Multi-view clustering via late fusion alignment maximization. In IJCAI, pp.  3778–3784, 2019.
  • Wang et al. (2021) Tinghua Wang, Lin Zhang, and Wenyu Hu. Bridging deep and multiple kernel learning: A review. Information Fusion, 67:3–13, 2021.
  • Wang et al. (2023b) Yizheng Wang, Yixiao Zhai, Yijie Ding, and Quan Zou. Sbsm-pro: Support bio-sequence machine for proteins. arXiv preprint arXiv:2308.10275, 2023b. doi: 10.48550/arXiv.2308.10275.
  • Wishart et al. (2006) David S Wishart, Craig Knox, An Chi Guo, Savita Shrivastava, Murtaza Hassanali, Paul Stothard, Zhan Chang, and Jennifer Woolsey. Drugbank: a comprehensive resource for in silico drug discovery and exploration. Nucleic acids research, 34(suppl_1):D668–D672, 2006.
  • Xie & Sun (2020) Xijiong Xie and Shiliang Sun. General multi-view semi-supervised least squares support vector machines with multi-manifold regularization. Information Fusion, 62:63–72, 2020.
  • Xu et al. (2021) Lixiang Xu, Lu Bai, Jin Xiao, Qi Liu, Enhong Chen, Xiaofeng Wang, and Yuanyan Tang. Multiple graph kernel learning based on gmdh-type neural network. Information Fusion, 66:100–110, 2021.
  • Xu et al. (2022) Xianyu Xu, Ling Yue, Bingchun Li, Ying Liu, Yuan Wang, Wenjuan Zhang, and Lin Wang. Dsgat: predicting frequencies of drug side effects by graph attention networks. Briefings in Bioinformatics, 23(2):bbab586, 2022.
  • Xuan et al. (2022) Ping Xuan, Meng Wang, Yong Liu, Dong Wang, Tiangang Zhang, and Toshiya Nakaguchi. Integrating specific and common topologies of heterogeneous graphs and pairwise attributes for drug-related side effect prediction. Briefings in Bioinformatics, 23(3):bbac126, 2022.
  • Yang et al. (2016) Bo Yang, Hongbin Pei, Hechang Chen, Jiming Liu, and Shang Xia. Characterizing and discovering spatiotemporal social contact patterns for healthcare. IEEE transactions on pattern analysis and machine intelligence, 39(8):1532–1546, 2016.
  • Yuan et al. (2019) Weiwei Yuan, Kangya He, Donghai Guan, Li Zhou, and Chenliang Li. Graph kernel based link prediction for signed social networks. Information Fusion, 46:1–10, 2019.
  • Zha et al. (2009) Zheng-Jun Zha, Tao Mei, Jingdong Wang, Zengfu Wang, and Xian-Sheng Hua. Graph-based semi-supervised learning with multiple labels. Journal of Visual Communication and Image Representation, 20(2):97–103, 2009.
  • Zhang et al. (2015) Ping Zhang, Fei Wang, Jianying Hu, and Robert Sorrentino. Label propagation prediction of drug-drug interactions based on clinical side effects. Scientific reports, 5(1):12339, 2015.
  • Zhang et al. (2019) Si Zhang, Hanghang Tong, Jiejun Xu, and Ross Maciejewski. Graph convolutional networks: a comprehensive review. Computational Social Networks, 6(1):1–23, 2019.
  • Zhang et al. (2016) Wen Zhang, Hua Zou, Longqiang Luo, Qianchao Liu, Weijian Wu, and Wenyi Xiao. Predicting potential side effects of drugs by recommender methods and ensemble learning. Neurocomputing, 173:979–987, 2016.
  • Zhao et al. (2021) Haochen Zhao, Kai Zheng, Yaohang Li, and Jianxin Wang. A novel graph attention model for predicting frequencies of drug–side effects from multi-view data. Briefings in Bioinformatics, 22(6):bbab239, 2021.
  • Zhao et al. (2022) Haochen Zhao, Shaokai Wang, Kai Zheng, Qichang Zhao, Feng Zhu, and Jianxin Wang. A similarity-based deep learning approach for determining the frequencies of drug side effects. Briefings in Bioinformatics, 23(1):bbab449, 2022.

Appendix A Appendix

A.1 Optimization

Solving Equation 24 directly is difficult and time-consuming because it contains multiple variables and large pairwise matrices. In this section, we divide the original problem into five subproblems and develop an iterative algorithm to optimize them. Throughout the optimization, we avoid explicitly computing any pairwise matrices, which makes our method suitable for problems in large pairwise spaces.

$\hat{\bm{F}}$-subproblem: we fix $\bm{a}^{v}$, $\bm{w}$, $\bm{\theta}_{D}$ and $\bm{\theta}_{S}$ and optimize the variable $\hat{\bm{F}}$. Let $\bm{A}=\bm{H}_{S}^{-0.5}\bm{K}_{S}^{*}\bm{H}_{S}^{-0.5}$, $\bm{B}=\bm{H}_{D}^{-0.5}\bm{K}_{D}^{*}\bm{H}_{D}^{-0.5}$ and $\text{vec}\left(\hat{\bm{F}}^{v}\right)=\bm{K}^{v}\bm{a}^{v}$. Then, the optimization model of $\hat{\bm{F}}$ is as follows:

$$
\begin{aligned}
\mathop{\arg\min}\limits_{\hat{\bm{F}}} \quad & \frac{1}{2}\left\| \text{vec}\left(\hat{\bm{F}}\right) - \sum\limits_{v=1}^{V} \bm{w}_{v}\,\text{vec}\left(\hat{\bm{F}}^{v}\right) \right\|_{2}^{2} + \frac{1}{2}\sigma\,\text{vec}\left(\hat{\bm{F}}\right)^{T}\bm{L}\,\text{vec}\left(\hat{\bm{F}}\right) \\
s.t. \quad & \bm{L} = \bm{I}_{NM} - \bm{A}\otimes\bm{B}.
\end{aligned}
\tag{25}
$$

Setting the derivative of Equation 25 w.r.t. $\hat{\bm{F}}$ to zero, the solution of $\hat{\bm{F}}$ can be obtained:

$$
\text{vec}\left(\hat{\bm{F}}\right) = \left(\left(1+\sigma\right)\bm{I}_{NM} - \sigma\bm{A}\otimes\bm{B}\right)^{-1}\left(\sum\nolimits_{v=1}^{V}\bm{w}_{v}\,\text{vec}\left(\hat{\bm{F}}^{v}\right)\right).
\tag{26}
$$
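The step from Equation 25 to Equation 26 can be spelled out as follows. This is a sketch under the assumption that $\bm{L}$ is symmetric (which holds when $\bm{A}$ and $\bm{B}$ are symmetric); the shorthand $\bm{f}$, $\bm{s}$ and $J$ is introduced here only for readability and does not appear in the original derivation.

```latex
\begin{aligned}
\bm{f} &= \text{vec}\left(\hat{\bm{F}}\right), \qquad
\bm{s} = \sum\nolimits_{v=1}^{V} \bm{w}_{v}\,\text{vec}\left(\hat{\bm{F}}^{v}\right), \\
J(\bm{f}) &= \frac{1}{2}\left\|\bm{f}-\bm{s}\right\|_{2}^{2} + \frac{1}{2}\sigma\,\bm{f}^{T}\bm{L}\bm{f}, \\
\nabla_{\bm{f}} J &= \left(\bm{f}-\bm{s}\right) + \sigma\bm{L}\bm{f} = \bm{0}, \\
\left(\bm{I}_{NM} + \sigma\left(\bm{I}_{NM} - \bm{A}\otimes\bm{B}\right)\right)\bm{f} &= \bm{s}, \\
\bm{f} &= \left(\left(1+\sigma\right)\bm{I}_{NM} - \sigma\bm{A}\otimes\bm{B}\right)^{-1}\bm{s}.
\end{aligned}
```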

Notice that computing the inverse matrix on the right-hand side of Equation 26 directly requires too much time and memory. Therefore, we use eigendecomposition to compute the approximate inverse. Let $\bm{V}_{A}\bm{\Lambda}_{A}\bm{V}_{A}^{T}$ and $\bm{V}_{B}\bm{\Lambda}_{B}\bm{V}_{B}^{T}$ be the eigendecompositions of the matrices $\bm{A}$ and $\bm{B}$, respectively. Define the matrix $\bm{U}$ by $\bm{U}_{i,j}=\left[\bm{\Lambda}_{A}\right]_{i,i}\times\left[\bm{\Lambda}_{B}\right]_{j,j}$. By the theorem in (Raymond & Kashima, 2010), the Kronecker product matrix $\bm{A}\otimes\bm{B}$ can be eigendecomposed as $\left(\bm{V}_{A}\otimes\bm{V}_{B}\right)\text{diag}\left(\text{vec}\left(\bm{U}\right)\right)\left(\bm{V}_{A}\otimes\bm{V}_{B}\right)^{T}$. Substituting this into Equation 26, we can write the inverse matrix in Equation 26 as

$$
\left(\left(1+\sigma\right)\bm{I}_{NM} - \sigma\bm{A}\otimes\bm{B}\right)^{-1} = \left(\left(1+\sigma\right)\bm{I}_{NM} - \sigma\left(\bm{V}_{A}\otimes\bm{V}_{B}\right)\text{diag}\left(\text{vec}\left(\bm{U}\right)\right)\left(\bm{V}_{A}\otimes\bm{V}_{B}\right)^{T}\right)^{-1}.
\tag{27}
$$

Since $\left(\bm{V}_A\otimes\bm{V}_B\right)\left(\bm{V}_A\otimes\bm{V}_B\right)^T=\bm{I}_{NM}$, Equation 27 can be transformed into

$$\left(\left(1+\sigma\right)\bm{I}_{NM}-\sigma\bm{A}\otimes\bm{B}\right)^{-1}=\left(\bm{V}_A\otimes\bm{V}_B\right)\left(\left(1+\sigma\right)\bm{I}_{NM}-\sigma\,\text{diag}\left(\text{vec}\left(\bm{U}\right)\right)\right)^{-1}\left(\bm{V}_A\otimes\bm{V}_B\right)^T. \tag{28}$$

Notice that the inner inverse on the right-hand side of Equation 28 is a diagonal matrix. Its diagonal entries can be collected in a matrix $\bm{W}$ with elements

$$\bm{W}_{i,j}=\left(1+\sigma-\sigma\bm{U}_{i,j}\right)^{-1} \tag{29}$$

Equation 26 can therefore be rewritten as

$$\text{vec}\left(\hat{\bm{F}}\right)=\left(\bm{V}_A\otimes\bm{V}_B\right)\text{diag}\left(\text{vec}\left(\bm{W}\right)\right)\left(\bm{V}_A\otimes\bm{V}_B\right)^T\left(\sum\nolimits_{v=1}^{V}\bm{w}_v\text{vec}\left(\hat{\bm{F}}^v\right)\right) \tag{30}$$

Applying the vec-trick, $\left(\bm{A}\otimes\bm{B}\right)\text{vec}\left(\bm{X}\right)=\text{vec}\left(\bm{B}\bm{X}\bm{A}^T\right)$, to remove the vectorization, we obtain the solution

$$\hat{\bm{F}}=\bm{V}_B\left(\bm{W}\odot\left(\bm{V}_B^T\left(\sum\nolimits_{v=1}^{V}\bm{w}_v\hat{\bm{F}}^v\right)\bm{V}_A\right)\right)\bm{V}_A^T \tag{31}$$
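The closed form in Equation 31 can be checked numerically. The NumPy sketch below is illustrative only and is not the authors' implementation: the function names are hypothetical, $\bm{A}$ and $\bm{B}$ are assumed symmetric, and $\bm{U}$ is assumed to be the outer product of the eigenvalues of $\bm{B}$ and $\bm{A}$, which is the arrangement that matches a column-major vec. The second function solves the dense $NM\times NM$ system implied by Equations 27 to 30 for comparison.

```python
import numpy as np


def update_F_hat(A, B, S, sigma):
    """Eigendecomposition route of Eq. 31.

    A: (M, M) symmetric matrix, B: (N, N) symmetric matrix,
    S: (N, M) weighted sum of the per-view predictions, sigma: scalar.
    """
    lam_A, V_A = np.linalg.eigh(A)
    lam_B, V_B = np.linalg.eigh(B)
    # Assumed layout: U[i, j] = lam_B[i] * lam_A[j], so that the
    # column-major vec(U) lists the eigenvalues of kron(A, B).
    U = np.outer(lam_B, lam_A)
    W = 1.0 / (1.0 + sigma - sigma * U)  # Eq. 29, element-wise
    return V_B @ (W * (V_B.T @ S @ V_A)) @ V_A.T


def update_F_hat_dense(A, B, S, sigma):
    """Reference: solve ((1 + sigma) I - sigma kron(A, B)) vec(F_hat) = vec(S)."""
    N, M = S.shape
    lhs = (1.0 + sigma) * np.eye(N * M) - sigma * np.kron(A, B)
    vec_S = S.reshape(-1, order="F")  # column-major vectorization
    return np.linalg.solve(lhs, vec_S).reshape(N, M, order="F")
```

Under these assumptions both functions return the same matrix, but the first needs only two small eigendecompositions and a few matrix products rather than a solve with an $NM\times NM$ matrix.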

$\bm{w}$-subproblem: We fix all variables except $\bm{w}$. The subproblem is

$$\begin{aligned}
\mathop{\arg\min}\limits_{\bm{w}}\ & \frac{1}{2}\left\|\hat{\bm{F}}-\sum\limits_{v=1}^{V}\bm{w}_v\hat{\bm{F}}^v\right\|_F^2+\mu\sum\limits_{v=1}^{V}\left(\frac{\bm{w}_v}{2}\left\|\bm{F}-\hat{\bm{F}}^v\right\|_F^2\right)+\frac{1}{2}\beta\left\|\bm{w}\right\|_2^2 \\
s.t.\ & \sum\limits_{v=1}^{V}\bm{w}_v=1,\ \bm{w}_v\geq 0,\ v=1,\ldots,V.
\end{aligned} \tag{32}$$

Problem 32 can be simplified as a standard quadratic programming problem (Nocedal & Wright, 2006)

$$\begin{aligned}
\mathop{\arg\min}\limits_{\bm{w}}\ & \bm{w}^T\bm{G}\bm{w}-\bm{w}^T\bm{h} \\
s.t.\ & \sum\limits_{v=1}^{V}\bm{w}_v=1,\ \bm{w}_v\geq 0,\ v=1,\ldots,V.
\end{aligned} \tag{33}$$

where $\bm{G}\in\mathbb{R}^{V\times V}$ has elements

$$\bm{G}_{i,j}=\begin{cases}
\frac{1}{2}\text{trace}\left(\left(\hat{\bm{F}}^i\right)^T\hat{\bm{F}}^j\right), & \text{if}\ i\neq j,\\
\frac{1}{2}\text{trace}\left(\left(\hat{\bm{F}}^i\right)^T\hat{\bm{F}}^j\right)+\frac{1}{2}\beta, & \text{if}\ i=j.
\end{cases} \tag{34}$$

$\bm{h}$ is a vector with elements

$$\bm{h}_i=\text{trace}\left(\hat{\bm{F}}^T\hat{\bm{F}}^i\right)-\frac{\mu}{2}\left\|\bm{F}-\hat{\bm{F}}^i\right\|_F^2. \tag{35}$$

Problem 33 is solved with the interior-point algorithm (Byrd et al., 1999).
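As a rough illustration of the $\bm{w}$-subproblem, the sketch below builds $\bm{G}$ and $\bm{h}$ from Equations 34 and 35 and minimizes the quadratic over the simplex. It is not the authors' solver: it swaps in projected gradient descent with a sort-based simplex projection in place of the interior-point method cited above, chosen only to keep the example free of external dependencies. All function names and the iteration count are assumptions.

```python
import numpy as np


def build_qp(F, F_hat, F_views, mu, beta):
    """Assemble G (Eq. 34) and h (Eq. 35) from the per-view predictions."""
    V = len(F_views)
    # trace(X^T Y) equals the sum of the element-wise product of X and Y.
    gram = np.array([[np.sum(Fi * Fj) for Fj in F_views] for Fi in F_views])
    G = 0.5 * gram + 0.5 * beta * np.eye(V)
    h = np.array([np.sum(F_hat * Fi) - 0.5 * mu * np.sum((F - Fi) ** 2)
                  for Fi in F_views])
    return G, h


def project_simplex(x):
    """Euclidean projection onto {w : sum(w) = 1, w >= 0}."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, x.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(x - tau, 0.0)


def solve_w(G, h, n_iter=2000):
    """Projected gradient descent on w^T G w - w^T h over the simplex."""
    V = h.size
    w = np.full(V, 1.0 / V)                    # uniform start
    step = 1.0 / (2.0 * np.linalg.norm(G, 2))  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2.0 * G @ w - h                 # gradient for symmetric G
        w = project_simplex(w - step * grad)
    return w
```

For any feasible $\bm{w}$, the objective of problem 33 differs from that of problem 32 only by the constant $\frac{1}{2}\|\hat{\bm{F}}\|_F^2$, so both problems share the same minimizer.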

$\bm{\theta}_D$-subproblem: With all variables except $\bm{\theta}_D$ fixed, the subproblem can be written as

$$\begin{aligned}
\mathop{\arg\min}\limits_{\bm{\theta}_D}\ & \frac{1}{2}\sigma\,\text{vec}\left(\hat{\bm{F}}\right)^T\bm{L}\,\text{vec}\left(\hat{\bm{F}}\right) \\
s.t.\ & \bm{L}=\bm{I}_{NM}-\left(\bm{H}_S^{-0.5}\bm{K}_S^*\bm{H}_S^{-0.5}\right)\otimes\left(\bm{H}_D^{-0.5}\bm{K}_D^*\bm{H}_D^{-0.5}\right), \\
& \bm{K}_D^*=\sum\limits_{i=1}^{P}\left[\bm{\theta}_D\right]_i^{\varepsilon}\bm{K}_D^i,\ \sum\limits_{i=1}^{P}\left[\bm{\theta}_D\right]_i=1,\ \left[\bm{\theta}_D\right]_i\geq 0,\ i=1,\ldots,P.
\end{aligned} \tag{36}$$

Let $\bm{A}=\bm{H}_S^{-0.5}\bm{K}_S^*\bm{H}_S^{-0.5}$ and $\bm{B}^i=\bm{H}_D^{-0.5}\bm{K}_D^i\bm{H}_D^{-0.5}$. Substituting $\bm{A}$ and $\bm{B}^i$ for $\bm{L}$ in Equation 36 and dropping the term that does not depend on $\bm{\theta}_D$, the objective function 36 can be written as

$$\begin{aligned}
\mathop{\arg\min}\limits_{\bm{\theta}_D}\ & -\frac{1}{2}\sigma\,\text{vec}\left(\hat{\bm{F}}\right)^T\sum\limits_{i=1}^{P}\left[\bm{\theta}_D\right]_i^{\varepsilon}\left(\bm{A}\otimes\bm{B}^i\right)\text{vec}\left(\hat{\bm{F}}\right) \\
s.t.\ & \sum\limits_{i=1}^{P}\left[\bm{\theta}_D\right]_i=1,\ \left[\bm{\theta}_D\right]_i\geq 0,\ i=1,\ldots,P.
\end{aligned} \tag{37}$$

Introducing the Lagrange multiplier $\xi$, the objective function 37 can be converted to the Lagrange function

$$\text{Lag}\left(\bm{\theta}_D,\xi\right)=-\frac{1}{2}\sigma\,\text{vec}\left(\hat{\bm{F}}\right)^T\sum\limits_{i=1}^{P}\left[\bm{\theta}_D\right]_i^{\varepsilon}\left(\bm{A}\otimes\bm{B}^i\right)\text{vec}\left(\hat{\bm{F}}\right)-\xi\left(\sum\limits_{i=1}^{P}\left[\bm{\theta}_D\right]_i-1\right) \tag{38}$$

Setting the derivatives of Equation 38 with respect to $\bm{\theta}_D$ and $\xi$ to zero yields the solution

$$\left[\bm{\theta}_D\right]_i=\frac{\left(\text{vec}\left(\hat{\bm{F}}\right)^T\left(\bm{A}\otimes\bm{B}^i\right)\text{vec}\left(\hat{\bm{F}}\right)\right)^{\frac{1}{1-\varepsilon}}}{\sum\limits_{j=1}^{P}\left(\text{vec}\left(\hat{\bm{F}}\right)^T\left(\bm{A}\otimes\bm{B}^j\right)\text{vec}\left(\hat{\bm{F}}\right)\right)^{\frac{1}{1-\varepsilon}}}. \tag{39}$$
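The stationarity step behind Equation 39 can be spelled out as follows. This is a sketch under stated assumptions: the kernel weights enter the objective through $\bm{K}_D^*=\sum_i\left[\bm{\theta}_D\right]_i^{\varepsilon}\bm{K}_D^i$ as in the constraint of problem 36, $\bm{H}_D$ is treated as fixed, and the shorthand $q_i$ is introduced here for readability and does not appear in the original.

$$\begin{aligned}
q_i &= \text{vec}\left(\hat{\bm{F}}\right)^T\left(\bm{A}\otimes\bm{B}^i\right)\text{vec}\left(\hat{\bm{F}}\right), \\
\frac{\partial\,\text{Lag}}{\partial\left[\bm{\theta}_D\right]_i} &= -\frac{\sigma\varepsilon}{2}\left[\bm{\theta}_D\right]_i^{\varepsilon-1}q_i-\xi=0
\ \Rightarrow\ \left[\bm{\theta}_D\right]_i=\left(\frac{-2\xi}{\sigma\varepsilon}\right)^{\frac{1}{\varepsilon-1}}q_i^{\frac{1}{1-\varepsilon}}, \\
\sum\limits_{i=1}^{P}\left[\bm{\theta}_D\right]_i=1 &\ \Rightarrow\ \left(\frac{-2\xi}{\sigma\varepsilon}\right)^{\frac{1}{\varepsilon-1}}=\frac{1}{\sum\nolimits_{j=1}^{P}q_j^{\frac{1}{1-\varepsilon}}}.
\end{aligned}$$

Substituting the last line into the second gives $\left[\bm{\theta}_D\right]_i=q_i^{\frac{1}{1-\varepsilon}}/\sum\nolimits_{j=1}^{P}q_j^{\frac{1}{1-\varepsilon}}$, which is Equation 39.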

Using the vec-trick, the solution can be expressed as

$$\left[\bm{\theta}_D\right]_i=\frac{\text{trace}\left(\hat{\bm{F}}^T\bm{B}^i\hat{\bm{F}}\bm{A}^T\right)^{\frac{1}{1-\varepsilon}}}{\sum\limits_{j=1}^{P}\text{trace}\left(\hat{\bm{F}}^T\bm{B}^j\hat{\bm{F}}\bm{A}^T\right)^{\frac{1}{1-\varepsilon}}} \tag{40}$$

$\bm{\theta}_S$-subproblem: The solution for $\bm{\theta}_S$ is similar to that for $\bm{\theta}_D$. We omit the optimization process and give the solution directly:

$$\left[\bm{\theta}_S\right]_i=\frac{\text{trace}\left(\hat{\bm{F}}^T\bm{B}\hat{\bm{F}}\left(\bm{A}^i\right)^T\right)^{\frac{1}{1-\varepsilon}}}{\sum\limits_{j=1}^{Q}\text{trace}\left(\hat{\bm{F}}^T\bm{B}\hat{\bm{F}}\left(\bm{A}^j\right)^T\right)^{\frac{1}{1-\varepsilon}}} \tag{41}$$

where $\bm{B}=\bm{H}_D^{-0.5}\bm{K}_D^*\bm{H}_D^{-0.5}$ and $\bm{A}^i=\bm{H}_S^{-0.5}\bm{K}_S^i\bm{H}_S^{-0.5}$.
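The kernel-weight updates for $\bm{\theta}_D$ and $\bm{\theta}_S$ share one pattern: a quadratic form per base kernel, raised to the power $\frac{1}{1-\varepsilon}$ and normalized. The NumPy sketch below illustrates it; the function names are hypothetical, and the quadratic forms are assumed to be strictly positive (for example, positive definite normalized kernels) so that the fractional power is well defined.

```python
import numpy as np


def quad_form(F_hat, A, B):
    """trace(F_hat^T B F_hat A^T), i.e. vec(F_hat)^T kron(A, B) vec(F_hat)."""
    return np.sum(F_hat * (B @ F_hat @ A.T))


def normalize_weights(q, eps):
    """Raise each quadratic form to 1 / (1 - eps) and normalize to sum 1."""
    p = np.asarray(q, dtype=float) ** (1.0 / (1.0 - eps))
    return p / p.sum()


def update_theta_D(F_hat, A, B_list, eps):
    """Trace form of the theta_D update: one B^i per drug base kernel."""
    return normalize_weights([quad_form(F_hat, A, B_i) for B_i in B_list], eps)


def update_theta_S(F_hat, A_list, B, eps):
    """Trace form of the theta_S update: one A^i per side-effect base kernel."""
    return normalize_weights([quad_form(F_hat, A_i, B) for A_i in A_list], eps)
```

The trace form avoids building any $NM\times NM$ Kronecker product: each quadratic form needs only products of $N\times N$, $N\times M$, and $M\times M$ matrices.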

$\bm{a}^v$-subproblem: Dropping all terms that do not depend on $\bm{a}^v$, we have

$$\mathop{\arg\min}\limits_{\bm{a}^v}\ \frac{1}{2}\left\|\text{vec}\left(\hat{\bm{F}}\right)-\sum\limits_{i=1}^{V}\bm{w}_i\bm{K}^i\bm{a}^i\right\|_2^2+\mu\left(\frac{\bm{w}_v}{2}\left\|\text{vec}\left(\bm{F}\right)-\bm{K}^v\bm{a}^v\right\|_2^2+\frac{\lambda^v}{2}{\bm{a}^v}^T\bm{K}^v\bm{a}^v\right). \tag{42}$$

Objective function 42 shows that, when the parameter $\bm{a}^v$ is trained, the other views $\bm{K}^i$ with weights $\bm{w}_i$ are taken into account. Therefore, the partitions are not trained independently; information is shared among them.

Setting the derivative of problem 42 with respect to $\bm{a}^v$ to zero, we get

$$\left(\bm{K}^v+\frac{\lambda^v}{1+\mu\bm{w}_v}\bm{I}_{NM}\right)\bm{a}^v=\frac{1}{1+\mu\bm{w}_v}\left(\text{vec}\left(\hat{\bm{F}}\right)-\sum\limits_{i=1,i\neq v}^{V}\bm{w}_i\bm{K}^i\bm{a}^i+\mu\bm{w}_v\text{vec}\left(\bm{F}\right)\right) \tag{43}$$

Let $\bm{W}=\hat{\bm{F}}-\sum\limits_{i=1,i\neq v}^{V}\bm{w}_i\hat{\bm{F}}^i+\mu\bm{w}_v\bm{F}$. Equation 43 can then be written as

$$\bm{a}^v=\frac{1}{1+\mu\bm{w}_v}\left(\bm{K}^v+\frac{\lambda^v}{1+\mu\bm{w}_v}\bm{I}_{NM}\right)^{-1}\text{vec}\left(\bm{W}\right). \tag{44}$$

Equation 44 has the same form as Equation 7. We therefore use eigendecomposition and the vec-trick to compute $\bm{a}^{v}$ efficiently.
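To make the eigendecomposition and vec-trick step concrete, the following is a minimal NumPy sketch of a solver for a system of the form of Equation 44. It assumes that the pairwise kernel factorizes as $\bm{K}^{v}=\bm{K}_{S}\otimes\bm{K}_{D}$ with a column-major $\mathrm{vec}$, and that both factors are symmetric; the function name `solve_view` and the argument names are illustrative and are not taken from the released code.

```python
import numpy as np

def solve_view(K_D, K_S, W, lam, mu_w):
    """Return A (N x M) such that
    vec(A) = 1/(1+mu_w) * (K_S kron K_D + lam/(1+mu_w) * I)^{-1} vec(W).

    Assumptions (not confirmed by the paper text shown here):
    K_D (N x N) and K_S (M x M) are symmetric kernels, and vec stacks columns.
    """
    c = lam / (1.0 + mu_w)
    # Eigendecompose the two small kernels instead of the NM x NM product.
    d_vals, U_D = np.linalg.eigh(K_D)   # K_D = U_D diag(d_vals) U_D^T
    s_vals, U_S = np.linalg.eigh(K_S)   # K_S = U_S diag(s_vals) U_S^T
    # vec-trick: (U_S kron U_D)^T vec(W) = vec(U_D^T W U_S)
    W_rot = U_D.T @ W @ U_S
    # Eigenvalues of K_S kron K_D, laid out on the same N x M grid as W_rot.
    denom = np.outer(d_vals, s_vals) + c
    # Rotate back: (U_S kron U_D) vec(Z) = vec(U_D Z U_S^T)
    A = U_D @ (W_rot / denom) @ U_S.T
    return A / (1.0 + mu_w)
```

Under these assumptions the cost is dominated by the two eigendecompositions, which are cubic in $N$ and $M$ separately, and the $NM\times NM$ matrix is never formed.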

We summarize the complete optimization process for problem 24 in Algorithm 1.

Input: The link matrix $\bm{F}$; the regularization parameters $\mu$, $\beta$, $\sigma$, $\varepsilon$ and $\lambda^{v}, v=1,\ldots,V$;
Output: The predicted link matrix $\hat{\bm{F}}$;
Compute the two base kernel sets $\mathbb{K}_{D}$ and $\mathbb{K}_{S}$ by Equations 20a and 20b;
Initialize $\bm{a}^{v}, v=1,\ldots,V$ by single-view Kron-RLS; $\bm{w}_{v}=1/V, v=1,\ldots,V$; $\bm{\theta}_{D}^{i}=1/P, i=1,\ldots,P$; $\bm{\theta}_{S}^{i}=1/Q, i=1,\ldots,Q$;
while not converged do
      Update $\hat{\bm{F}}$ by solving subproblem 25;
      Update $\bm{w}$ by solving subproblem 32;
      Update $\bm{\theta}_{D}$ by solving subproblem 36;
      Update $\bm{\theta}_{S}$ by Equation 41;
      for $i=1$ to $V$ do
            Update $\bm{a}^{i}$ by solving subproblem 42;
      end for
end while
Algorithm 1 Optimization for MKronRLSF-LP.
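
The control flow of Algorithm 1 can be sketched as a generic alternating loop. The sketch below only reproduces the initialization and the update order; the subproblem solvers are passed in as callables, and the names `init_state`, `run_alternating`, the `updates` dictionary, the starting value of $\hat{\bm{F}}$ and the relative-change stopping rule are all assumptions made for illustration, since the algorithm does not specify them.

```python
import numpy as np

def init_state(F, a_init, P, Q):
    """Uniform initialization of w, theta_D and theta_S as in Algorithm 1.
    a_init holds the per-view solutions from single-view Kron-RLS."""
    V = len(a_init)
    F = np.asarray(F, dtype=float)
    return {
        "F": F,
        "F_hat": F.copy(),            # assumption: only used by the first convergence check
        "a": list(a_init),
        "w": np.full(V, 1.0 / V),
        "theta_D": np.full(P, 1.0 / P),
        "theta_S": np.full(Q, 1.0 / Q),
    }

def run_alternating(state, updates, max_iter=100, tol=1e-6):
    """Apply the updates in the order of Algorithm 1 until F_hat stabilizes.
    `updates` maps a block name to a hypothetical solver for that subproblem."""
    it = 0
    for it in range(1, max_iter + 1):
        prev = state["F_hat"].copy()
        state["F_hat"] = updates["F_hat"](state)
        state["w"] = updates["w"](state)
        state["theta_D"] = updates["theta_D"](state)
        state["theta_S"] = updates["theta_S"](state)
        # Views are updated one after another, so each a^v sees the latest a^i.
        for v in range(len(state["a"])):
            state["a"][v] = updates["a"](state, v)
        # Assumed stopping rule: relative change of the predicted link matrix.
        change = np.linalg.norm(state["F_hat"] - prev) / max(np.linalg.norm(prev), 1e-12)
        if change < tol:
            break
    return state, it
```

Each entry of `updates` would wrap the corresponding solver (subproblems 25, 32 and 36, Equation 41, and subproblem 42); keeping them as callables leaves the loop independent of how each subproblem is solved.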

A.2 Measurements

Drug-side effect prediction is an extremely imbalanced classification problem, and we do not want the prediction model to recommend incorrect predictions. We therefore use the following evaluation metrics:

$$Recall=\frac{TP}{TP+FN}, \tag{45a}$$
$$Precision=\frac{TP}{TP+FP}, \tag{45b}$$
$$F_{score}=2\times\frac{Precision\times Recall}{Precision+Recall}, \tag{45c}$$

where $TP$, $FN$, $FP$ and $TN$ are the numbers of true-positive, false-negative, false-positive and true-negative samples, respectively. The area under the ROC curve (AUC) and the area under the precision-recall curve (AUPR) are also used to measure predictive accuracy, because they are the most commonly used evaluation metrics in biomedical link prediction. The precision-recall curve shows the tradeoff between precision and recall at different thresholds. $F_{score}$ is calculated from $Precision$ and $Recall$: its highest possible value is 1, indicating perfect precision and recall, and its lowest possible value is 0, which occurs when either precision or recall is zero. AUC can be interpreted as the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative instance (Li et al., 2021). Because negatives vastly outnumber positives in this problem, AUC can remain high even when many top-ranked predictions are false positives. Therefore, we consider AUPR and $F_{score}$ more desirable metrics (Ezzat et al., 2017; Li et al., 2021).
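
As an illustration of Equations 45a to 45c and of the ranking interpretation of AUC, the following plain-Python sketch computes the metrics from binary labels and predicted scores. The function names, the convention that a score greater than or equal to the threshold counts as a positive prediction, and the choice to return 0 when a denominator is zero are assumptions made for this example.

```python
def precision_recall_f(y_true, scores, threshold):
    """Precision, recall and F-score (Equations 45a-45c) at a given threshold.
    Assumption: score >= threshold is a positive prediction; 0/0 is reported as 0."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y:
            tp += 1
        elif predicted_positive and not y:
            fp += 1
        elif y:
            fn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

def auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    receives the higher score; ties count as one half."""
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of samples and is meant only to mirror the probabilistic definition; a rank-based computation would be preferable on full drug-side effect matrices.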

A.3 Baseline methods

  • Best single view (BSV): Kron-RLS is applied to the best single view, chosen here as the one with the maximum AUPR.

  • Committee Kron-RLS (Comm Kron-RLS) (Perrone & Cooper, 1995): Each view is trained by Kron-RLS separately, and the final classifier is a weighted average of the per-view classifiers.

  • Kron-RLS with Centered Kernel Alignment-based Multiple Kernel Learning (Kron-RLS+CKA-MKL) (Ding et al., 2019): Multiple kernels from the drug space and the side effect space are linearly combined with weights optimized by CKA-MKL. Kron-RLS is then applied to the resulting optimal kernels.

  • Kron-RLS with pairwise Multiple Kernel Learning (Kron-RLS+pairwiseMKL) (Cichonska et al., 2018): It first constructs multiple pairwise kernels, then determines the mixture weights of the pairwise kernels by CKA-MKL, and finally learns the Kron-RLS function on the optimal pairwise kernel.

  • Kron-RLS with self-weighted multiple kernel learning (Kron-RLS+self-MKL) (Nascimento et al., 2016): The optimal drug and side effect kernels are linear combinations of the multiple base kernels, and the weight of each kernel is assigned automatically.

  • Multi-view graph regularized link propagation model (MvGRLP) (Ding et al., 2021): This is an extension of the graph model (Zha et al., 2009). To fuse multi-view information, multi-view Laplacian regularization is introduced to constrain the predicted values.

  • Multi-view graph convolution network (MvGCN) (Fu et al., 2022): This extends the GCN (Zhang et al., 2019) from a single view to multiple views by combining the embeddings of multiple neighborhood information aggregation layers in each view.
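
For the Comm Kron-RLS baseline in the list above, the fusion step amounts to a weighted average of the per-view score matrices. The sketch below assumes that each view has already produced an $N\times M$ score matrix and uses uniform weights when none are given; the function name `committee_predict` and the weight normalization are illustrative assumptions, not the weighting scheme of Perrone & Cooper (1995).

```python
import numpy as np

def committee_predict(view_scores, weights=None):
    """Weighted average of V per-view score matrices of identical shape.
    Assumption: weights default to uniform and are normalized to sum to one."""
    S = np.stack([np.asarray(s, dtype=float) for s in view_scores])  # V x N x M
    V = S.shape[0]
    w = np.full(V, 1.0 / V) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Contract the view axis: result[n, m] = sum_v w[v] * S[v, n, m]
    return np.tensordot(w, S, axes=1)
```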

A.4 Code and Data Availability

The code and data are available at https://github.com/QYuQing/MKronRLSF-LP.

A.5 Figures

Figure 5: Visualization of drug-side effect association problems.
Figure 6: Results (MKronRLSF-LP) for $F_{score}$, $Recall$ and $Precision$ at different thresholds, shown in panels (a) to (d).
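
Curves such as those in Figure 6 can be produced by sweeping the decision threshold over the predicted scores. The following plain-Python sketch shows one way to do this; the function names are hypothetical, and the final selection of the threshold with the largest $F_{score}$ is only one possible rule, since this appendix does not state how the reported thresholds were chosen.

```python
def prf_at(y_true, scores, threshold):
    """Precision, recall and F-score when score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and not y)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return precision, recall, (2 * precision * recall / total if total else 0.0)

def sweep_thresholds(y_true, scores, thresholds):
    """Return one (threshold, precision, recall, f_score) row per threshold."""
    return [(t,) + prf_at(y_true, scores, t) for t in thresholds]

def best_f_threshold(rows):
    """Illustrative selection rule (an assumption): keep the row with the largest F-score."""
    return max(rows, key=lambda row: row[3])
```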

A.6 Tables

Table 3: Prediction performance comparison of baseline methods on four datasets.

| Dataset | Methods | AUPR (%) | AUC (%) | $Recall$ (%) | $Precision$ (%) | $F_{score}$ (%) |
|---|---|---|---|---|---|---|
| Liu | BSV | 60.12±1.12 | 93.22±1.63 | 58.77±0.33 | 59.09±0.49 | 58.52±0.23 |
| Liu | Comm Kron-RLS | 65.63±1.95 | 94.11±1.45 | 61.63±0.33 | 61.9±1.37 | 61.57±1.65 |
| Liu | Kron-RLS+CKA-MKL | 65.92±0.43 | 92.51±0.08 | 62.11±0.43 | 63.09±0.56 | 62.59±0.41 |
| Liu | Kron-RLS+pairwiseMKL | 62.03±0.44 | 95.01±0.06 | 65.39±0.24 | 54.46±0.30 | 59.43±0.21 |
| Liu | Kron-RLS+self-MKL | 65.02±0.47 | 92.1±0.10 | 60.97±0.57 | 63.12±0.61 | 62.03±0.52 |
| Liu | MvGRLP | 66.32±0.45 | 94.29±0.08 | 63.56±0.46 | 60.87±0.62 | 62.18±0.39 |
| Liu | MvGCN | 62.69±1.81 | 94.01±0.87 | 60.81±0.37 | 60.33±1.31 | 60.48±1.15 |
| Liu | MKronRLSF-LP | 68.02±0.44 | 94.78±0.13 | 65.18±0.93 | 61.27±1.08 | 63.02±0.43 |
| Pau | BSV | 65.26±0.98 | 94.57±0.34 | 62.54±0.5 | 60.77±1.27 | 60.65±0.73 |
| Pau | Comm Kron-RLS | 65.63±0.36 | 94.78±0.13 | 64.01±0.38 | 60.05±0.49 | 61.01±0.27 |
| Pau | Kron-RLS+CKA-MKL | 65.49±0.37 | 92.39±0.13 | 61.65±0.40 | 63.22±0.51 | 62.42±0.27 |
| Pau | Kron-RLS+pairwiseMKL | 63.48±0.39 | 95.02±0.07 | 78.1±0.26 | 45.01±0.48 | 57.11±0.36 |
| Pau | Kron-RLS+self-MKL | 64.11±1.75 | 91.94±0.25 | 62.37±0.29 | 60.97±1.57 | 61.65±0.79 |
| Pau | MvGRLP | 66.17±0.32 | 94.42±0.07 | 62.18±0.38 | 61.95±0.45 | 62.06±0.22 |
| Pau | MvGCN | 63.51±1.43 | 94.08±0.49 | 63.21±0.69 | 57.94±1.34 | 60.4±1.78 |
| Pau | MKronRLSF-LP | 67.81±0.37 | 94.81±0.18 | 65.72±3.58 | 60.65±3.75 | 62.87±0.48 |
| Miz | BSV | 56.58±2.33 | 90.71±2.06 | 62.76±0.69 | 53.94±2.31 | 55.39±2.33 |
| Miz | Comm Kron-RLS | 58.08±1.07 | 91.36±1.25 | 62.37±0.81 | 55.16±1.99 | 56.54±1.77 |
| Miz | Kron-RLS+CKA-MKL | 66.92±0.44 | 92.58±0.14 | 62.62±0.52 | 64.3±0.46 | 61.45±0.44 |
| Miz | Kron-RLS+pairwiseMKL | 62.13±0.29 | 94.70±0.11 | 63.78±0.47 | 56.26±0.42 | 59.79±0.30 |
| Miz | Kron-RLS+self-MKL | 65.84±0.43 | 92.06±0.16 | 63.63±0.48 | 61.77±0.52 | 60.68±0.43 |
| Miz | MvGRLP | 66.68±0.35 | 94.10±0.12 | 63.46±0.43 | 61.82±0.30 | 62.63±0.29 |
| Miz | MvGCN | 62.17±1.90 | 93.35±1.73 | 59.54±0.43 | 60.74±1.78 | 59.76±1.95 |
| Miz | MKronRLSF-LP | 68.35±0.38 | 94.47±0.09 | 65.15±2.77 | 62.10±3.19 | 63.45±0.53 |
| Luo | BSV | 60.40±0.40 | 94.40±0.11 | 58.28±0.41 | 58.68±0.46 | 58.48±0.39 |
| Luo | Comm Kron-RLS | 54.19±1.36 | 91.92±4.01 | 57.64±2.46 | 53.16±1.97 | 52.99±1.54 |
| Luo | Kron-RLS+CKA-MKL | 60.87±0.36 | 92.03±0.15 | 55.55±0.34 | 64.15±0.46 | 59.54±0.36 |
| Luo | Kron-RLS+pairwiseMKL | 50.29±0.29 | 94.37±0.10 | 55.66±0.39 | 45.97±0.39 | 50.35±0.31 |
| Luo | Kron-RLS+self-MKL | 22.29±1.57 | 79.74±1.62 | 56.62±1.47 | 20.91±1.64 | 28.23±1.15 |
| Luo | MvGRLP | 61.76±0.45 | 94.08±0.07 | 58.70±0.40 | 60.05±0.61 | 58.37±0.42 |
| Luo | MvGCN | 61.18±0.41 | 94.54±0.1 | 57.94±0.37 | 61.26±0.48 | 51.07±0.38 |
| Luo | MKronRLSF-LP | 63.32±0.58 | 94.07±0.14 | 59.43±0.95 | 61.58±1.22 | 60.47±0.39 |

Table 4: The optimal parameters $\lambda^{v}$ obtained with the single-view Kron-RLS model (based on the relative pairwise kernel).

| $\otimes$ | $\bm{K}_{GIP,S}$ | $\bm{K}_{GIP,S}$ | $\bm{K}_{GIP,S}$ | $\bm{K}_{GIP,S}$ | $\bm{K}_{GIP,S}$ |
|---|---|---|---|---|---|
| $\bm{K}_{GIP,D}$ | $2^{0}$ | $2^{2}$ | $2^{2}$ | $2^{1}$ | $2^{1}$ |
| $\bm{K}_{COS,D}$ | $2^{2}$ | $2^{3}$ | $2^{3}$ | $2^{2}$ | $2^{-2}$ |
| $\bm{K}_{Corr,D}$ | $2^{3}$ | $2^{3}$ | $2^{4}$ | $2^{2}$ | $2^{4}$ |
| $\bm{K}_{MI,D}$ | $2^{0}$ | $2^{1}$ | $2^{1}$ | $2^{0}$ | $2^{-1}$ |
| $\bm{K}_{NTK,D}$ | $2^{2}$ | $2^{4}$ | $2^{3}$ | $2^{1}$ | $2^{1}$ |

Table 5: Summary of the thresholds of the baseline methods on four datasets.

| Methods | Liu | Pau | Miz | Luo |
|---|---|---|---|---|
| BSV | 0.145 | 0.146 | 0.142 | 0.128 |
| Comm Kron-RLS | 0.205 | 0.204 | 0.192 | 0.183 |
| Kron-RLS+CKA-MKL | 0.100 | 0.106 | 0.099 | 0.102 |
| Kron-RLS+pairwiseMKL | 0.149 | 0.159 | 0.101 | 0.107 |
| Kron-RLS+self-MKL | 0.119 | 0.116 | 0.113 | 0.129 |
| MvGRLP | 0.090 | 0.091 | 0.094 | 0.085 |
| MvGCN | 0.225 | 0.237 | 0.208 | 0.197 |
| MKronRLSF-LP | 0.177 | 0.168 | 0.179 | 0.149 |

Table 6: Prediction performance comparison of other drug-side effect predictors on the Liu dataset.

| Methods | AUPR (%) | AUC (%) | $Recall$ (%) | $Precision$ (%) | $F_{score}$ (%) |
|---|---|---|---|---|---|
| Liu's method | 28.0 | 90.7 | 67.5 | 34.0 | 45.2 |
| Cheng's method | 59.2 | 92.2 | 59.0 | 55.7 | 56.9 |
| RBMBM | 61.6 | 94.1 | 61.5 | 57.4 | 59.4 |
| INBM | 64.1 | 93.4 | 60.7 | 60.4 | 60.6 |
| Ensemble model | 66.1 | 94.8 | 62.3 | 61.1 | 61.7 |
| MKL-LGC$^{a}$ | 67.0 | 95.1 | - | - | - |
| NDDSA with sschem$^{b}$ | 60.5 | 94.1 | 57.9 | 56.4 | 57.1 |
| NDDSA without sschem$^{b}$ | 60.4 | 94.0 | 57.4 | 56.8 | 57.1 |
| MKronRLSF-LP | 68.2 | 94.7 | 63.8 | 62.5 | 63.1 |

  • - indicates that a value is not available; the bold and underlined values represent the best and second-best performance measure in each column, respectively;

  • a and b indicate that the results are taken from (Ding et al., 2018) and (Shabani-Mashcool et al., 2020), respectively.

Table 7: Prediction performance comparison of other drug-side effect predictors on the Pau dataset.

| Methods | AUPR (%) | AUC (%) | $Recall$ (%) | $Precision$ (%) | $F_{score}$ (%) |
|---|---|---|---|---|---|
| Pau's method$^{a}$ | 38.9 | 89.7 | 51.7 | 36.1 | 42.5 |
| Liu's method | 34.7 | 92.1 | 64.6 | 40.0 | 49.5 |
| Cheng's method | 58.8 | 82.3 | 58.3 | 55.0 | 56.6 |
| RBMBM | 61.3 | 94.1 | 60.8 | 57.7 | 59.2 |
| INBM | 64.1 | 93.4 | 60.8 | 60.5 | 60.7 |
| Ensemble model | 66.0 | 94.9 | 62.4 | 61.2 | 61.6 |
| MKL-LGC$^{b}$ | 66.8 | 95.2 | - | - | - |
| NDDSA with sschem$^{c}$ | 60.3 | 94.2 | 59.3 | 54.9 | 57.0 |
| NDDSA without sschem$^{c}$ | 60.3 | 94.1 | 58.2 | 55.9 | 57.0 |
| MKronRLSF-LP | 67.9 | 94.7 | 63.4 | 62.9 | 63.2 |

  • - indicates that a value is not available; the bold and underlined values represent the best and second-best performance measure in each column, respectively;

  • a, b and c indicate that the results are taken from (Zhang et al., 2016), (Ding et al., 2018) and (Shabani-Mashcool et al., 2020), respectively.

Table 8: Prediction performance comparison of other drug-side effect predictors on the Miz dataset.

| Methods | AUPR (%) | AUC (%) | $Recall$ (%) | $Precision$ (%) | $F_{score}$ (%) |
|---|---|---|---|---|---|
| Miz's method$^{a}$ | 41.2 | 89.0 | 52.7 | 38.7 | 44.6 |
| Liu's method | 36.3 | 91.8 | 64.0 | 41.5 | 50.5 |
| Cheng's method | 56.0 | 92.3 | 58.4 | 56.8 | 57.6 |
| RBMBM | 61.7 | 93.9 | 60.5 | 58.8 | 59.6 |
| INBM | 64.6 | 93.2 | 61.6 | 60.5 | 61.1 |
| Ensemble model | 66.6 | 94.6 | 62.4 | 61.9 | 62.2 |
| MKL-LGC$^{b}$ | 67.3 | 94.8 | - | - | - |
| NDDSA with sschem$^{c}$ | 60.6 | 93.9 | 58.8 | 56.3 | 57.5 |
| NDDSA without sschem$^{c}$ | 60.7 | 93.6 | 60.0 | 55.5 | 57.6 |
| MKronRLSF-LP | 68.5 | 94.5 | 63.0 | 64.2 | 63.6 |

  • - indicates that a value is not available; the bold and underlined values represent the best and second-best performance measure in each column, respectively;

  • a, b and c indicate that the results are taken from (Zhang et al., 2016), (Ding et al., 2018) and (Shabani-Mashcool et al., 2020), respectively.

Table 9: Prediction performance comparison of other drug-side effect predictors on the Luo dataset.

| Methods | AUPR (%) | AUC (%) | $Recall$ (%) | $Precision$ (%) | $F_{score}$ (%) |
|---|---|---|---|---|---|
| Liu's method | 39.4 | 93.5 | 59.6 | 48.3 | 53.3 |
| Cheng's method | 53.2 | 90.9 | 53.1 | 52.3 | 52.7 |
| RBMBM | 55.1 | 93.5 | 56.1 | 54.3 | 55.1 |
| INBM | 57.3 | 91.7 | 55.8 | 56.7 | 56.2 |
| Ensemble model | 58.6 | 93.9 | 46.1 | 68.4 | 55.1 |
| MKL-LGC | 61.7 | 94.6 | - | - | - |
| NDDSA with sschem$^{a}$ | 53.1 | 94.2 | 47.6 | 57.3 | 52.0 |
| NDDSA without sschem$^{a}$ | 44.5 | 93.7 | 44.7 | 47.8 | 46.2 |
| GCRS$^{b}$ | 27.2 | 95.7 | - | - | - |
| SDPred | 22.6 | 94.6 | - | - | - |
| MKronRLSF-LP | 63.5 | 94.1 | 59.2 | 61.9 | 60.5 |

  • - indicates that a value is not available; the bold and underlined values represent the best and second-best performance measure in each column, respectively;

  • a and b indicate that the results are taken from (Shabani-Mashcool et al., 2020) and (Xuan et al., 2022), respectively.