-
A Convergent Algorithm for Bi-orthogonal Nonnegative Matrix Tri-Factorization
Authors:
Andri Mirzal
Abstract:
A convergent algorithm for nonnegative matrix factorization with orthogonality constraints imposed on both factors is proposed in this paper. This factorization concept was first introduced by Ding et al. with the intent of further improving the clustering capability of NMF. However, because the original algorithm was developed based on multiplicative update rules, its convergence cannot be guaranteed. In this paper, we utilize the technique presented in our previous work to develop the algorithm and prove that it converges to a stationary point inside the solution space.
Submitted 15 November, 2018; v1 submitted 29 October, 2017;
originally announced October 2017.
-
A Converged Algorithm for Tikhonov Regularized Nonnegative Matrix Factorization with Automatic Regularization Parameters Determination
Authors:
Andri Mirzal
Abstract:
We present a converged algorithm for Tikhonov regularized nonnegative matrix factorization (NMF). We choose this regularization specifically because Tikhonov regularized least squares (LS) is known to be preferable to conventional LS for solving linear inverse problems. Because an NMF problem can be decomposed into LS subproblems, Tikhonov regularized NMF can be expected to be the more appropriate approach to solving NMF problems. The algorithm is derived using additive update rules, which have been shown to have a convergence guarantee. We equip the algorithm with a mechanism to automatically determine the regularization parameters based on the L-curve, a concept that is well known in the inverse problems community but rather unknown in NMF research. The introduction of this algorithm thus solves two inherent problems in Tikhonov regularized NMF algorithm research, i.e., convergence guarantee and regularization parameter determination.
Submitted 9 May, 2012;
originally announced May 2012.
-
PID Parameters Optimization by Using Genetic Algorithm
Authors:
Andri Mirzal,
Shinichiro Yoshii,
Masashi Furukawa
Abstract:
Time delays are components that introduce a time lag in a system's response. They arise in physical, chemical, biological and economic systems, as well as in the process of measurement and computation. In this work, we apply a Genetic Algorithm (GA) to determine PID controller parameters that compensate for the delay in a First Order Lag plus Time Delay (FOLPD) system, and compare the results with those of the Iterative Method and the Ziegler-Nichols rule.
Submitted 4 April, 2012;
originally announced April 2012.
-
Clustering and Latent Semantic Indexing Aspects of the Nonnegative Matrix Factorization
Authors:
Andri Mirzal
Abstract:
This paper provides theoretical support for the clustering aspect of nonnegative matrix factorization (NMF). By utilizing the Karush-Kuhn-Tucker optimality conditions, we show that the NMF objective is equivalent to a graph clustering objective, so the clustering aspect of NMF has a solid justification. Unlike previous approaches, which usually discard the nonnegativity constraints, our approach guarantees that the stationary point used in deriving the equivalence is located in the feasible region in the nonnegative orthant. Additionally, since the clustering capability of a matrix decomposition technique can sometimes imply its latent semantic indexing (LSI) aspect, we also evaluate the LSI aspect of NMF by showing its capability in solving the synonymy and polysemy problems in synthetic datasets. A more extensive evaluation is conducted by comparing the LSI performance of NMF and the singular value decomposition (SVD), the standard LSI method, using some standard datasets.
Submitted 16 December, 2011;
originally announced December 2011.
-
Design and Implementation of a Simple Web Search Engine
Authors:
Andri Mirzal
Abstract:
We present a simple web search engine for indexing and searching HTML documents using the Python programming language. Because Python is well known for its simple syntax and strong support for the main operating systems, we hope it will be beneficial for learning information retrieval techniques, especially web search engine technology.
Submitted 8 February, 2012; v1 submitted 13 December, 2011;
originally announced December 2011.
-
Clustering and Latent Semantic Indexing Aspects of the Singular Value Decomposition
Authors:
Andri Mirzal
Abstract:
This paper discusses the clustering and latent semantic indexing (LSI) aspects of the singular value decomposition (SVD). The purpose of this paper is twofold. The first is to explain how and why the singular vectors can be used in clustering. The second is to show that the two seemingly unrelated SVD aspects actually originate from the same source: related vertices tend to be more clustered in the graph representation of the lower-rank approximate matrix obtained using the SVD than in the original semantic graph. Accordingly, the SVD can improve the retrieval performance of an information retrieval system, since queries made to the approximate matrix can retrieve more relevant documents and filter out more irrelevant documents than the same queries made to the original matrix. Utilizing this fact, we devise an LSI algorithm that mimics the SVD's capability in clustering related vertices. Convergence analysis shows that the algorithm is convergent and produces a unique solution for each input. Experimental results using some standard datasets in LSI research show that the retrieval performance of the algorithm is comparable to the SVD's. In addition, the algorithm is more practical and easier to use because there is no need to determine the decomposition rank, which is crucial in driving the retrieval performance of the SVD.
Submitted 15 November, 2012; v1 submitted 17 November, 2010;
originally announced November 2010.
-
Converged Algorithms for Orthogonal Nonnegative Matrix Factorizations
Authors:
Andri Mirzal
Abstract:
This paper proposes uni-orthogonal and bi-orthogonal nonnegative matrix factorization algorithms with robust convergence proofs. We design the algorithms based on the work of Lee and Seung [1], and derive the converged versions by utilizing ideas from the work of Lin [2]. The experimental results confirm the theoretical guarantees of the convergences.
Submitted 16 March, 2011; v1 submitted 25 October, 2010;
originally announced October 2010.
-
On the clustering aspect of nonnegative matrix factorization
Authors:
Andri Mirzal,
Masashi Furukawa
Abstract:
This paper provides a theoretical explanation of the clustering aspect of nonnegative matrix factorization (NMF). We prove that even without imposing orthogonality or sparsity constraints on the basis and/or coefficient matrix, NMF can still give clustering results, thus providing theoretical support for many works, e.g., Xu et al. [1] and Kim et al. [2], that show the superiority of the standard NMF as a clustering method.
Submitted 12 June, 2010; v1 submitted 29 May, 2010;
originally announced May 2010.
-
Eigenvectors for clustering: Unipartite, bipartite, and directed graph cases
Authors:
Andri Mirzal,
Masashi Furukawa
Abstract:
This paper presents a concise tutorial on spectral clustering for a broad spectrum of graphs, which includes unipartite (undirected) graphs, bipartite graphs, and directed graphs. We show how to transform a bipartite graph or a directed graph into a corresponding unipartite graph, thereby allowing a unified treatment of all cases. For bipartite graphs, we show that the relaxed solution to $K$-way co-clustering can be found by computing the left and right eigenvectors of the data matrix. This gives a theoretical basis for the $K$-way spectral co-clustering algorithms proposed in the literature. We also show that solving row and column co-clustering is equivalent to solving row and column clustering separately, thus giving theoretical support for the claim "column clustering implies row clustering and vice versa". In the last part, motivated by the results from the bipartite graph analysis, we generalize the Ky Fan theorem, which is the central theorem for explaining spectral clustering, to rectangular complex matrices.
Submitted 12 July, 2010; v1 submitted 14 May, 2010;
originally announced May 2010.
-
Node-Context Network Clustering using PARAFAC Tensor Decomposition
Authors:
Andri Mirzal,
Masashi Furukawa
Abstract:
We describe a clustering method for labeled-link networks (semantic graphs) that can be used to group important nodes (highly connected nodes) with their relevant link labels by using PARAFAC tensor decomposition. In this kind of network, the adjacency matrix cannot fully describe all information about the network structure. We have to expand the matrix into a 3-way adjacency tensor, so that it includes not only which nodes a node connects to but also by which link labels. By applying PARAFAC decomposition to this tensor, we get two lists for each decomposition group, one of nodes and one of link labels, with a score attached to each node and label. The clustering process to get the important nodes along with their relevant labels can then be done simply by sorting the lists in decreasing order. To test the method, we construct a labeled-link network from a blog dataset, where the blogs are the nodes and the labeled links are the words shared among them. The similarity measures between the results and standard measures look promising, about 0.87, especially for the two most important tasks: finding the most relevant words for a blog query and finding the most similar blogs for a blog query.
Submitted 3 May, 2010;
originally announced May 2010.
-
Weblog Clustering in Multilinear Algebra Perspective
Authors:
Andri Mirzal
Abstract:
This paper describes a clustering method to group the most similar and important weblogs with their descriptive shared words by using a technique from multilinear algebra known as PARAFAC tensor decomposition. The proposed method first creates a labeled-link network representation of the weblog datasets, where the nodes are the blogs and the labels are the shared words. Then, a 3-way adjacency tensor is extracted from the network, and the PARAFAC decomposition is applied to the tensor to get pairs of node lists and label lists, with scores attached to each list as an indication of the degree of importance. The clustering is done by sorting the lists in decreasing order and taking the pairs of top-ranked blogs and words. Thus, unlike standard co-clustering methods, this method not only groups similar blogs with their descriptive words but also tends to produce clusters of important blogs and descriptive words.
Submitted 12 September, 2009;
originally announced September 2009.
-
A Method for Accelerating the HITS Algorithm
Authors:
Andri Mirzal,
Masashi Furukawa
Abstract:
We present a new method to accelerate the HITS algorithm by exploiting the hyperlink structure of the web graph. The proposed algorithm extends the idea of authority and hub scores from HITS by introducing two diagonal matrices which contain constants that act as weights to make authority pages more authoritative and hub pages more hubby. This method works because in the web graph good authorities are pointed to by good hubs and good hubs point to good authorities. Consequently, these pages will collect their scores faster under the proposed algorithm than under the standard HITS. We show that the authority and hub vectors of the proposed algorithm exist but are not necessarily unique, and then give a treatment to ensure the uniqueness of the vectors. The experimental results show that the proposed algorithm can improve HITS computations, especially for back button datasets.
Submitted 3 September, 2009;
originally announced September 2009.
-
On the Relationship between Trading Network and WWW Network: A Preferential Attachment Perspective
Authors:
Andri Mirzal
Abstract:
This paper describes the relationship between trading networks and the WWW network from the perspective of the preferential attachment mechanism. This mechanism is known to be the underlying principle in network evolution and has been incorporated into the formulation of two famous web page ranking algorithms, PageRank and HITS. We point out the differences between trading networks and the WWW network with respect to this mechanism, derive the formulation of a HITS-based ranking algorithm for trading networks as a direct consequence of these differences, and apply the same framework back to the HITS formulation, which turns out to yield a technique for accelerating its convergence.
Submitted 29 September, 2009; v1 submitted 22 August, 2009;
originally announced August 2009.