Clustering and Latent Semantic Indexing Aspects of the Singular Value Decomposition

Mirzal, Andri

arXiv:1011.4104 (cs)

[Submitted on 17 Nov 2010 (v1), last revised 16 Nov 2012 (this version, v4)]

Abstract: This paper discusses the clustering and latent semantic indexing (LSI) aspects of the singular value decomposition (SVD). Its purpose is twofold. The first is to explain how and why the singular vectors can be used in clustering. The second is to show that these two seemingly unrelated aspects of the SVD originate from the same source: related vertices tend to be more clustered in the graph representation of the lower-rank approximate matrix obtained with the SVD than in the original semantic graph. Accordingly, the SVD can improve the retrieval performance of an information retrieval system, since queries made to the approximate matrix can retrieve more relevant documents and filter out more irrelevant documents than the same queries made to the original matrix. Using this fact, we devise an LSI algorithm that mimics the SVD's capability in clustering related vertices. Convergence analysis shows that the algorithm is convergent and produces a unique solution for each input. Experimental results on some standard datasets in LSI research show that the retrieval performance of the algorithm is comparable to the SVD's. In addition, the algorithm is more practical and easier to use because there is no need to determine the decomposition rank, which is crucial in driving the retrieval performance of the SVD.
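
The retrieval claim in the abstract can be illustrated with a minimal sketch of standard truncated-SVD LSI. This is not the algorithm proposed in the paper (which avoids choosing a rank); the toy term-by-document matrix, the rank `k = 2`, and the helper names `rank_k_approximation` and `cosine_scores` are all assumptions made for illustration.

```python
import numpy as np


def rank_k_approximation(A, k):
    """Best rank-k approximation of A in the Frobenius norm, via the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular triplets.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def cosine_scores(A, q):
    """Cosine similarity between query q (term space) and each column of A."""
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(q)
    # Guard against division by zero for all-zero columns or queries.
    return (q @ A) / np.maximum(norms, 1e-12)


# Hypothetical toy data: rows are terms, columns are documents.
# Documents 0-2 are about vehicles, documents 3-4 about plants.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 0, 0, 0],  # engine
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 1],  # petal
], dtype=float)

# Query containing only the term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

A2 = rank_k_approximation(A, k=2)

scores_original = cosine_scores(A, q)
scores_rank2 = cosine_scores(A2, q)

print("original:", np.round(scores_original, 3))
print("rank-2:  ", np.round(scores_rank2, 3))
```

In this toy example, document 1 contains "automobile" and "engine" but not "car", so it scores 0 against the original matrix. Against the rank-2 approximation it scores about 0.58, the same as the other two vehicle documents, while the plant documents remain at essentially 0. This is the clustering effect the abstract describes, shown on a deliberately simple case.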

Comments:	38 pages, submitted to Pattern Recognition
Subjects:	Machine Learning (cs.LG); Numerical Analysis (math.NA); Spectral Theory (math.SP)
MSC classes:	15A18, 65F15
Cite as:	arXiv:1011.4104 [cs.LG]
	(or arXiv:1011.4104v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1011.4104

Submission history

From: Andri Mirzal
[v1] Wed, 17 Nov 2010 23:39:12 UTC (75 KB)
[v2] Wed, 9 Mar 2011 18:56:56 UTC (75 KB)
[v3] Wed, 17 Oct 2012 08:41:06 UTC (1 KB) (withdrawn)
[v4] Fri, 16 Nov 2012 04:26:29 UTC (153 KB)
