Showing 1–2 of 2 results for author: Starosta, B

Search v0.5.6 released 2020-02-24

arXiv:2308.10999 [pdf, other]

cs.LG cs.AI cs.IR

Eigenvalue-based Incremental Spectral Clustering

Authors: Mieczysław A. Kłopotek, Bartłomiej Starosta, Sławomir T. Wierzchoń

Abstract: Our previous experiments demonstrated that subsets collections of (short) documents (with several hundred entries) share a common normalized in some way eigenvalue spectrum of combinatorial Laplacian. Based on this insight, we propose a method of incremental spectral clustering. The method consists of the following steps: (1) split the data into manageable subsets, (2) cluster each of the subsets, (3) merge clusters from different subsets based on the eigenvalue spectrum similarity to form clusters of the entire set. This method can be especially useful for clustering methods of complexity strongly increasing with the size of the data sample, like in case of typical spectral clustering. Experiments were performed showing that in fact the clustering and merging the subsets yields clusters close to clustering the entire dataset.

Submitted 18 August, 2023; originally announced August 2023.

Comments: 14 tables, 6 figures

arXiv:2308.00504 [pdf, other]

cs.LG cs.AI cs.IR

Explainable Graph Spectral Clustering of Text Documents

Authors: Bartłomiej Starosta, Mieczysław A. Kłopotek, Sławomir T. Wierzchoń

Abstract: Spectral clustering methods are known for their ability to represent clusters of diverse shapes, densities etc. However, results of such algorithms, when applied e.g. to text documents, are hard to explain to the user, especially due to embedding in the spectral space which has no obvious relation to document contents. Therefore there is an urgent need to elaborate methods for explaining the outcome of the clustering. This paper presents a contribution towards this goal. We present a proposal of explanation of results of combinatorial Laplacian based graph spectral clustering. It is based on showing (approximate) equivalence of combinatorial Laplacian embedding, $K$-embedding (proposed in this paper) and term vector space embedding. Hence a bridge is constructed between the textual contents and the clustering results. We provide theoretical background for this approach. We performed experimental study showing that $K$-embedding approximates well Laplacian embedding under favourable block matrix conditions and show that approximation is good enough under other conditions.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 4 figures, 15 tables
