Showing 1–2 of 2 results for author: Zambelli, A

Search v0.5.6 released 2020-02-24

arXiv:2112.13680 [pdf, ps, other]

cs.LG q-bio.QM stat.ME

Ensemble Method for Cluster Number Determination and Algorithm Selection in Unsupervised Learning

Authors: Antoine Zambelli

Abstract: Unsupervised learning, and more specifically clustering, suffers from the need for expertise in the field to be of use. Researchers must make careful and informed decisions on which algorithm to use with which set of hyperparameters for a given dataset. Additionally, researchers may need to determine the number of clusters in the dataset, which is unfortunately itself an input to most clustering algorithms. All of this before embarking on their actual subject matter work. After quantifying the impact of algorithm and hyperparameter selection, we propose an ensemble clustering framework which can be leveraged with minimal input. It can be used to determine both the number of clusters in the dataset and a suitable choice of algorithm to use for a given dataset. A code library is included in the Conclusion for ease of integration.

Submitted 22 December, 2021; originally announced December 2021.

Comments: 8 pages, 5 tables, preprint

arXiv:1608.04700 [pdf, other]

q-bio.QM cs.LG stat.ME

A Data-Driven Approach to Estimating the Number of Clusters in Hierarchical Clustering

Authors: Antoine Zambelli

Abstract: We propose two new methods for estimating the number of clusters in a hierarchical clustering framework in the hopes of creating a fully automated process with no human intervention. The methods are completely data-driven and require no input from the researcher, and as such are fully automated. They are quite easy to implement and not computationally intensive in the least. We analyze performance on several simulated data sets and the Biobase Gene Expression Set, comparing our methods to the established Gap statistic and Elbow methods and outperforming both in multi-cluster scenarios.

Submitted 16 August, 2016; originally announced August 2016.

Comments: 6 pages, 7 figures, 12 tables

MSC Class: 62-07
