Computer Science > Data Structures and Algorithms
[Submitted on 22 May 2015]
Title: Nearest Neighbor based Clustering Algorithm for Large Data Sets
Abstract: Clustering is an unsupervised learning technique in which data objects are grouped into sets based on a similarity measure. Most clustering algorithms assume that main memory is unbounded and can hold the entire set of patterns. In practice, many applications give rise to pattern sets too large to fit in main memory, so much of the data resides in secondary storage. Disk input/output (I/O) then becomes the major bottleneck in designing efficient clustering algorithms for large data sets. Various design techniques have been used to build clustering algorithms for large data sets; external memory algorithms are one such class. These algorithms exploit the hierarchical memory structure of computers by incorporating locality of reference directly into the algorithm. This paper contributes to the design of clustering algorithms in the external memory model (proposed by Aggarwal and Vitter, 1988) in order to make them scalable. It shows that the shared near neighbors algorithm is not I/O efficient, since its I/O complexity is the same as its computational complexity. The algorithm is redesigned in the external memory model so that its I/O complexity is reduced while its computational complexity remains the same. We substantiate the theoretical analysis by implementing the algorithms with the STXXL library and comparing their performance with that of their traditional counterparts.
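For orientation, the following is a minimal in-memory sketch of a shared near neighbors clustering rule in the Jarvis-Patrick style. It is an illustration under stated assumptions, not the paper's external memory algorithm: the function names `knn_lists` and `shared_near_neighbors`, the parameters `k` and `kt`, the brute-force neighbor search, and the exact merge rule (mutual neighbors sharing at least `kt` neighbors) are all assumptions made here. The all-pairs loop shows why a naive version touches data in proportion to the computation it performs, which is the kind of access pattern that could become costly once the data no longer fits in main memory.

```python
import math
from itertools import combinations


def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors.

    Brute force: every point is compared with every other point.
    """
    neighbors = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors


def shared_near_neighbors(points, k, kt):
    """Cluster points with an assumed Jarvis-Patrick style rule.

    Two points are merged when each appears in the other's k-nearest-neighbor
    list and the two lists have at least kt indices in common.
    Returns one cluster label (a representative index) per point.
    """
    nn = knn_lists(points, k)
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every pair of points is examined once.
    for i, j in combinations(range(len(points)), 2):
        if i in nn[j] and j in nn[i] and len(nn[i] & nn[j]) >= kt:
            parent[find(i)] = find(j)

    return [find(i) for i in range(len(points))]


if __name__ == "__main__":
    # Two well-separated squares of four points each (illustrative data).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
    print(shared_near_neighbors(pts, k=3, kt=2))
```

In this sketch both the neighbor search and the merge step assume random access to every point. An external memory design would instead aim to organize these passes around block transfers between disk and main memory.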