
Showing 1–3 of 3 results for author: Capó, M

  1. arXiv:2012.03628  [pdf, other]

    stat.ML cs.LG

    K-means for Evolving Data Streams

    Authors: Arkaitz Bidaurrazaga, Aritz Pérez, Marco Capó

    Abstract: Currently, the amount of data produced worldwide is increasing beyond measure, so a high volume of unsupervised data must be processed continuously. One of the main unsupervised data analysis tasks is clustering. In streaming data scenarios, the data is composed of an increasing sequence of batches of samples in which the concept drift phenomenon may occur. In this paper, we formally define the Streaming…

    Submitted 17 September, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: This is an extended version of a short paper published in ICDM 2021. Please cite the short paper instead of this version. As soon as it is published, we will add how to reference it.

    MSC Class: 62H30; ACM Class: I.5.3

  2. arXiv:1801.02949  [pdf, other]

    stat.ML cs.LG

    An efficient K-means clustering algorithm for massive data

    Authors: Marco Capó, Aritz Pérez, Jose A. Lozano

    Abstract: The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their ease of implementation and relatively low computational cost. Among these algorithms, the K-means algorithm stands out as the most popular approach, besides its high depende…

    Submitted 9 January, 2018; originally announced January 2018.

  3. arXiv:1605.02989  [pdf, ps, other]

    stat.ML cs.LG

    An efficient K-means algorithm for Massive Data

    Authors: Marco Capó, Aritz Pérez, José Antonio Lozano

    Abstract: Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm remains one of the most popular clustering methods, in spite of its dependency on the initial settings and high computational cost, especially in terms of di…

    Submitted 10 May, 2016; originally announced May 2016.

    Comments: 38 pages, 10 figures