
Showing 1–4 of 4 results for author: Race, S

  1. arXiv:1408.5427  [pdf, other]

    stat.ML cs.CL cs.IR cs.LG

    A Case Study in Text Mining: Interpreting Twitter Data From World Cup Tweets

    Authors: Daniel Godfrey, Caley Johns, Carl Meyer, Shaina Race, Carol Sadek

    Abstract: Cluster analysis is a field of data analysis that extracts underlying patterns in data. One application of cluster analysis is in text mining, the analysis of large collections of text to find similarities between documents. We used a collection of about 30,000 tweets extracted from Twitter just before the World Cup started. A common problem with real-world text data is the presence of linguistic…

    Submitted 21 August, 2014; originally announced August 2014.

    ACM Class: I.5.4; I.2.7; H.2.8; H.3.3

  2. arXiv:1408.0972  [pdf, other]

    stat.ML cs.CV cs.LG

    A Flexible Iterative Framework for Consensus Clustering

    Authors: Shaina Race, Carl Meyer

    Abstract: A novel framework for consensus clustering is presented which has the ability to determine both the number of clusters and a final solution using multiple algorithms. A consensus similarity matrix is formed from an ensemble using multiple algorithms and several values for k. A variety of dimension reduction techniques and clustering algorithms are considered for analysis. For noisy or high-dimensi…

    Submitted 5 August, 2014; originally announced August 2014.

  3. arXiv:1408.0967  [pdf, other]

    stat.ML cs.CV cs.LG

    Determining the Number of Clusters via Iterative Consensus Clustering

    Authors: Shaina Race, Carl Meyer, Kevin Valakuzhy

    Abstract: We use a cluster ensemble to determine the number of clusters, k, in a group of data. A consensus similarity matrix is formed from the ensemble using multiple algorithms and several values for k. A random walk is induced on the graph defined by the consensus matrix and the eigenvalues of the associated transition probability matrix are used to determine the number of clusters. For noisy or high-di…

    Submitted 5 August, 2014; originally announced August 2014.

    Comments: Proceedings of the 2013 SIAM International Conference on Data Mining

  4. arXiv:1211.4142  [pdf, other]

    stat.ML cs.LG

    Data Clustering via Principal Direction Gap Partitioning

    Authors: Ralph Abbey, Jeremy Diepenbrock, Amy Langville, Carl Meyer, Shaina Race, Dexin Zhou

    Abstract: We explore the geometrical interpretation of the PCA-based clustering algorithm Principal Direction Divisive Partitioning (PDDP). We give several examples where this algorithm breaks down, and suggest a new method, gap partitioning, which takes into account natural gaps in the data between clusters. Geometric features of the PCA space are derived and illustrated and experimental results are given…

    Submitted 17 November, 2012; originally announced November 2012.