Search | arXiv e-print repository

doi 10.1016/j.ins.2016.07.076

A-Ward_pβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation

Authors: Renato Cordeiro de Amorim, Vladimir Makarenkov, Boris Mirkin

Abstract: In this paper we make two novel contributions to hierarchical clustering. First, we introduce an anomalous pattern initialisation method for hierarchical clustering algorithms, called A-Ward, capable of substantially reducing the time they take to converge. This method generates an initial partition with a sufficiently large number of clusters. This allows the cluster merging process to start from this partition rather than from a trivial partition composed solely of singletons. Our second contribution is an extension of the Ward and Ward_p algorithms to the situation where the feature weight exponent can differ from the exponent of the Minkowski distance. This new method, called A-Ward_pβ, is able to generate a much wider variety of clustering solutions. We also demonstrate that its parameters can be estimated reasonably well by using a cluster validity index. We perform numerous experiments using data sets with two types of noise, insertion of noise features and blurring within-cluster values of some features. These experiments allow us to conclude: (i) our anomalous pattern initialisation method does indeed reduce the time a hierarchical clustering algorithm takes to complete, without negatively impacting its cluster recovery ability; (ii) A-Ward_pβ provides better cluster recovery than both Ward and Ward_p.

Submitted 3 November, 2016; originally announced November 2016.

Journal ref: Information Sciences, 370, 343-354 (2016)

arXiv:1607.03200 [pdf, other]

doi 10.1007/s00357-018-9247-0

Qualitative Judgement of Research Impact: Domain Taxonomy as a Fundamental Framework for Judgement of the Quality of Research

Authors: Fionn Murtagh, Michael Orlov, Boris Mirkin

Abstract: The appeal of metric evaluation of research impact has attracted considerable interest in recent times. Although the public at large and administrative bodies are much interested in the idea, scientists and other researchers are much more cautious, insisting that metrics are but an auxiliary instrument to the qualitative peer-based judgement. The goal of this article is to propose availing of such a well positioned construct as domain taxonomy as a tool for directly assessing the scope and quality of research. We first show how taxonomies can be used to analyse the scope and perspectives of a set of research projects or papers. Then we proceed to define a research team or researcher's rank by those nodes in the hierarchy that have been created or significantly transformed by the results of the researcher. An experimental test of the approach in the data analysis domain is described. Although the concept of taxonomy seems rather simplistic to describe all the richness of a research domain, its changes and use can be made transparent and subject to open discussions.

Submitted 8 April, 2018; v1 submitted 11 July, 2016; originally announced July 2016.

Comments: 22 pages, 7 figures, Journal of Classification, Online First, March 25, 2018

MSC Class: 68P01; ACM Class: H.0, I.5.3, G.3

arXiv:cs/0503030 [pdf, ps, other]

A Suffix Tree Approach to Email Filtering

Authors: Rajesh M. Pampapathi, Boris Mirkin, Mark Levene

Abstract: We present an approach to email filtering based on the suffix tree data structure. A method for the scoring of emails using the suffix tree is developed and a number of scoring and score normalisation functions are tested. Our results show that the character level representation of emails and classes facilitated by the suffix tree can significantly improve classification accuracy when compared with the currently popular methods, such as naive Bayes. We believe the method can be extended to the classification of documents in other domains.

Submitted 6 December, 2005; v1 submitted 14 March, 2005; originally announced March 2005.

Comments: Revisions made in the light of reviewer comments. Main changes: (i) the extension and elaboration of section 4.4, which describes the scoring algorithm; (ii) favouring the use of false positive and false negative performance measures over the use of precision and recall; (iii) the addition of ROC curves wherever possible; and (iv) inclusion of performance statistics for the algorithm. Re-submitted 5th August 2005.

Showing 1–3 of 3 results for author: Mirkin, B