-
Combining shape and contour features to improve tool wear monitoring in milling processes
Authors:
M. T. García-Ordás,
E. Alegre-Gutiérrez,
V. González-Castro,
R. Alaiz-Rodríguez
Abstract:
In this paper, we propose a new system that combines a shape descriptor and a contour descriptor to classify inserts in milling processes according to their wear level, following a computer vision-based approach. To describe the shape of the wear region we propose a new descriptor called ShapeFeat, and we characterize its contour with B-ORCHIZ, a method that, to the best of our knowledge, achieves the best performance for tool wear monitoring following a computer vision-based approach. Results show that combining B-ORCHIZ with ShapeFeat through late fusion improves the classification performance significantly, obtaining an accuracy of 91.44% in binary classification (i.e. wear classified as high or low) and 82.90% with three target classes (i.e. wear classified as high, medium or low). These results outperform those of either descriptor on its own: ShapeFeat achieves accuracies of 88.70% and 80.67% for two and three classes, respectively, and B-ORCHIZ achieves 87.06% and 80.24%. These encouraging results can help the manufacturing community classify inserts automatically by wear level in milling processes.
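As an illustration of the late fusion idea mentioned in the abstract, the following minimal sketch combines the per-class scores of two independently trained classifiers by a weighted average. The abstract does not state which fusion rule the authors use, so the averaging rule, the `weight_a` parameter, the function names and the example scores are all assumptions made for this sketch.

```python
from typing import List, Sequence

CLASSES = ("low", "medium", "high")  # three wear levels named in the abstract


def late_fusion(scores_a: Sequence[float],
                scores_b: Sequence[float],
                weight_a: float = 0.5) -> List[float]:
    """Fuse two classifiers' per-class scores by weighted averaging.

    Assumption: a weighted mean is used here only as one common late
    fusion rule; the abstract does not specify the authors' rule.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both classifiers must score the same classes")
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(scores_a, scores_b)]


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Hypothetical scores from a shape-based and a contour-based classifier.
shape_scores = [0.50, 0.30, 0.20]
contour_scores = [0.20, 0.60, 0.20]

fused = late_fusion(shape_scores, contour_scores)
label = CLASSES[predict(fused)]
```

In this example the shape-based scores alone would favor "low", while the fused scores favor "medium", which is the kind of disagreement late fusion is meant to arbitrate.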
Submitted 7 February, 2024;
originally announced February 2024.
-
Tool wear monitoring using an online, automatic and low cost system based on local texture
Authors:
M. T. García-Ordás,
E. Alegre-Gutiérrez,
R. Alaiz-Rodríguez,
V. González-Castro
Abstract:
In this work we propose a new online, low-cost and fast approach based on computer vision and machine learning to determine whether cutting tools used in edge profile milling processes are serviceable or disposable based on their wear level. We created a new dataset of 254 images of edge profile cutting heads which is, to the best of our knowledge, the first publicly available dataset of sufficient quality for this purpose. All the inserts were segmented and their cutting edges were cropped, obtaining 577 images of cutting edges: 301 functional and 276 disposable. The proposed method is based on (1) dividing the cutting edge image into different regions, called Wear Patches (WP), (2) characterising each one as worn or serviceable using texture descriptors based on different variants of Local Binary Patterns (LBP) and (3) determining, based on the state of these WP, whether the cutting edge (and, therefore, the tool) is serviceable or disposable. We proposed and assessed five different patch division configurations. The individual WP were classified by a Support Vector Machine (SVM) with an intersection kernel. The best patch division configuration and texture descriptor for the WP achieve an accuracy of 90.26% in the detection of disposable cutting edges. These results show a very promising opportunity for automatic wear monitoring in edge profile milling processes.
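The three building blocks named in the abstract (an LBP texture code, an intersection kernel, and a decision over Wear Patches) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the basic 8-neighbour LBP rather than the variants the paper evaluates, and the patch aggregation rule (`min_worn`) is an assumption, since the abstract does not say how patch states are combined.

```python
from typing import List, Sequence


def lbp_code(window: Sequence[Sequence[int]]) -> int:
    """Basic 8-neighbour LBP code for the centre pixel of a 3x3 window."""
    center = window[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    neighbours = [window[0][0], window[0][1], window[0][2], window[1][2],
                  window[2][2], window[2][1], window[2][0], window[1][0]]
    # Bit i is set when neighbour i is at least as bright as the centre.
    return sum(1 << i for i, value in enumerate(neighbours) if value >= center)


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Intersection kernel: sum of bin-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def edge_is_disposable(patch_is_worn: List[bool], min_worn: int = 1) -> bool:
    """Hypothetical aggregation rule: the edge is disposable when at
    least `min_worn` of its Wear Patches were classified as worn."""
    return sum(patch_is_worn) >= min_worn


# Toy inputs, invented for illustration only.
code = lbp_code([[5, 9, 1],
                 [4, 6, 7],
                 [2, 3, 8]])
similarity = histogram_intersection([0.2, 0.5, 0.3], [0.4, 0.1, 0.5])
decision = edge_is_disposable([False, True, False], min_worn=1)
```

In a full pipeline, the LBP codes of each patch would be accumulated into a histogram, and the intersection kernel would be supplied to the SVM that labels each patch as worn or serviceable.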
Submitted 7 February, 2024;
originally announced February 2024.
-
Classifying spam emails using agglomerative hierarchical clustering and a topic-based approach
Authors:
F. Janez-Martino,
R. Alaiz-Rodriguez,
V. Gonzalez-Castro,
E. Fidalgo,
E. Alegre
Abstract:
Spam emails are unsolicited, annoying and sometimes harmful messages which may contain malware, phishing or hoaxes. Unlike most studies that address the design of efficient anti-spam filters, we approach the spam email problem from a different and novel perspective. Focusing on the needs of cybersecurity units, we follow a topic-based approach for addressing the classification of spam email into multiple categories. We propose SPEMC-15K-E and SPEMC-15K-S, two novel datasets with approximately 15K emails each in English and Spanish, respectively, and we label them using agglomerative hierarchical clustering into 11 classes. We evaluate 16 pipelines, combining four text representation techniques (Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words, Word2Vec and BERT) and four classifiers: Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF) and Logistic Regression (LR). Experimental results show that the highest performance is achieved with TF-IDF and LR for the English dataset, with an F1 score of 0.953 and an accuracy of 94.6%, while for the Spanish dataset, TF-IDF with NB yields an F1 score of 0.945 and 98.5% accuracy. Regarding the processing time, TF-IDF with LR leads to the fastest classification, processing an English and Spanish spam email in and on average, respectively.
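To make the experimental grid concrete, the sketch below enumerates the 16 representation and classifier combinations described in the abstract and computes TF-IDF weights for a toy corpus. The TF-IDF variant used here (term frequency normalised by document length, `idf = log(N / df)` without smoothing) is only one common definition and is an assumption; the abstract does not state which variant the authors use. The toy documents are invented.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List

REPRESENTATIONS = ("TF-IDF", "Bag of Words", "Word2Vec", "BERT")
CLASSIFIERS = ("SVM", "NB", "RF", "LR")

# Every representation paired with every classifier: 4 x 4 = 16 pipelines.
pipelines = list(product(REPRESENTATIONS, CLASSIFIERS))


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute TF-IDF weights for a list of tokenised documents.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented toy corpus of three tokenised messages.
corpus = [
    ["win", "money", "now"],
    ["cheap", "pills", "now"],
    ["win", "cheap", "prize"],
]
vectors = tf_idf(corpus)
```

Terms that appear in a single message, such as "money", receive a higher weight than terms shared across messages, such as "now", which is what makes the representation useful for separating topics.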
Submitted 7 February, 2024;
originally announced February 2024.
-
Efficient Detection of Botnet Traffic by features selection and Decision Trees
Authors:
Javier Velasco-Mata,
Víctor González-Castro,
Eduardo Fidalgo,
Enrique Alegre
Abstract:
Botnets are among the most prevalent online threats, causing losses worth billions to global economies. Nowadays, the increasing number of devices connected to the Internet makes it necessary to analyze large amounts of network traffic data. In this work, we focus on increasing the performance of botnet traffic classification by selecting those features that further increase the detection rate. For this purpose we use two feature selection techniques, Information Gain and Gini Importance, which led to three pre-selected subsets of five, six and seven features. Then, we evaluate the three feature subsets along with three models: Decision Tree, Random Forest and k-Nearest Neighbors. To test the performance of the three feature vectors and the three models we generate two datasets based on the CTU-13 dataset, namely QB-CTU13 and EQB-CTU13. We measure the performance as the macro-averaged F1 score over the computational time required to classify a sample. The results show that the highest performance is achieved by Decision Trees using a five-feature set, which obtained a mean F1 score of 85% while classifying each sample in an average time of 0.78 microseconds.
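The performance measure described in the abstract can be sketched as below: a macro-averaged F1 score, then related to the time needed to classify one sample. Reading "over the computational time" as a plain ratio is an assumption of this sketch, as are the function names and the toy labels; the abstract does not give the exact formula.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


def f1_per_time(f1: float, seconds_per_sample: float) -> float:
    """Assumed reading of the metric: F1 divided by per-sample time."""
    if seconds_per_sample <= 0:
        raise ValueError("per-sample time must be positive")
    return f1 / seconds_per_sample


# Invented toy labels, not taken from QB-CTU13 or EQB-CTU13.
y_true = ["bot", "bot", "normal", "normal"]
y_pred = ["bot", "normal", "normal", "normal"]
score = macro_f1(y_true, y_pred)
```

Under this reading, a model with a slightly lower F1 score can still rank first if it classifies each sample much faster, which is consistent with a single Decision Tree being preferred in the abstract.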
Submitted 30 June, 2021;
originally announced July 2021.
-
Machine learning of neuroimaging to diagnose cognitive impairment and dementia: a systematic review and comparative analysis
Authors:
Enrico Pellegrini,
Lucia Ballerini,
Maria del C. Valdes Hernandez,
Francesca M. Chappell,
Victor González-Castro,
Devasuda Anblagan,
Samuel Danso,
Susana Muñoz Maniega,
Dominic Job,
Cyril Pernet,
Grant Mair,
Tom MacGillivray,
Emanuele Trucco,
Joanna Wardlaw
Abstract:
INTRODUCTION: Advanced machine learning methods might help to identify dementia risk from neuroimaging, but their accuracy to date is unclear.
METHODS: We systematically reviewed the literature, 2006 to late 2016, for machine learning studies differentiating healthy ageing through to dementia of various types, assessing study quality, and comparing accuracy at different disease boundaries.
RESULTS: Of 111 relevant studies, most assessed Alzheimer's disease (AD) vs healthy controls, used ADNI data, support vector machines and only T1-weighted sequences. Accuracy was highest for differentiating AD from healthy controls, and poor for differentiating healthy controls vs MCI vs AD, or MCI converters vs non-converters. Accuracy increased using combined data types, but not by data source, sample size or machine learning method.
DISCUSSION: Machine learning does not differentiate clinically-relevant disease categories yet. More diverse datasets, combinations of different types of data, and close clinical integration of machine learning would help to advance the field.
Submitted 11 April, 2018; v1 submitted 5 April, 2018;
originally announced April 2018.