Search | arXiv e-print repository

arXiv:2004.09397 [pdf, other]

Multi-label Stream Classification with Self-Organizing Maps

Authors: Ricardo Cerri, Joel David Costa Júnior, Elaine Ribeiro de Faria Paiva, João Manuel Portela da Gama

Abstract: Several learning algorithms have been proposed for offline multi-label classification. However, applications in areas such as traffic monitoring, social networks, and sensors produce data continuously, the so-called data streams, posing challenges to batch multi-label learning. With the lack of stationarity in the distribution of data streams, new algorithms are needed to adapt online to such changes (concept drift). Also, in realistic applications, changes occur in scenarios of infinitely delayed labels, where the true classes of the arriving instances are never available. We propose an online unsupervised incremental method based on self-organizing maps for multi-label stream classification with infinitely delayed labels. In the classification phase, we use a k-nearest neighbors strategy to compute the winning neurons in the maps, adapting to concept drift by adjusting neuron weight vectors and dataset label cardinality online. We predict labels for each instance using the Bayes rule and the outputs of each neuron, adapting the probabilities and conditional probabilities of the classes in the stream. Experiments using synthetic and real datasets show that our method is highly competitive with several methods from the literature, in both stationary and concept drift scenarios.

Submitted 20 April, 2020; originally announced April 2020.

Comments: 7 pages, 14 figures

ACM Class: I.2.6

arXiv:2003.12526 [pdf, other]

Generation of Consistent Sets of Multi-Label Classification Rules with a Multi-Objective Evolutionary Algorithm

Authors: Thiago Zafalon Miranda, Diorge Brognara Sardinha, Márcio Porto Basgalupp, Yaochu Jin, Ricardo Cerri

Abstract: Multi-label classification consists in classifying an instance into two or more classes simultaneously. It is a very challenging task present in many real-world applications, such as classification of biology, image, video, audio, and text. Recently, the interest in interpretable classification models has grown, partially as a consequence of regulations such as the General Data Protection Regulation. In this context, we propose a multi-objective evolutionary algorithm that generates multiple rule-based multi-label classification models, allowing users to choose among models that offer different compromises between predictive power and interpretability. An important contribution of this work is that different from most algorithms, which usually generate models based on lists (ordered collections) of rules, our algorithm generates models based on sets (unordered collections) of rules, increasing interpretability. Also, by employing a conflict avoidance algorithm during the rule-creation, every rule within a given model is guaranteed to be consistent with every other rule in the same model. Thus, no conflict resolution strategy is required, evolving simpler models. We conducted experiments on synthetic and real-world datasets and compared our results with state-of-the-art algorithms in terms of predictive performance (F-Score) and interpretability (model size), and demonstrate that our best models had comparable F-Score and smaller model sizes.

Submitted 27 March, 2020; originally announced March 2020.

arXiv:1908.09652 [pdf, other]

Preventing the Generation of Inconsistent Sets of Classification Rules

Authors: Thiago Zafalon Miranda, Diorge Brognara Sardinha, Ricardo Cerri

Abstract: In recent years, the interest in interpretable classification models has grown. One of the proposed ways to improve the interpretability of a rule-based classification model is to use sets (unordered collections) of rules, instead of lists (ordered collections) of rules. One of the problems associated with sets is that multiple rules may cover a single instance, but predict different classes for it, thus requiring a conflict resolution strategy. In this work, we propose two algorithms capable of finding feature-space regions inside which any created rule would be consistent with the already existing rules, preventing inconsistencies from arising. Our algorithms do not generate classification models, but are instead meant to enhance algorithms that do so, such as Learning Classifier Systems. Both algorithms are described and analyzed exclusively from a theoretical perspective, since we have not modified a model-generating algorithm to incorporate our proposed solutions yet. This work presents the novelty of using conflict avoidance strategies instead of conflict resolution strategies.

Submitted 30 March, 2020; v1 submitted 23 August, 2019; originally announced August 2019.

MSC Class: 68T05; 68T20; 68T30

arXiv:1812.02207 [pdf, other]

doi 10.1007/s10618-024-01002-5

Better Trees: An empirical study on hyperparameter tuning of classification decision tree induction algorithms

Authors: Rafael Gomes Mantovani, Tomáš Horváth, André L. D. Rossi, Ricardo Cerri, Sylvio Barbon Junior, Joaquin Vanschoren, André Carlos Ponce de Leon Ferreira de Carvalho

Abstract: Machine learning algorithms often contain many hyperparameters (HPs) whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these HP configurations and their complex interactions, it is common to use optimization techniques to find settings that lead to high predictive performance. However, insights into efficiently exploring this vast space of configurations and dealing with the trade-off between predictive and runtime performance remain challenging. Furthermore, there are cases where the default HPs fit the suitable configuration. Additionally, for many reasons, including model validation and attendance to new legislation, there is an increasing interest in interpretable models, such as those created by the Decision Tree (DT) induction algorithms. This paper provides a comprehensive approach for investigating the effects of hyperparameter tuning for the two DT induction algorithms most often used, CART and C4.5. DT induction algorithms present high predictive performance and interpretable classification models, though many HPs need to be adjusted. Experiments were carried out with different tuning strategies to induce models and to evaluate HPs' relevance using 94 classification datasets from OpenML. The experimental results point out that different HP profiles for the tuning of each algorithm provide statistically significant improvements in most of the datasets for CART, but only in one-third for C4.5. Although different algorithms may present different tuning scenarios, the tuning techniques generally required few evaluations to find accurate solutions. Furthermore, the best technique for all the algorithms was IRACE. Finally, we found that tuning a specific small subset of HPs is a good alternative for achieving optimal predictive performance.

Submitted 21 December, 2023; v1 submitted 5 December, 2018; originally announced December 2018.

Comments: 60 pages, 16 figures

arXiv:1708.09284 [pdf]

Automated Setup to Accurately Calibrate Electrical DC Voltage Generators

Authors: Flavio Galliana, Pier Paolo Capra, Roberto Cerri, Marco Lanzillotti

Abstract: At the National Institute of Metrological Research (INRIM), an automated setup to calibrate DC Voltage generators, mainly top-level calibrators from 1 mV to 1 kV, has been developed. The heart of the setup is an INRIM-built automated fixed-ratio DC Voltage divider. The significant achievement of this setup is the possibility to interconnect the divider, a DMM characterized in linearity, a DC Voltage Standard and a DC Voltage generator under calibration, and to manage the calibration process automatically. This calibration method saves considerable time, improves the reliability and increases the accuracy of the calibration of generators. The relative uncertainties of the system span from 0.6×10⁻⁶ to 1.2×10⁻⁴, improving the previous capabilities of the INRIM laboratory for calibration of programmable multifunction instruments. In addition, this system avoids the employment of several Standards (some of them still manually operated), carrying out the entire process without changing the setup configuration and without the presence of operators. The concept of this setup can be transferred to secondary high-level electrical calibration Laboratories that could find it useful for their calibration activities.

Submitted 30 August, 2017; originally announced August 2017.

Comments: 6 pages, 8 figures

arXiv:1605.04412 [pdf]

High performance selectable value transportable high dc Voltage standard

Authors: Flavio Galliana, Roberto Cerri, Luca Roncaglione Tet

Abstract: At the National Institute of Metrological Research (INRIM), a selectable-value Transportable High dc Voltage Standard (THVS) operating in the range from 10 V to 100 V in steps of 10 V was developed. This Standard was built to cover the lack of high-level dc Voltage Standards at voltages higher than 10 V for use as laboratory (local) or travelling Standards for Inter-Laboratory Comparisons (ILCs). A ground-mobile electronic technique was used to enhance the accuracy of the THVS at the higher values. The THVS shows lower noise and better short- and mid-term stability than top-level dc Voltage and multifunction calibrators (MFCs), and it is better suited to, and less sensitive to, transport than these instruments. The project is extensible to 1000 V.

Submitted 14 May, 2016; originally announced May 2016.

Showing 1–6 of 6 results for author: Cerri, R