-
Generation of Consistent Sets of Multi-Label Classification Rules with a Multi-Objective Evolutionary Algorithm
Authors:
Thiago Zafalon Miranda,
Diorge Brognara Sardinha,
Márcio Porto Basgalupp,
Yaochu **,
Ricardo Cerri
Abstract:
Multi-label classification consists in classifying an instance into two or more classes simultaneously. It is a very challenging task present in many real-world applications, such as classification of biology, image, video, audio, and text. Recently, the interest in interpretable classification models has grown, partially as a consequence of regulations such as the General Data Protection Regulati…
▽ More
Multi-label classification consists in classifying an instance into two or more classes simultaneously. It is a very challenging task present in many real-world applications, such as classification of biology, image, video, audio, and text. Recently, the interest in interpretable classification models has grown, partially as a consequence of regulations such as the General Data Protection Regulation. In this context, we propose a multi-objective evolutionary algorithm that generates multiple rule-based multi-label classification models, allowing users to choose among models that offer different compromises between predictive power and interpretability. An important contribution of this work is that different from most algorithms, which usually generate models based on lists (ordered collections) of rules, our algorithm generates models based on sets (unordered collections) of rules, increasing interpretability. Also, by employing a conflict avoidance algorithm during the rule-creation, every rule within a given model is guaranteed to be consistent with every other rule in the same model. Thus, no conflict resolution strategy is required, evolving simpler models. We conducted experiments on synthetic and real-world datasets and compared our results with state-of-the-art algorithms in terms of predictive performance (F-Score) and interpretability (model size), and demonstrate that our best models had comparable F-Score and smaller model sizes.
△ Less
Submitted 27 March, 2020;
originally announced March 2020.
-
Preventing the Generation of Inconsistent Sets of Classification Rules
Authors:
Thiago Zafalon Miranda,
Diorge Brognara Sardinha,
Ricardo Cerri
Abstract:
In recent years, the interest in interpretable classification models has grown. One of the proposed ways to improve the interpretability of a rule-based classification model is to use sets (unordered collections) of rules, instead of lists (ordered collections) of rules. One of the problems associated with sets is that multiple rules may cover a single instance, but predict different classes for i…
▽ More
In recent years, the interest in interpretable classification models has grown. One of the proposed ways to improve the interpretability of a rule-based classification model is to use sets (unordered collections) of rules, instead of lists (ordered collections) of rules. One of the problems associated with sets is that multiple rules may cover a single instance, but predict different classes for it, thus requiring a conflict resolution strategy. In this work, we propose two algorithms capable of finding feature-space regions inside which any created rule would be consistent with the already existing rules, preventing inconsistencies from arising. Our algorithms do not generate classification models, but are instead meant to enhance algorithms that do so, such as Learning Classifier Systems. Both algorithms are described and analyzed exclusively from a theoretical perspective, since we have not modified a model-generating algorithm to incorporate our proposed solutions yet. This work presents the novelty of using conflict avoidance strategies instead of conflict resolution strategies.
△ Less
Submitted 30 March, 2020; v1 submitted 23 August, 2019;
originally announced August 2019.
-
Beagle: Automated Extraction and Interpretation of Visualizations from the Web
Authors:
Leilani Battle,
Peitong Duan,
Zachery Miranda,
Dana Mukusheva,
Remco Chang,
Michael Stonebraker
Abstract:
"How common is interactive visualization on the web?" "What is the most popular visualization design?" "How prevalent are pie charts really?" These questions intimate the role of interactive visualization in the real (online) world. In this paper, we present our approach (and findings) to answering these questions. First, we introduce Beagle, which mines the web for SVG-based visualizations and au…
▽ More
"How common is interactive visualization on the web?" "What is the most popular visualization design?" "How prevalent are pie charts really?" These questions intimate the role of interactive visualization in the real (online) world. In this paper, we present our approach (and findings) to answering these questions. First, we introduce Beagle, which mines the web for SVG-based visualizations and automatically classifies them by type (i.e., bar, pie, etc.). With Beagle, we extract over 41,000 visualizations across five different tools and repositories, and classify them with 86% accuracy, across 24 visualization types. Given this visualization collection, we study usage across tools. We find that most visualizations fall under four types: bar charts, line charts, scatter charts, and geographic maps. Though controversial, pie charts are relatively rare in practice. Our findings also indicate that users may prefer tools that emphasize a succinct set of visualization types, and provide diverse expert visualization examples.
△ Less
Submitted 16 November, 2017;
originally announced November 2017.
-
Selecting Representative Examples for Program Synthesis
Authors:
Yewen Pu,
Zachery Miranda,
Armando Solar-Lezama,
Leslie Pack Kaelbling
Abstract:
Program synthesis is a class of regression problems where one seeks a solution, in the form of a source-code program, map** the inputs to their corresponding outputs exactly. Due to its precise and combinatorial nature, program synthesis is commonly formulated as a constraint satisfaction problem, where input-output examples are encoded as constraints and solved with a constraint solver. A key c…
▽ More
Program synthesis is a class of regression problems where one seeks a solution, in the form of a source-code program, map** the inputs to their corresponding outputs exactly. Due to its precise and combinatorial nature, program synthesis is commonly formulated as a constraint satisfaction problem, where input-output examples are encoded as constraints and solved with a constraint solver. A key challenge of this formulation is scalability: while constraint solvers work well with a few well-chosen examples, a large set of examples can incur significant overhead in both time and memory. We describe a method to discover a subset of examples that is both small and representative: the subset is constructed iteratively, using a neural network to predict the probability of unchosen examples conditioned on the chosen examples in the subset, and greedily adding the least probable example. We empirically evaluate the representativeness of the subsets constructed by our method, and demonstrate such subsets can significantly improve synthesis time and stability.
△ Less
Submitted 7 June, 2018; v1 submitted 8 November, 2017;
originally announced November 2017.