
Description Complexity of Unary Structures in First-Order Logic with Links to Entropy

Authors: Reijo Jaakkola, Antti Kuusisto, Miikka Vilander

Abstract: The description complexity of a model is the length of the shortest formula that defines the model. We study the description complexity of unary structures in first-order logic FO, also drawing links to semantic complexity in the form of entropy. The class of unary structures provides a simple way to represent tabular Boolean data sets as relational structures. We define structures with FO-formulas that are strictly linear in the size of the model as opposed to using the naive quadratic ones, and we use arguments based on formula size games to obtain related lower bounds for description complexity. We also obtain a precise asymptotic result on the expected description complexity of a randomly selected structure. We then give bounds on the relationship between Shannon entropy and description complexity. We extend this relationship also to Boltzmann entropy by establishing an asymptotic match between the two entropies. Despite the simplicity of unary structures, our arguments require the use of formula size games, Stirling's approximation and Chernoff bounds.

Submitted 4 June, 2024; originally announced June 2024.

MSC Class: 03C13 (Primary) 68R01; 94A17; 68Q30 (Secondary) ACM Class: F.4.1; G.2.0; E.4

arXiv:2406.01114 [pdf, ps, other]

Globally Interpretable Classifiers via Boolean Formulas with Dynamic Propositions

Authors: Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Masood Feyzbakhsh Rankooh, Miikka Vilander

Abstract: Interpretability and explainability are among the most important challenges of modern artificial intelligence, being mentioned even in various legislative sources. In this article, we develop a method for extracting immediately human interpretable classifiers from tabular data. The classifiers are given in the form of short Boolean formulas built with propositions that can either be directly extracted from categorical attributes or dynamically computed from numeric ones. Our method is implemented using Answer Set Programming. We investigate seven datasets and compare our results to ones obtainable by state-of-the-art classifiers for tabular data, namely, XGBoost and random forests. Over all datasets, the accuracies obtainable by our method are similar to the reference methods. The advantage of our classifiers in all cases is that they are very short and immediately human intelligible as opposed to the black-box nature of the reference methods.

Submitted 3 June, 2024; originally announced June 2024.

ACM Class: I.2.6; F.4.1; I.2.4

arXiv:2402.05680 [pdf, ps, other]

Interpretable classifiers for tabular data via discretization and feature selection

Authors: Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Masood Feyzbakhsh Rankooh, Miikka Vilander

Abstract: We introduce a method for computing immediately human interpretable yet accurate classifiers from tabular data. The classifiers obtained are short Boolean formulas, computed via first discretizing the original data and then using feature selection coupled with a very fast algorithm for producing the best possible Boolean classifier for the setting. We demonstrate the approach via 13 experiments, obtaining results with accuracies comparable to ones obtained via random forests, XGBoost, and existing results for the same datasets in the literature. In most cases, the accuracy of our method is in fact similar to that of the reference methods, even though the main objective of our study is the immediate interpretability of our classifiers. We also prove a new result on the probability that the classifier we obtain from real-life data corresponds to the ideally best classifier with respect to the background distribution the data comes from.

Submitted 30 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: Changes in relation to version 1: more thorough and detailed experiments, general corrections and refinements

ACM Class: I.2.6; F.4.1; I.2.4; E.2

arXiv:2307.06971 [pdf, ps, other]

Short Boolean Formulas as Explanations in Practice

Authors: Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Masood Feyzbakhsh Rankooh, Miikka Vilander

Abstract: We investigate explainability via short Boolean formulas in the data model based on unary relations. As an explanation of length k, we take a Boolean formula of length k that minimizes the error with respect to the target attribute to be explained. We first provide novel quantitative bounds for the expected error in this scenario. We then also demonstrate how the setting works in practice by studying three concrete data sets. In each case, we calculate explanation formulas of different lengths using an encoding in Answer Set Programming. The most accurate formulas we obtain achieve errors similar to other methods on the same data sets. However, due to overfitting, these formulas are not necessarily ideal explanations, so we use cross validation to identify a suitable length for explanations. By limiting to shorter formulas, we obtain explanations that avoid overfitting but are still reasonably accurate and also, importantly, human interpretable.

Submitted 21 December, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: Long version of a paper published in JELIA 2023. Changes to version 1: typos fixed, clarifications added

ACM Class: F.4.1; D.1.6; I.2.6; I.2.4

arXiv:2301.13800 [pdf, ps, other]

A monotone connection between model class size and description length

Authors: Reijo Jaakkola, Antti Kuusisto, Miikka Vilander

Abstract: This paper links sizes of model classes to the minimum lengths of their defining formulas, that is, to their description complexities. Limiting to models with a fixed domain of size n, we study description complexities with respect to the extension of propositional logic with the ability to count assignments. This logic, called GMLU, can alternatively be conceived as graded modal logic over Kripke models with the universal accessibility relation. While GMLU is expressively complete for defining multisets of assignments, we also investigate its fragments GMLU(d) that can count only up to the integer threshold d. We focus in particular on description complexities of equivalence classes of GMLU(d). We show that, in restriction to a poset of type realizations, the order of the equivalence classes based on size is identical to the order based on description complexities. This also demonstrates a monotone connection between Boltzmann entropies of model classes and description complexities. Furthermore, we characterize how the relation between domain size n and counting threshold d determines whether or not there exists a dominating class, which essentially means a model class with limit probability one. To obtain our results, we prove new estimates on r-associated Stirling numbers. As another crucial tool, we show that model classes split into two distinct cases in relation to their description complexity.

Submitted 31 January, 2023; originally announced January 2023.

MSC Class: 03C13 (Primary) 68R01; 94A17; 68Q30 (Secondary) ACM Class: G.2.0; F.4.1

arXiv:2209.12564 [pdf, ps, other]

Relating description complexity to entropy

Authors: Reijo Jaakkola, Antti Kuusisto, Miikka Vilander

Abstract: We demonstrate some novel links between entropy and description complexity, a notion referring to the minimal formula length for specifying given properties. Let MLU be the logic obtained by extending propositional logic with the universal modality, and let GMLU be the corresponding extension with the ability to count. In the finite, MLU is expressively complete for specifying sets of variable assignments, while GMLU is expressively complete for multisets. We show that for MLU, the model classes with maximal Boltzmann entropy are the ones with maximal description complexity. Concerning GMLU, we show that expected Boltzmann entropy is asymptotically equivalent to expected description complexity multiplied by the number of proposition symbols considered. To contrast these results, we show that this link breaks when we move to considering first-order logic FO over vocabularies with higher-arity relations. To establish the aforementioned result, we show that almost all finite models require relatively large FO-formulas to define them. Our results relate to links between Kolmogorov complexity and entropy, demonstrating a way to conceive such results in the logic-based scenario where relational structures are classified by formulas of different sizes.

Submitted 26 September, 2022; originally announced September 2022.

MSC Class: 03C13 (Primary) 94A17; 68Q30 (Secondary) ACM Class: G.2.0; F.4.1

arXiv:2209.01403 [pdf, other]

Explainability via Short Formulas: the Case of Propositional Logic with Implementation

Authors: Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Masood Feyzbakhsh Rankooh, Miikka Vilander

Abstract: We conceptualize explainability in terms of logic and formula size, giving a number of related definitions of explainability in a very general setting. Our main interest is the so-called special explanation problem which aims to explain the truth value of an input formula in an input model. The explanation is a formula of minimal size that (1) agrees with the input formula on the input model and (2) transmits the involved truth value to the input formula globally, i.e., on every model. As an important example case, we study propositional logic in this setting and show that the special explainability problem is complete for the second level of the polynomial hierarchy. We also provide an implementation of this problem in answer set programming and investigate its capacity in relation to explaining answers to the n-queens and dominating set problems.

Submitted 25 October, 2022; v1 submitted 3 September, 2022; originally announced September 2022.

Comments: 16 pages, 1 figure. A variant of a RCRA 2022 paper. Changes to version one: typos fixed in Section 3.1

MSC Class: 68T27; 03B05 ACM Class: I.2.3; F.4.1

arXiv:2202.10180 [pdf, ps, other]

Defining long words succinctly in FO and MSO

Authors: Lauri Hella, Miikka Vilander

Abstract: We consider the length of the longest word definable in FO and MSO via a formula of size n. For both logics we obtain as an upper bound for this number an exponential tower of height linear in n. We prove this by counting types with respect to a fixed quantifier rank. As lower bounds we obtain for both FO and MSO an exponential tower of height in the order of a rational power of n. We show these lower bounds by giving concrete formulas defining word representations of levels of the cumulative hierarchy of sets. In addition, we consider the Löwenheim-Skolem and Hanf numbers of these logics on words and obtain similar bounds for these as well.

Submitted 21 February, 2022; originally announced February 2022.

Comments: Submitted to Computability in Europe 2022

ACM Class: F.4.1

arXiv:2109.08324 [pdf, ps, other]

doi 10.4204/EPTCS.346.17

Games for Succinctness of Regular Expressions

Authors: Miikka Vilander

Abstract: We present a version of so-called formula size games for regular expressions. These games characterize the equivalence of languages up to expressions of a given size. We use the regular expression size game to give a simple proof of a known non-elementary succinctness gap between first-order logic and regular expressions. We also use the game to only count the number of stars in an expression instead of the overall size. For regular expressions this measure trivially gives a hierarchy in terms of expressive power. We obtain such a hierarchy also for what we call RE over star-free expressions, where star-free expressions, that is ones with complement but no stars, are combined using the operations of regular expressions.

Submitted 16 September, 2021; originally announced September 2021.

Comments: In Proceedings GandALF 2021, arXiv:2109.07798

Journal ref: EPTCS 346, 2021, pp. 258-272

arXiv:1912.08715 [pdf, ps, other]

doi 10.1093/logcom/exz025

Formula size games for modal logic and $μ$-calculus

Authors: Lauri Hella, Miikka Vilander

Abstract: We propose a new version of formula size game for modal logic. The game characterizes the equivalence of pointed Kripke-models up to formulas of given numbers of modal operators and binary connectives. Our game is similar to the well-known Adler-Immerman game. However, due to a crucial difference in the definition of positions of the game, its winning condition is simpler, and the second player does not have a trivial optimal strategy. Thus, unlike the Adler-Immerman game, our game is a genuine two-person game. We illustrate the use of the game by proving a non-elementary succinctness gap between bisimulation invariant first-order logic $\mathrm{FO}$ and (basic) modal logic $\mathrm{ML}$. We also present a version of the game for the modal $μ$-calculus $\mathrm{L}_μ$ and show that $\mathrm{FO}$ is also non-elementarily more succinct than $\mathrm{L}_μ$.

Submitted 17 December, 2019; originally announced December 2019.

Comments: This is a preprint of an article published in the Journal of Logic and Computation, published by Oxford University Press. arXiv admin note: substantial text overlap with arXiv:1604.07225

Journal ref: Journal of Logic and Computation, exz025, Oxford University Press, 2019

arXiv:1903.02344 [pdf, other]

doi 10.23638/LMCS-15(3:17)2019

On the Succinctness of Atoms of Dependency

Authors: Martin Lück, Miikka Vilander

Abstract: Propositional team logic is the propositional analog to first-order team logic. Non-classical atoms of dependence, independence, inclusion, exclusion and anonymity can be expressed in it, but for all atoms except dependence only exponential translations are known. In this paper, we systematically compare their succinctness in the existential fragment, where the splitting disjunction only occurs positively, and in full propositional team logic with unrestricted negation. By introducing a variant of the Ehrenfeucht-Fraïssé game called formula size game into team logic, we obtain exponential lower bounds in the existential fragment for all atoms. In the full fragment, we present polynomial upper bounds also for all atoms.

Submitted 19 August, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

MSC Class: 68Q17; 03B60; 03B70 ACM Class: F.4.1

Journal ref: Logical Methods in Computer Science, Volume 15, Issue 3 (August 20, 2019) lmcs:5263

arXiv:1604.07225 [pdf, ps, other]

The Succinctness of First-order Logic over Modal Logic via a Formula Size Game

Authors: Lauri Hella, Miikka Vilander

Abstract: We propose a new version of formula size game for modal logic. The game characterizes the equivalence of pointed Kripke-models up to formulas of given numbers of modal operators and binary connectives. Our game is similar to the well-known Adler-Immerman game. However, due to a crucial difference in the definition of positions of the game, its winning condition is simpler, and the second player (duplicator) does not have a trivial optimal strategy. Thus, unlike the Adler-Immerman game, our game is a genuine two-person game. We illustrate the use of the game by proving a nonelementary succinctness gap between bisimulation invariant first-order logic FO and (basic) modal logic ML.

Submitted 25 April, 2016; originally announced April 2016.

MSC Class: 03B45

Showing 1–12 of 12 results for author: Vilander, M