-
Conjunctive categorial grammars and Lambek grammars with additives
Authors:
Stepan L. Kuznetsov,
Alexander Okhotin
Abstract:
A new family of categorial grammars is proposed, defined by enriching basic categorial grammars with a conjunction operation. It is proved that the formalism obtained in this way has the same expressive power as conjunctive grammars, that is, context-free grammars enhanced with conjunction. It is also shown that categorial grammars with conjunction can be naturally embedded into the Lambek calculus with conjunction and disjunction operations. This further implies that a certain NP-complete set can be defined in the Lambek calculus with conjunction. We also show how to handle some subtle issues connected with the empty string. Finally, we prove that a language generated by a conjunctive grammar can be described by a Lambek grammar with disjunction (but without conjunction).
Submitted 26 May, 2024;
originally announced May 2024.
-
Formal concept analysis for evaluating intrinsic dimension of a natural language
Authors:
Sergei O. Kuznetsov,
Vasilii A. Gromov,
Nikita S. Borodin,
Andrei M. Divavin
Abstract:
Some results of a computational experiment for determining the intrinsic dimension of linguistic varieties for the Bengali and Russian languages are presented. Sets of words and sets of bigrams in these languages were considered separately. The method used to solve this problem was based on formal concept analysis algorithms. It was found that the intrinsic dimensions of these languages are significantly less than the dimensions used in popular neural network models in natural language processing.
Submitted 17 November, 2023;
originally announced November 2023.
-
Explorations in Subexponential Non-associative Non-commutative Linear Logic
Authors:
Eben Blaisdell,
Max Kanovich,
Stepan L. Kuznetsov,
Elaine Pimentel,
Andre Scedrov
Abstract:
In a previous work we introduced a non-associative non-commutative logic extended by multimodalities, called subexponentials, licensing local application of structural rules. Here, we further explore this system, exhibiting a classical one-sided multi-succedent analogue of our intuitionistic system, following the exponential-free calculi of Buszkowski and of de Groote and Lamarche. A large fragment of the intuitionistic calculus is shown to embed faithfully into the classical fragment.
Submitted 8 August, 2023;
originally announced August 2023.
-
Explorations in Subexponential non-associative non-commutative Linear Logic (extended version)
Authors:
Eben Blaisdell,
Max I. Kanovich,
Stepan L. Kuznetsov,
Elaine Pimentel,
Andre Scedrov
Abstract:
In a previous work we introduced a non-associative non-commutative logic extended by multimodalities, called subexponentials, licensing local application of structural rules. Here, we further explore this system, considering a classical one-sided multi-succedent version of the system, following the exponential-free calculi of Buszkowski and of de Groote and Lamarche. The intuitionistic calculus is shown to embed faithfully into the classical fragment.
Submitted 21 July, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Delta-Closure Structure for Studying Data Distribution
Authors:
Aleksey Buzmakov,
Tatiana Makhalova,
Sergei O. Kuznetsov,
Amedeo Napoli
Abstract:
In this paper, we revisit pattern mining and study the distribution underlying a binary dataset thanks to the closure structure which is based on passkeys, i.e., minimum generators in equivalence classes robust to noise. We introduce $Δ$-closedness, a generalization of the closure operator, where $Δ$ measures how a closed set differs from its upper neighbors in the partial order induced by closure. A $Δ$-class of equivalence includes minimum and maximum elements and allows us to characterize the distribution underlying the data. Moreover, the set of $Δ$-classes of equivalence can be partitioned into the so-called $Δ$-closure structure. In particular, a $Δ$-class of equivalence with a high level demonstrates correlations among many attributes, which are supported by more observations when $Δ$ is large. In the experiments, we study the $Δ$-closure structure of several real-world datasets and show that this structure is very stable for large $Δ$ and does not substantially depend on the data sampling used for the analysis.
Submitted 13 October, 2022;
originally announced October 2022.
-
Relational Models for the Lambek Calculus with Intersection and Constants
Authors:
Stepan L. Kuznetsov
Abstract:
We consider relational semantics (R-models) for the Lambek calculus extended with intersection and explicit constants for zero and unit. For its variant without constants and with a restriction which disallows empty antecedents, Andreka and Mikulas (1994) prove strong completeness. We show that it fails without this restriction, but, on the other hand, prove weak completeness for a non-standard interpretation of constants. For the standard interpretation, even weak completeness fails. The weak completeness result extends to an infinitary setting, for so-called iterative divisions (Kleene star under division). We also prove strong completeness results for product-free fragments.
Submitted 15 December, 2023; v1 submitted 2 October, 2022;
originally announced October 2022.
-
Experimental Study of Concise Representations of Concepts and Dependencies
Authors:
Aleksey Buzmakov,
Egor Dudyrev,
Sergei O. Kuznetsov,
Tatiana Makhalova,
Amedeo Napoli
Abstract:
In this paper we are interested in studying concise representations of concepts and dependencies, i.e., implications and association rules. Such representations are based on equivalence classes and their elements, i.e., minimal generators, minimum generators including keys and passkeys, proper premises, and pseudo-intents. All these sets of attributes are significant and well studied from the computational point of view, while their statistical properties remain to be studied. The purpose of this paper is to study these singular attribute sets and, in parallel, how to evaluate the complexity of a dataset from an FCA point of view. In the paper we analyze the empirical distributions and the sizes of these particular attribute sets. In addition we propose several measures of data complexity, such as distributivity, linearity, size of concepts, and size of minimum generators, for the analysis of real-world and synthetic datasets.
Submitted 24 November, 2022; v1 submitted 5 May, 2022;
originally announced May 2022.
-
Decision Concept Lattice vs. Decision Trees and Random Forests
Authors:
Egor Dudyrev,
Sergei O. Kuznetsov
Abstract:
Decision trees and their ensembles are very popular models of supervised machine learning. In this paper we merge the ideas underlying decision trees, their ensembles and FCA by proposing a new supervised machine learning model which can be constructed in polynomial time and is applicable for both classification and regression problems. Specifically, we first propose a polynomial-time algorithm for constructing a part of the concept lattice that is based on a decision tree. Second, we describe a prediction scheme based on a concept lattice for solving both classification and regression tasks with prediction quality comparable to that of state-of-the-art models.
Submitted 1 June, 2021;
originally announced June 2021.
-
Mint: MDL-based approach for Mining INTeresting Numerical Pattern Sets
Authors:
Tatiana Makhalova,
Sergei O. Kuznetsov,
Amedeo Napoli
Abstract:
Pattern mining is well established in data mining research, especially for mining binary datasets. Surprisingly, there is much less work about numerical pattern mining and this research area remains under-explored. In this paper, we propose Mint, an efficient MDL-based algorithm for mining numerical datasets. The MDL principle is a robust and reliable framework widely used in pattern mining, as well as in subgroup discovery. In Mint we reuse MDL for discovering useful patterns and returning a set of non-redundant overlapping patterns with well-defined boundaries and covering meaningful groups of objects. Mint is not alone in the category of numerical pattern miners based on MDL. In the experiments presented in the paper we show that Mint outperforms competitors, among them Slim and RealKrimp.
Submitted 30 November, 2020;
originally announced November 2020.
-
Discovery data topology with the closure structure. Theoretical and practical aspects
Authors:
Tatiana Makhalova,
Aleksey Buzmakov,
Sergei O. Kuznetsov,
Amedeo Napoli
Abstract:
In this paper, we are revisiting pattern mining and especially itemset mining, which allows one to analyze binary datasets in searching for interesting and meaningful association rules and respective itemsets in an unsupervised way. While a summarization of a dataset based on a set of patterns does not provide a general and satisfying view over a dataset, we introduce a concise representation -- the closure structure -- based on closed itemsets and their minimum generators, for capturing the intrinsic content of a dataset. The closure structure allows one to understand the topology of the dataset as a whole and the inherent complexity of the data. We propose a formalization of the closure structure in terms of Formal Concept Analysis, which is well adapted to study this data topology. We present and demonstrate theoretical results, as well as practical results using the GDPM algorithm. GDPM is rather unique in its functionality as it returns a characterization of the topology of a dataset in terms of complexity levels, highlighting the diversity and the distribution of the itemsets. Finally, a series of experiments shows how GDPM can be practically used and what can be expected from the output.
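The notions of closed itemset and minimum generator mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the GDPM algorithm from the paper; the toy dataset `TRANSACTIONS` and the helper names `extent`, `closure`, and `minimum_generators` are hypothetical and chosen only for illustration, and the search is a brute-force enumeration suitable for tiny inputs.

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c", "d"},
]


def extent(itemset, transactions):
    """Indices of the transactions that contain every item of `itemset`."""
    return [i for i, t in enumerate(transactions) if itemset <= t]


def closure(itemset, transactions):
    """Largest itemset shared by all transactions that contain `itemset`."""
    rows = extent(itemset, transactions)
    if not rows:
        # Convention assumed here: an unsupported itemset closes to all items.
        return frozenset().union(*transactions)
    return frozenset(set.intersection(*(transactions[i] for i in rows)))


def minimum_generators(closed, transactions):
    """Smallest-cardinality subsets of `closed` whose closure equals `closed`."""
    items = sorted(closed)
    for size in range(len(items) + 1):
        found = [
            frozenset(c)
            for c in combinations(items, size)
            if closure(frozenset(c), transactions) == closed
        ]
        if found:
            # The first non-empty level holds the generators of minimum size.
            return found
    return []


if __name__ == "__main__":
    closed = closure(frozenset({"d"}), TRANSACTIONS)
    print(sorted(closed))
    print([sorted(g) for g in minimum_generators(closed, TRANSACTIONS)])
```

In this toy dataset the item `d` occurs only together with `b` and `c`, so the itemset `{d}` is a single-item generator of a three-item closed itemset.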
Submitted 30 March, 2021; v1 submitted 6 October, 2020;
originally announced October 2020.
-
The Multiplicative-Additive Lambek Calculus with Subexponential and Bracket Modalities
Authors:
Max Kanovich,
Stepan Kuznetsov,
Andre Scedrov
Abstract:
We give a proof-theoretic and algorithmic complexity analysis for systems introduced by Morrill to serve as the core of the CatLog categorial grammar parser. We consider two recent versions of Morrill's calculi, and focus on their fragments including multiplicative (Lambek) connectives, additive conjunction and disjunction, brackets and bracket modalities, and the ! subexponential modality. For both systems, we resolve issues connected with the cut rule and provide necessary modifications, after which we prove admissibility of cut (cut elimination theorem). We also prove algorithmic undecidability for both calculi, and show that categorial grammars based on them can generate arbitrary recursively enumerable languages.
Submitted 30 September, 2020; v1 submitted 31 July, 2020;
originally announced August 2020.
-
Language Models for Some Extensions of the Lambek Calculus
Authors:
Max Kanovich,
Stepan Kuznetsov,
Andre Scedrov
Abstract:
We investigate language interpretations of two extensions of the Lambek calculus: with additive conjunction and disjunction and with additive conjunction and the unit constant. For extensions with additive connectives, we show that conjunction and disjunction behave differently. Adding both of them leads to incompleteness due to the distributivity law. We show that with conjunction only no issues with distributivity arise. In contrast, there exists a corollary of the distributivity law in the language with disjunction only which is not derivable in the non-distributive system. Moreover, this difference remains valid for systems with permutation and/or weakening structural rules, that is, intuitionistic linear and affine logics and affine multiplicative-additive Lambek calculus. For the extension of the Lambek calculus with the unit constant, we present a calculus which reflects natural algebraic properties of the empty word. We do not claim completeness for this calculus, but we prove undecidability for the whole range of systems extending this minimal calculus and sound w.r.t. language models. As a corollary, we show that in the language with the unit there exists a sequent that is true if all variables are interpreted by regular languages, but not true in language models in general.
Submitted 31 July, 2020;
originally announced August 2020.
-
Complexity of the Infinitary Lambek Calculus with Kleene Star
Authors:
Stepan Kuznetsov
Abstract:
We consider the Lambek calculus, or non-commutative multiplicative intuitionistic linear logic, extended with iteration, or Kleene star, axiomatised by means of an $ω$-rule, and prove that the derivability problem in this calculus is $Π_1^0$-hard. This solves a problem left open by Buszkowski (2007), who obtained the same complexity bound for infinitary action logic, which additionally includes additive conjunction and disjunction. As a by-product, we prove that any context-free language without the empty word can be generated by a Lambek grammar with unique type assignment, without Lambek's non-emptiness restriction imposed (cf. Safiullin 2007).
Submitted 1 May, 2020;
originally announced May 2020.
-
Infinitary Action Logic with Exponentiation
Authors:
Stepan L. Kuznetsov,
Stanislav O. Speranski
Abstract:
We introduce infinitary action logic with exponentiation -- that is, the multiplicative-additive Lambek calculus extended with Kleene star and with a family of subexponential modalities, which allows some of the structural rules (contraction, weakening, permutation). The logic is presented in the form of an infinitary sequent calculus. We prove cut elimination and, in the case where at least one subexponential allows non-local contraction, establish exact complexity boundaries in two senses. First, we show that the derivability problem for this logic is $Π_1^1$-complete. Second, we show that the closure ordinal of its derivability operator is $ω_1^{\mathrm{CK}}$. In the case where no subexponential allows contraction, we show that complexity is the same as for infinitary action logic itself. Namely, the derivability problem in this case is $Π^0_1$-complete and the closure ordinal is not greater than $ω^ω$.
Submitted 7 July, 2021; v1 submitted 19 January, 2020;
originally announced January 2020.
-
Action Logic is Undecidable
Authors:
Stepan Kuznetsov
Abstract:
Action logic is the algebraic logic (inequational theory) of residuated Kleene lattices. This logic involves Kleene star, axiomatized by an induction scheme. For a stronger system which uses an $ω$-rule instead (infinitary action logic) Buszkowski and Palka (2007) have proved $Π_1^0$-completeness (thus, undecidability). Decidability of action logic itself was an open question, raised by D. Kozen in 1994. In this article, we show that it is undecidable, more precisely, $Σ_1^0$-complete. We also prove the same complexity results for all recursively enumerable logics between action logic and infinitary action logic; for fragments of those with only one of the two lattice (additive) connectives; and for action logic extended with the law of distributivity.
Submitted 24 December, 2019;
originally announced December 2019.
-
Next Priority Concept: A new and generic algorithm computing concepts from complex and heterogeneous data
Authors:
Christophe Demko,
Karell Bertet,
Cyril Faucher,
Jean-François Viaud,
Sergeï Kuznetsov
Abstract:
In this article, we present a new data-type-agnostic algorithm calculating a concept lattice from heterogeneous and complex data. Our NextPriorityConcept algorithm is first introduced and proved in the binary case as an extension of Bordat's algorithm with the notion of strategies to select only some predecessors of each concept, avoiding the generation of unreasonably large lattices. The algorithm is then extended to any type of data in a generic way. It is inspired by pattern structure theory, where data are locally described by predicates independent of their types, allowing the management of heterogeneous data.
Submitted 20 December, 2019;
originally announced December 2019.
-
Ordered Sets for Data Analysis
Authors:
Sergei O. Kuznetsov
Abstract:
This book dwells on mathematical and algorithmic issues of data analysis based on generality order of descriptions and respective precision. To speak of these topics correctly, we have to go some way getting acquainted with the important notions of relation and order theory. On the one hand, data often have a complex structure with natural order on it. On the other hand, many symbolic methods of data analysis and machine learning allow one to compare the obtained classifiers w.r.t. their generality, which is also an order relation. Efficient algorithms are very important in data analysis, especially when one deals with big data, so scalability is a real issue. That is why we analyze the computational complexity of algorithms and problems of data analysis. We start from the basic definitions and facts of algorithmic complexity theory and analyze the complexity of various tools of data analysis we consider. The tools and methods of data analysis, like computing taxonomies, groups of similar objects (concepts and n-clusters), dependencies in data, classification, etc., are illustrated with applications in particular subject domains, from chemoinformatics to text mining and natural language processing.
Submitted 27 August, 2019;
originally announced August 2019.
-
On Pattern Setups and Pattern Multistructures
Authors:
Aimene Belfodil,
Sergei Kuznetsov,
Mehdi Kaytoue
Abstract:
Modern order and lattice theory provides convenient mathematical tools for pattern mining, in particular for condensed irredundant representations of pattern spaces and their efficient generation. Formal Concept Analysis (FCA) offers a generic framework, called pattern structures, to formalize many types of patterns, such as itemsets, intervals, graph and sequence sets. Moreover, FCA provides generic algorithms to generate irredundantly all closed patterns, the only condition being that the pattern space is a meet-semilattice. This does not always hold, e.g., for sequential and graph patterns. Here, we discuss pattern setups consisting of descriptions making just a partial order. Such a framework can be too broad, causing several problems, so we propose a new model, dubbed pattern multistructure, lying between pattern setups and pattern structures, which relies on multilattices. Finally, we consider some techniques, namely completions, transforming pattern setups to pattern structures using sets/antichains of patterns.
Submitted 7 June, 2019;
originally announced June 2019.
-
Eliminating the unit constant in the Lambek calculus with brackets
Authors:
Stepan Kuznetsov
Abstract:
We present a translation of the Lambek calculus with brackets and the unit constant, $\mathbf{Lb}^{\boldsymbol{*}}_{\mathbf{1}}$, into the Lambek calculus with brackets allowing empty antecedents, but without the unit constant, $\mathbf{Lb}^{\boldsymbol{*}}$. Using this translation, we extend previously known results for $\mathbf{Lb}^{\boldsymbol{*}}$ to $\mathbf{Lb}^{\boldsymbol{*}}_{\mathbf{1}}$: (1) languages generated by categorial grammars based on the Lambek calculus with brackets are context-free (Kanazawa 2017); (2) the polynomial-time algorithm for deciding derivability of bounded depth sequents (Kanovich et al. 2017).
Submitted 16 November, 2017;
originally announced November 2017.
-
A polynomial time algorithm for the Lambek calculus with brackets of bounded order
Authors:
Max Kanovich,
Stepan Kuznetsov,
Glyn Morrill,
Andre Scedrov
Abstract:
Lambek calculus is a logical foundation of categorial grammar, a linguistic paradigm of grammar as logic and parsing as deduction. Pentus (2010) gave a polynomial-time algorithm for determining provability of bounded depth formulas in the Lambek calculus with empty antecedents allowed. Pentus' algorithm is based on tabularisation of proof nets. Lambek calculus with brackets is a conservative extension of Lambek calculus with bracket modalities, suitable for the modeling of syntactical domains. In this paper we give an algorithm for provability in the Lambek calculus with brackets allowing empty antecedents. Our algorithm runs in polynomial time when both the formula depth and the bracket nesting depth are bounded. It combines a Pentus-style tabularisation of proof nets with an automata-theoretic treatment of bracketing.
Submitted 18 December, 2017; v1 submitted 1 May, 2017;
originally announced May 2017.
-
Mining Best Closed Itemsets for Projection-antimonotonic Constraints in Polynomial Time
Authors:
Aleksey Buzmakov,
Sergei O. Kuznetsov,
Amedeo Napoli
Abstract:
The exponential explosion of the set of patterns is one of the main challenges in pattern mining. This challenge is approached by introducing a constraint for pattern selection. One of the first constraints proposed in pattern mining is support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, all its superpatterns are not frequent. However, many other constraints for pattern selection are neither monotonic nor anti-monotonic, which makes it difficult to generate patterns satisfying these constraints.
In order to deal with nonmonotonic constraints we introduce the notion of "projection antimonotonicity" and the SOFIA algorithm, which allow generating the best patterns for a class of nonmonotonic constraints. Cosine interest, robustness, stability of closed itemsets, and the associated delta-measure are among these constraints. SOFIA starts from light descriptions of transactions in a dataset (a small set of items in the case of itemset description) and then iteratively adds more information to these descriptions (more items with indication of the tidsets they describe).
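The anti-monotonicity of support described in this abstract can be sketched in Python with a standard levelwise search that only extends frequent itemsets. This is not the SOFIA algorithm and does not implement projection antimonotonicity; the toy dataset `TRANSACTIONS` and the function names `support` and `frequent_itemsets` are hypothetical and serve only as an illustration.

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "d"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Levelwise search: candidates are built only from frequent itemsets.

    Because support is anti-monotonic, every superset of an infrequent
    itemset is infrequent, so such supersets are never generated here.
    """
    items = sorted(set().union(*transactions))
    level = [
        frozenset([i])
        for i in items
        if support(frozenset([i]), transactions) >= min_support
    ]
    result = list(level)
    while level:
        # Join pairs of frequent k-itemsets that differ in exactly one item.
        candidates = {
            a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1
        }
        level = [c for c in candidates if support(c, transactions) >= min_support]
        result.extend(level)
    return result


if __name__ == "__main__":
    for itemset in frequent_itemsets(TRANSACTIONS, min_support=2):
        print(sorted(itemset), support(itemset, TRANSACTIONS))
```

With `min_support=2`, the item `d` is infrequent, so no candidate containing `d` is ever built. Constraints that are neither monotonic nor anti-monotonic do not permit this kind of pruning directly, which is the difficulty the abstract points to.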
Submitted 28 March, 2017;
originally announced March 2017.
-
Bayesian Learning of Consumer Preferences for Residential Demand Response
Authors:
Mikhail V. Goubko,
Sergey O. Kuznetsov,
Alexey A. Neznanov,
Dmitry I. Ignatov
Abstract:
In coming years residential consumers will face real-time electricity tariffs with energy prices varying day to day, and effective energy saving will require automation - a recommender system, which learns consumer's preferences from her actions. A consumer chooses a scenario of home appliance use to balance her comfort level and the energy bill. We propose a Bayesian learning algorithm to estimate the comfort level function from the history of appliance use. In numeric experiments with datasets generated from a simulation model of a consumer interacting with small home appliances the algorithm outperforms popular regression analysis tools. Our approach can be extended to control an air heating and conditioning system, which is responsible for up to half of a household's energy bill.
Submitted 27 January, 2017;
originally announced January 2017.
-
On interestingness measures of formal concepts
Authors:
Sergei O. Kuznetsov,
Tatiana Makhalova
Abstract:
Formal concepts and closed itemsets have proved to be of great importance for knowledge discovery, both as a tool for concise representation of association rules and as a tool for clustering and constructing domain taxonomies and ontologies. Exponential explosion makes it difficult to consider the whole concept lattice arising from data, so one needs to select the most useful and interesting concepts. In this paper interestingness measures of concepts are considered and compared with respect to various aspects, such as efficiency of computation and applicability to noisy data and performing ranking correlation.
Submitted 19 April, 2017; v1 submitted 8 November, 2016;
originally announced November 2016.
-
Undecidability of the Lambek calculus with subexponential and bracket modalities
Authors:
Max Kanovich,
Stepan Kuznetsov,
Andre Scedrov
Abstract:
The Lambek calculus is a well-known logical formalism for modelling natural language syntax. The original calculus covered a substantial number of intricate natural language phenomena, but only those restricted to the context-free setting. In order to address more subtle linguistic issues, the Lambek calculus has been extended in various ways. In particular, Morrill and Valentin (2015) introduce an extension with so-called exponential and bracket modalities. Their extension is based on a non-standard contraction rule for the exponential that interacts with the bracket structure in an intricate way. The standard contraction rule is not admissible in this calculus. In this paper we prove undecidability of the derivability problem in their calculus. We also investigate restricted decidable fragments considered by Morrill and Valentin and we show that these fragments belong to the NP class.
Submitted 4 May, 2017; v1 submitted 13 August, 2016;
originally announced August 2016.
-
Reconciling Lambek's restriction, cut-elimination, and substitution in the presence of exponential modalities
Authors:
Max Kanovich,
Stepan Kuznetsov,
Andre Scedrov
Abstract:
The Lambek calculus can be considered as a version of non-commutative intuitionistic linear logic. One of the interesting features of the Lambek calculus is the so-called "Lambek's restriction," that is, the antecedent of any provable sequent should be non-empty. In this paper we discuss ways of extending the Lambek calculus with the linear logic exponential modality while keeping Lambek's restriction. Interestingly enough, we show that for any system equipped with a reasonable exponential modality the following holds: if the system enjoys cut elimination and substitution to the full extent, then the system necessarily violates Lambek's restriction. Nevertheless, we show that two of the three conditions can be implemented. Namely, we design a system with Lambek's restriction and cut elimination and another system with Lambek's restriction and substitution. For both calculi we prove that they are undecidable, even if we take only one of the two divisions provided by the Lambek calculus. The system with cut elimination and substitution and without Lambek's restriction is folklore and known to be undecidable.
Submitted 9 May, 2019; v1 submitted 7 August, 2016;
originally announced August 2016.
-
Undecidability of the Lambek calculus with a relevant modality
Authors:
Max Kanovich,
Stepan Kuznetsov,
Andre Scedrov
Abstract:
Morrill and Valentin in the paper "Computational coverage of TLG: Nonlinearity" considered an extension of the Lambek calculus enriched by a so-called "exponential" modality. This modality behaves in the "relevant" style, that is, it allows contraction and permutation, but not weakening. Morrill and Valentin stated an open problem whether this system is decidable. Here we show its undecidability. Our result remains valid if we consider the fragment where all division operations have one direction. We also show that the derivability problem in a restricted case, where the modality can be applied only to variables (primitive types), is decidable and belongs to the NP class.
Submitted 7 August, 2016; v1 submitted 23 January, 2016;
originally announced January 2016.
-
Revisiting Pattern Structure Projections
Authors:
Aleksey Buzmakov,
Sergei O. Kuznetsov,
Amedeo Napoli
Abstract:
Formal concept analysis (FCA) is a well-founded method for data analysis and has many applications in data mining. Pattern structures are an extension of FCA for dealing with complex data such as sequences or graphs. However, the computational complexity of computing with pattern structures is high, and projections of pattern structures were introduced to simplify computation. In this paper we introduce o-projections of pattern structures, a generalization of projections which defines a wider class of projections preserving the properties of the original approach. Moreover, we show that o-projections form a semilattice and we discuss the correspondence between o-projections and the representation contexts of o-projected pattern structures.
KEYWORDS: formal concept analysis, pattern structures, representation contexts, projections
Submitted 16 June, 2015;
originally announced June 2015.
-
Fast Generation of Best Interval Patterns for Nonmonotonic Constraints
Authors:
Aleksey Buzmakov,
Sergei O. Kuznetsov,
Amedeo Napoli
Abstract:
In pattern mining, the main challenge is the exponential explosion of the set of patterns. Typically, to solve this problem, a constraint for pattern selection is introduced. One of the first constraints proposed in pattern mining is the support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, all its superpatterns are also infrequent. However, many other constraints for pattern selection are not (anti-)monotonic, which makes it difficult to generate patterns satisfying these constraints. In this paper we introduce the notion of projection-antimonotonicity and the $θ$-$Σοφια$ algorithm, which allows efficient generation of the best patterns for some nonmonotonic constraints. We consider stability and the $Δ$-measure, which are nonmonotonic constraints, and apply them to interval tuple datasets. In the experiments, we compute the best interval tuple patterns w.r.t. these measures and show the advantage of our approach over postfiltering approaches.
KEYWORDS: Pattern mining, nonmonotonic constraints, interval tuple data
Submitted 16 June, 2015; v1 submitted 2 June, 2015;
originally announced June 2015.
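The anti-monotonicity of support stated in the abstract above can be checked on a toy example. The following is a minimal sketch, not the paper's algorithm: the transaction database and the names `TRANSACTIONS`, `support`, and `is_antimonotonic` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy transaction database (an assumption, not data from the paper).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def is_antimonotonic(transactions=TRANSACTIONS):
    """Check that support(X) >= support(Y) whenever X is a subset of Y."""
    items = sorted(set().union(*transactions))
    itemsets = [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]
    return all(
        support(x, transactions) >= support(y, transactions)
        for x in itemsets
        for y in itemsets
        if x <= y
    )
```

Because support can only drop when items are added, an itemset that falls below a frequency threshold lets a miner discard all of its supersets without counting them.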
-
Graphlet-based lazy associative graph classification
Authors:
Yury Kashnitsky,
Sergei O. Kuznetsov
Abstract:
The paper addresses the graph classification problem and introduces a modification of the lazy associative classification method to efficiently handle intersections of graphs. Graph intersections are approximated with all common subgraphs up to a fixed size similarly to what is done with graphlet kernels. We explain the idea of the algorithm with a toy example and describe our experiments with a predictive toxicology dataset.
Submitted 13 May, 2015; v1 submitted 21 April, 2015;
originally announced April 2015.
-
On mining complex sequential data by means of FCA and pattern structures
Authors:
Aleksey Buzmakov,
Elias Egho,
Nicolas Jay,
Sergei O. Kuznetsov,
Amedeo Napoli,
Chedy Raïssi
Abstract:
Nowadays data sets are available in very complex and heterogeneous ways. Mining of such data collections is essential to support many real-world applications ranging from healthcare to marketing. In this work, we focus on the analysis of "complex" sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of Formal Concept Analysis (FCA) and its extension based on "pattern structures". Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures along with projections (i.e., a data reduction of sequential structures), are able to enumerate more meaningful patterns and increase the computing efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analyzing interesting patient patterns from a French healthcare data set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use case which is the main motivation for this work.
Keywords: data mining; formal concept analysis; pattern structures; projections; sequences; sequential data.
Submitted 9 April, 2015;
originally announced April 2015.
-
Dualization in Lattices Given by Ordered Sets of Irreducibles
Authors:
Mikhail A. Babin,
Sergei O. Kuznetsov
Abstract:
Dualization of a monotone Boolean function on a finite lattice can be represented by transforming the set of its minimal 1 values to the set of its maximal 0 values. In this paper we consider finite lattices given by ordered sets of their meet and join irreducibles (i.e., as a concept lattice of a formal context). We show that in this case dualization is equivalent to the enumeration of so-called minimal hypotheses. In contrast to the usual dualization setting, where a lattice is given by the ordered set of its elements, dualization in this case is shown to be impossible in output polynomial time unless P = NP. However, if the lattice is distributive, dualization is shown to be possible in subexponential time.
Submitted 29 December, 2015; v1 submitted 5 April, 2015;
originally announced April 2015.
-
Interactive Error Correction in Implicative Theories
Authors:
Sergei O. Kuznetsov,
Artem Revenko
Abstract:
Errors in implicative theories coming from binary data are studied. First, two classes of errors that may affect implicative theories are singled out. Two approaches for finding errors of these classes are proposed, both of them based on methods of Formal Concept Analysis. The first approach uses the cardinality minimal (canonical or Duquenne-Guigues) implication base. The construction of such a base is computationally intractable. Using an alternative approach one checks possible errors on the fly in polynomial time via computing closures of subsets of attributes. Both approaches are interactive, based on questions about the validity of certain implications. Results of computer experiments are presented and discussed.
Submitted 20 October, 2014;
originally announced October 2014.
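The closure computation mentioned in the abstract above is the standard derivation operator of Formal Concept Analysis. The following is a minimal sketch of it, not the authors' error-detection procedure: the context and the names `CONTEXT`, `extent`, `closure`, and `implication_holds` are hypothetical.

```python
# Hypothetical formal context (an assumption): object -> attributes it has.
CONTEXT = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b"},
    "g3": {"a", "c"},
}


def extent(attrs, context=CONTEXT):
    """Objects that have every attribute in `attrs`."""
    return {g for g, has in context.items() if attrs <= has}


def closure(attrs, context=CONTEXT):
    """Attributes shared by all objects that have `attrs`."""
    objs = extent(attrs, context)
    if not objs:
        # No object has `attrs`, so the closure is the full attribute set.
        return set().union(*context.values())
    return set.intersection(*(context[g] for g in objs))


def implication_holds(premise, conclusion, context=CONTEXT):
    """The implication premise -> conclusion holds iff conclusion is in closure(premise)."""
    return conclusion <= closure(premise, context)
```

Each closure takes one pass over the context, which is why validity checks of this kind run in polynomial time.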
-
Concept Relation Discovery and Innovation Enabling Technology (CORDIET)
Authors:
Jonas Poelmans,
Paul Elzinga,
Alexey Neznanov,
Stijn Viaene,
Sergei O. Kuznetsov,
Dmitry Ignatov,
Guido Dedene
Abstract:
Concept Relation Discovery and Innovation Enabling Technology (CORDIET) is a toolbox for gaining new knowledge from unstructured text data. At the core of CORDIET is the C-K theory, which captures the essential elements of innovation. The tool uses Formal Concept Analysis (FCA), Emergent Self Organizing Maps (ESOM) and Hidden Markov Models (HMM) as main artifacts in the analysis process. The user can define temporal, text mining and compound attributes. The text mining attributes are used to analyze the unstructured text in documents, while the temporal attributes use the documents' timestamps for analysis. The compound attributes are XML rules based on text mining and temporal attributes. The user can cluster objects with object-cluster rules and can chop the data into pieces with segmentation rules. The artifacts are optimized for efficient data analysis; object labels in the FCA lattice and ESOM map contain a URL on which the user can click to open the selected document.
Submitted 13 February, 2012;
originally announced February 2012.
-
Revisiting Numerical Pattern Mining with Formal Concept Analysis
Authors:
Mehdi Kaytoue,
Sergei O. Kuznetsov,
Amedeo Napoli
Abstract:
In this paper, we investigate the problem of mining numerical data in the framework of Formal Concept Analysis. The usual way is to use a scaling procedure --transforming numerical attributes into binary ones-- leading either to a loss of information or of efficiency, in particular w.r.t. the volume of extracted patterns. By contrast, we propose to work directly on numerical data in a more precise and efficient way, and we prove it. For that, the notions of closed patterns, generators and equivalence classes are revisited in the numerical context. Moreover, two original algorithms are proposed and used in an evaluation involving real-world data, showing the predominance of the present approach.
Submitted 24 November, 2011;
originally announced November 2011.
-
Mining Biclusters of Similar Values with Triadic Concept Analysis
Authors:
Mehdi Kaytoue,
Sergei O. Kuznetsov,
Juraj Macko,
Wagner Meira,
Amedeo Napoli
Abstract:
Biclustering numerical data became a popular data-mining task in the early 2000s, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data-table. So-called biclusters of similar values can be thought of as maximal sub-tables with close values. Only a few methods address a complete, correct and non-redundant enumeration of such patterns, which is a well-known intractable problem, while no formal framework exists. In this paper, we introduce important links between biclustering and formal concept analysis. More specifically, we originally show that Triadic Concept Analysis (TCA) provides a nice mathematical framework for biclustering. Interestingly, existing TCA algorithms, which usually apply to binary data, can be used (directly or with slight modifications) after a preprocessing step for extracting maximal biclusters of similar values.
Submitted 14 November, 2011;
originally announced November 2011.
-
Concept-based Recommendations for Internet Advertisement
Authors:
Dmitry I. Ignatov,
Sergei O. Kuznetsov
Abstract:
The problem of detecting terms that can be interesting to the advertiser is considered. If a company has already bought some advertising terms which describe certain services, it is reasonable to find out the terms bought by competing companies. Some of them can be recommended as future advertising terms to the company. The goal of this work is to propose more interpretable recommendations based on FCA and association rules.
Submitted 26 June, 2009;
originally announced June 2009.
-
Concept Stability for Constructing Taxonomies of Web-site Users
Authors:
Sergei O. Kuznetsov,
Dmitry I. Ignatov
Abstract:
Owners of a web-site are often interested in analysis of groups of users of their site. Information on these groups can help optimize the structure and contents of the site. In this paper we use an approach based on formal concepts for constructing taxonomies of user groups. To reduce the huge number of concepts that arise in applications, we employ the stability index of a concept, which describes how a group given by a concept extent differs from other such groups. We analyze the resulting taxonomies of user groups for three target websites.
Submitted 24 November, 2016; v1 submitted 9 May, 2009;
originally announced May 2009.