-
Eigenvalue-based Incremental Spectral Clustering
Authors:
Mieczysław A. Kłopotek,
Bartłmiej Starosta,
Sławomir T. Wierzchoń
Abstract:
Our previous experiments demonstrated that subsets collections of (short) documents (with several hundred entries) share a common normalized in some way eigenvalue spectrum of combinatorial Laplacian. Based on this insight, we propose a method of incremental spectral clustering. The method consists of the following steps: (1) split the data into manageable subsets, (2) cluster each of the subsets,…
▽ More
Our previous experiments demonstrated that subsets collections of (short) documents (with several hundred entries) share a common normalized in some way eigenvalue spectrum of combinatorial Laplacian. Based on this insight, we propose a method of incremental spectral clustering. The method consists of the following steps: (1) split the data into manageable subsets, (2) cluster each of the subsets, (3) merge clusters from different subsets based on the eigenvalue spectrum similarity to form clusters of the entire set. This method can be especially useful for clustering methods of complexity strongly increasing with the size of the data sample,like in case of typical spectral clustering. Experiments were performed showing that in fact the clustering and merging the subsets yields clusters close to clustering the entire dataset.
△ Less
Submitted 18 August, 2023;
originally announced August 2023.
-
Wide Gaps and Clustering Axioms
Authors:
Mieczysław A. Kłopotek
Abstract:
The widely applied k-means algorithm produces clusterings that violate our expectations with respect to high/low similarity/density and is in conflict with Kleinberg's axiomatic system for distance based clustering algorithms that formalizes those expectations in a natural way. k-means violates in particular the consistency axiom. We hypothesise that this clash is due to the not explicated expecta…
▽ More
The widely applied k-means algorithm produces clusterings that violate our expectations with respect to high/low similarity/density and is in conflict with Kleinberg's axiomatic system for distance based clustering algorithms that formalizes those expectations in a natural way. k-means violates in particular the consistency axiom. We hypothesise that this clash is due to the not explicated expectation that the data themselves should have the property of being clusterable in order to expect the algorithm clustering hem to fit a clustering axiomatic system. To demonstrate this, we introduce two new clusterability properties, variational k-separability and residual k-separability and show that then the Kleinberg's consistency axiom holds for k-means operating in the Euclidean or non-Euclidean space. Furthermore, we propose extensions of k-means algorithm that fit approximately the Kleinberg's richness axiom that does not hold for k-means. In this way, we reconcile k-means with Kleinberg's axiomatic framework in Euclidean and non-Euclidean settings. Besides contribution to the theory of axiomatic frameworks of clustering and for clusterability theory, practical contribution is the possibility to construct {datasets for testing purposes of algorithms optimizing k-means cost function. This includes a method of construction of {clusterable data with known in advance global optimum.
△ Less
Submitted 7 August, 2023;
originally announced August 2023.
-
Are Easy Data Easy (for K-Means)
Authors:
Mieczysław A. Kłopotek
Abstract:
This paper investigates the capability of correctly recovering well-separated clusters by various brands of the $k$-means algorithm. The concept of well-separatedness used here is derived directly from the common definition of clusters, which imposes an interplay between the requirements of within-cluster-homogenicity and between-clusters-diversity. Conditions are derived for a special case of wel…
▽ More
This paper investigates the capability of correctly recovering well-separated clusters by various brands of the $k$-means algorithm. The concept of well-separatedness used here is derived directly from the common definition of clusters, which imposes an interplay between the requirements of within-cluster-homogenicity and between-clusters-diversity. Conditions are derived for a special case of well-separated clusters such that the global minimum of $k$-means cost function coincides with the well-separatedness. An experimental investigation is performed to find out whether or no various brands of $k$-means are actually capable of discovering well separated clusters. It turns out that they are not. A new algorithm is proposed that is a variation of $k$-means++ via repeated {sub}sampling when choosing a seed. The new algorithm outperforms four other algorithms from $k$-means family on the task.
△ Less
Submitted 2 August, 2023;
originally announced August 2023.
-
Explainable Graph Spectral Clustering of Text Documents
Authors:
Bartłomiej Starosta,
Mieczysław A. Kłopotek,
Sławomir T. Wierzchoń
Abstract:
Spectral clustering methods are known for their ability to represent clusters of diverse shapes, densities etc. However, results of such algorithms, when applied e.g. to text documents, are hard to explain to the user, especially due to embedding in the spectral space which has no obvious relation to document contents. Therefore there is an urgent need to elaborate methods for explaining the outco…
▽ More
Spectral clustering methods are known for their ability to represent clusters of diverse shapes, densities etc. However, results of such algorithms, when applied e.g. to text documents, are hard to explain to the user, especially due to embedding in the spectral space which has no obvious relation to document contents. Therefore there is an urgent need to elaborate methods for explaining the outcome of the clustering. This paper presents a contribution towards this goal. We present a proposal of explanation of results of combinatorial Laplacian based graph spectral clustering. It is based on showing (approximate) equivalence of combinatorial Laplacian embedding, $K$-embedding (proposed in this paper) and term vector space embedding. Hence a bridge is constructed between the textual contents and the clustering results. We provide theoretical background for this approach. We performed experimental study showing that $K$-embedding approximates well Laplacian embedding under favourable block matrix conditions and show that approximation is good enough under other conditions.
△ Less
Submitted 1 August, 2023;
originally announced August 2023.
-
High-Dimensional Wide Gap $k$-Means Versus Clustering Axioms
Authors:
Mieczysław A. Kłopotek
Abstract:
Kleinberg's axioms for distance based clustering proved to be contradictory.
Various efforts have been made to overcome this problem.
Here we make an attempt to handle the issue by embedding in high-dimensional space and granting wide gaps between clusters.
Kleinberg's axioms for distance based clustering proved to be contradictory.
Various efforts have been made to overcome this problem.
Here we make an attempt to handle the issue by embedding in high-dimensional space and granting wide gaps between clusters.
△ Less
Submitted 30 November, 2022;
originally announced November 2022.
-
How To Overcome Richness Axiom Fallacy
Authors:
Mieczysław A. Kłopotek,
Robert A. Kłopotek
Abstract:
The paper points at the grieving problems implied by the richness axiom in the Kleinberg's axiomatic system and suggests resolutions. The richness induces learnability problem in general and leads to conflicts with consistency axiom. As a resolution, learnability constraints and usage of centric consistency or restriction of the domain of considered clusterings to super-ball-clusterings is propose…
▽ More
The paper points at the grieving problems implied by the richness axiom in the Kleinberg's axiomatic system and suggests resolutions. The richness induces learnability problem in general and leads to conflicts with consistency axiom. As a resolution, learnability constraints and usage of centric consistency or restriction of the domain of considered clusterings to super-ball-clusterings is proposed.
△ Less
Submitted 27 October, 2022;
originally announced October 2022.
-
A Clustering Preserving Transformation for k-Means Algorithm Output
Authors:
Mieczysław A. Kłopotek
Abstract:
This note introduces a novel clustering preserving transformation of cluster sets obtained from $k$-means algorithm. This transformation may be used to generate new labeled data{}sets from existent ones. It is more flexible that Kleinberg axiom based consistency transformation because data points in a cluster can be moved away and datapoints between clusters may come closer together.
This note introduces a novel clustering preserving transformation of cluster sets obtained from $k$-means algorithm. This transformation may be used to generate new labeled data{}sets from existent ones. It is more flexible that Kleinberg axiom based consistency transformation because data points in a cluster can be moved away and datapoints between clusters may come closer together.
△ Less
Submitted 25 July, 2022; v1 submitted 19 February, 2022;
originally announced February 2022.
-
Towards Continuous Consistency Axiom
Authors:
Mieczyslaw A. Klopotek,
Robert A. Klopotek
Abstract:
Development of new algorithms in the area of machine learning, especially clustering, comparative studies of such algorithms as well as testing according to software engineering principles requires availability of labeled data sets. While standard benchmarks are made available, a broader range of such data sets is necessary in order to avoid the problem of overfitting. In this context, theoretical…
▽ More
Development of new algorithms in the area of machine learning, especially clustering, comparative studies of such algorithms as well as testing according to software engineering principles requires availability of labeled data sets. While standard benchmarks are made available, a broader range of such data sets is necessary in order to avoid the problem of overfitting. In this context, theoretical works on axiomatization of clustering algorithms, especially axioms on clustering preserving transformations are quite a cheap way to produce labeled data sets from existing ones. However, the frequently cited axiomatic system of Kleinberg:2002, as we show in this paper, is not applicable for finite dimensional Euclidean spaces, in which many algorithms like $k$-means, operate. In particular, the so-called outer-consistency axiom fails upon making small changes in datapoint positions and inner-consistency axiom is valid only for identity transformation in general settings.
Hence we propose an alternative axiomatic system, in which Kleinberg's inner consistency axiom is replaced by a centric consistency axiom and outer consistency axiom is replaced by motion consistency axiom. We demonstrate that the new system is satisfiable for a hierarchical version of $k$-means with auto-adjusted $k$, hence it is not contradictory. Additionally, as $k$-means creates convex clusters only, we demonstrate that it is possible to create a version detecting concave clusters and still the axiomatic system can be satisfied. The practical application area of such an axiomatic system may be the generation of new labeled test data from existent ones for clustering algorithm testing. %We propose the gravitational consistency as a replacement which does not have this deficiency.
△ Less
Submitted 12 February, 2022;
originally announced February 2022.
-
p-d-Separation -- A Concept for Expressing Dependence/Independence Relations in Causal Networks
Authors:
Mieczysław A. Kłopotek
Abstract:
Spirtes, Glymour and Scheines formulated a Conjecture that a direct dependence test and a head-to-head meeting test would suffice to construe directed acyclic graph decompositions of a joint probability distribution (Bayesian network) for which Pearl's d-separation applies. This Conjecture was later shown to be a direct consequence of a result of Pearl and Verma. This paper is intended to prove th…
▽ More
Spirtes, Glymour and Scheines formulated a Conjecture that a direct dependence test and a head-to-head meeting test would suffice to construe directed acyclic graph decompositions of a joint probability distribution (Bayesian network) for which Pearl's d-separation applies. This Conjecture was later shown to be a direct consequence of a result of Pearl and Verma. This paper is intended to prove this Conjecture in a new way, by exploiting the concept of p-d-separation (partial dependency separation). While Pearl's d-separation works with Bayesian networks, p-d-separation is intended to apply to causal networks: that is partially oriented networks in which orientations are given to only to those edges, that express statistically confirmed causal influence, whereas undirected edges express existence of direct influence without possibility of determination of direction of causation. As a consequence of the particular way of proving the validity of this Conjecture, an algorithm for construction of all the directed acyclic graphs (dags) carrying the available independence information is also presented. The notion of a partially oriented graph (pog) is introduced and within this graph the notion of p-d-separation is defined. It is demonstrated that the p-d-separation within the pog is equivalent to d-separation in all derived dags.
△ Less
Submitted 15 June, 2020;
originally announced June 2020.
-
On Irrelevance of Attributes in Flexible Prediction
Authors:
Mieczyslaw A. Klopotek,
Andrzej Matuszewski
Abstract:
This paper analyses properties of conceptual hierarchy obtained via incremental concept formation method called "flexible prediction" in order to determine what kind of "relevance" of participating attributes may be requested for meaningful conceptual hierarchy. The impact of selection of simple and combined attributes, of scaling and of distribution of individual attributes and of correlation str…
▽ More
This paper analyses properties of conceptual hierarchy obtained via incremental concept formation method called "flexible prediction" in order to determine what kind of "relevance" of participating attributes may be requested for meaningful conceptual hierarchy. The impact of selection of simple and combined attributes, of scaling and of distribution of individual attributes and of correlation strengths among them is investigated. Paradoxically, both: attributes weakly and strongly related with other attributes have deteriorating impact onto the overall classification. Proper construction of derived attributes as well as selection of scaling of individual attributes strongly influences the obtained concept hierarchy. Attribute density of distribution seems to influence the classification weakly
It seems also, that concept hierarchies (taxonomies) reflect a compromise between the data and our interests in some objective truth about the data. To obtain classifications more suitable for one's purposes, breaking the symmetry among attributes (by dividing them into dependent and independent and applying differing evaluation formulas for their contribution) is suggested. Both continuous and discrete variables are considered. Some methodologies for the former are considered.
△ Less
Submitted 25 May, 2020;
originally announced May 2020.
-
Non-Destructive Sample Generation From Conditional Belief Functions
Authors:
Mieczysław A. Kłopotek
Abstract:
This paper presents a new approach to generate samples from conditional belief functions for a restricted but non trivial subset of conditional belief functions. It assumes the factorization (decomposition) of a belief function along a bayesian network structure. It applies general conditional belief functions.
This paper presents a new approach to generate samples from conditional belief functions for a restricted but non trivial subset of conditional belief functions. It assumes the factorization (decomposition) of a belief function along a bayesian network structure. It applies general conditional belief functions.
△ Less
Submitted 25 May, 2020;
originally announced May 2020.
-
A Note On k-Means Probabilistic Poverty
Authors:
Mieczysław A. Kłopotek
Abstract:
It is proven, by example, that the version of $k$-means with random initialization does not have the property probabilistic k-richness.
It is proven, by example, that the version of $k$-means with random initialization does not have the property probabilistic k-richness.
△ Less
Submitted 26 October, 2022; v1 submitted 28 September, 2019;
originally announced October 2019.
-
Query Optimization Properties of Modified VBS
Authors:
Mieczysław A. Kłopotek,
Sławomir T. Wierzchoń
Abstract:
Valuation-Based~System can represent knowledge in different domains including probability theory, Dempster-Shafer theory and possibility theory. More recent studies show that the framework of VBS is also appropriate for representing and solving Bayesian decision problems and optimization problems.
In this paper after introducing the valuation based system (VBS) framework, we present Markov-like…
▽ More
Valuation-Based~System can represent knowledge in different domains including probability theory, Dempster-Shafer theory and possibility theory. More recent studies show that the framework of VBS is also appropriate for representing and solving Bayesian decision problems and optimization problems.
In this paper after introducing the valuation based system (VBS) framework, we present Markov-like properties of VBS and a method for resolving queries to VBS.
△ Less
Submitted 26 September, 2019;
originally announced September 2019.
-
Reasoning and Facts Explanation in Valuation Based Systems
Authors:
S. T. Wierzchoń,
M. A. Kłopotek,
M. Michalewicz
Abstract:
In the literature, the optimization problem to identify a set of composite hypotheses H, which will yield the $k$ largest $P(H|S_e)$ where a composite hypothesis is an instantiation of all the nodes in the network except the evidence nodes \cite{KSy:93} is of significant interest. This problem is called "finding the $k$ Most Plausible Explanation (MPE) of a given evidence $S_e$ in a Bayesian belie…
▽ More
In the literature, the optimization problem to identify a set of composite hypotheses H, which will yield the $k$ largest $P(H|S_e)$ where a composite hypothesis is an instantiation of all the nodes in the network except the evidence nodes \cite{KSy:93} is of significant interest. This problem is called "finding the $k$ Most Plausible Explanation (MPE) of a given evidence $S_e$ in a Bayesian belief network".
The problem of finding $k$ most probable hypotheses is generally NP-hard \cite{Cooper:90}. Therefore in the past various simplifications of the task by restricting $k$ (to 1 or 2), restricting the structure (e.g. to singly connected networks), or shifting the complexity to spatial domain have been investigated.
A genetic algorithm is proposed in this paper to overcome some of these restrictions while step** out from probabilistic domain onto the general Valuation based System (VBS) framework is also proposed by generalizing the genetic algorithm approach to the realm of Dempster-Shafer belief calculus.
△ Less
Submitted 21 December, 2018;
originally announced December 2018.
-
Rigid Body Structure and Motion From Two-Frame Point-Correspondences Under Perspective Projection
Authors:
Mieczysław A. Kłopotek
Abstract:
This paper is concerned with possibility of recovery of motion and structure parameters from multiframes under perspective projection when only points on a rigid body are traced. Free (unrestricted and uncontrolled) pattern of motion between frames is assumed. The major question is how many points and/or how many frames are necessary for the task. It has been shown in an earlier paper {Klopotek:95…
▽ More
This paper is concerned with possibility of recovery of motion and structure parameters from multiframes under perspective projection when only points on a rigid body are traced. Free (unrestricted and uncontrolled) pattern of motion between frames is assumed. The major question is how many points and/or how many frames are necessary for the task. It has been shown in an earlier paper {Klopotek:95b} that for orthogonal projection two frames are insufficient for the task. The paper demonstrates that, under perspective projection, that total uncertainty about relative position of focal point versus projection plane makes the recovery of structure and motion from two frames impossible.
△ Less
Submitted 7 December, 2018;
originally announced December 2018.
-
Factorization of Dempster-Shafer Belief Functions Based on Data
Authors:
Andrzej Matuszewski,
Mieczysław A. Kłopotek
Abstract:
One important obstacle in applying Dempster-Shafer Theory (DST) is its relationship to frequencies. In particular, there exist serious difficulties in finding factorizations of belief functions from data.
In probability theory factorizations are usually related to notion of (conditional) independence and their possibility tested accordingly. However, in DST conditional belief distributions prove…
▽ More
One important obstacle in applying Dempster-Shafer Theory (DST) is its relationship to frequencies. In particular, there exist serious difficulties in finding factorizations of belief functions from data.
In probability theory factorizations are usually related to notion of (conditional) independence and their possibility tested accordingly. However, in DST conditional belief distributions prove to be non-proper belief functions (that is ones connected with negative "frequencies"). This makes statistical testing of potential conditional independencies practically impossible, as no coherent interpretation could be found so far for negative belief function values.
In this paper a novel attempt is made to overcome this difficulty. In the proposal no conditional beliefs are calculated, but instead a new measure F is introduced within the framework of DST, closely related to conditional independence, allowing to apply conventional statistical tests for detection of dependence/independence.
△ Less
Submitted 14 December, 2018;
originally announced December 2018.
-
On Marginally Correct Approximations of Dempster-Shafer Belief Functions from Data
Authors:
Mieczysław A. Kłopotek,
Sławomir T. Wierzchoń
Abstract:
Mathematical Theory of Evidence (MTE), a foundation for reasoning under partial ignorance, is blamed to leave frequencies outside (or aside of) its framework. The seriousness of this accusation is obvious: no experiment may be run to compare the performance of MTE-based models of real world processes against real world data.
In this paper we consider this problem from the point of view of condit…
▽ More
Mathematical Theory of Evidence (MTE), a foundation for reasoning under partial ignorance, is blamed to leave frequencies outside (or aside of) its framework. The seriousness of this accusation is obvious: no experiment may be run to compare the performance of MTE-based models of real world processes against real world data.
In this paper we consider this problem from the point of view of conditioning in the MTE. We describe the class of belief functions for which marginal consistency with observed frequencies may be achieved and conditional belief functions are proper belief functions,%\ and deal with implications for (marginal) approximation of general belief functions by this class of belief functions and for inference models in MTE.
△ Less
Submitted 7 December, 2018;
originally announced December 2018.
-
Structure and Motion from Multiframes
Authors:
Mieczysław A. Kłopotek
Abstract:
The paper gives an overview of the problems and methods of recovery of structure and motion parameters of rigid bodies from multiframes.
The paper gives an overview of the problems and methods of recovery of structure and motion parameters of rigid bodies from multiframes.
△ Less
Submitted 30 November, 2018;
originally announced November 2018.
-
Dempsterian-Shaferian Belief Network From Data
Authors:
Mieczysław A. Kłopotek
Abstract:
Shenoy and Shafer {Shenoy:90} demonstrated that both for Dempster-Shafer Theory and probability theory there exists a possibility to calculate efficiently marginals of joint belief distributions (by so-called local computations) provided that the joint distribution can be decomposed (factorized) into a belief network. A number of algorithms exists for decomposition of probabilistic joint belief di…
▽ More
Shenoy and Shafer {Shenoy:90} demonstrated that both for Dempster-Shafer Theory and probability theory there exists a possibility to calculate efficiently marginals of joint belief distributions (by so-called local computations) provided that the joint distribution can be decomposed (factorized) into a belief network. A number of algorithms exists for decomposition of probabilistic joint belief distribution into a bayesian (belief) network from data. For example
Spirtes, Glymour and Schein{Spirtes:90b} formulated a Conjecture that a direct dependence test and a head-to-head meeting test would suffice to construe bayesian network from data in such a way that Pearl's concept of d-separation {Geiger:90} applies.
This paper is intended to transfer Spirtes, Glymour and Scheines {Spirtes:90b} approach onto the ground of the Dempster-Shafer Theory (DST). For this purpose, a frequentionistic interpretation of the DST developed in {Klopotek:93b} is exploited. A special notion of conditionality for DST is introduced and demonstrated to behave with respect to Pearl's d-separation {Geiger:90} much the same way as conditional probability (though some differences like non-uniqueness are evident). Based on this, an algorithm analogous to that from {Spirtes:90b} is developed.
The notion of a partially oriented graph (pog) is introduced and within this graph the notion of p-d-separation is defined. If direct dependence test and head-to-head meeting test are used to orient the pog then its p-d-separation is shown to be equivalent to the Pearl's d-separation for any compatible dag.
△ Less
Submitted 6 June, 2018;
originally announced June 2018.
-
Too Fast Causal Inference under Causal Insufficiency
Authors:
Mieczysław A. Kłopotek
Abstract:
Causally insufficient structures (models with latent or hidden variables, or with confounding etc.) of joint probability distributions have been subject of intense study not only in statistics, but also in various AI systems. In AI, belief networks, being representations of joint probability distribution with an underlying directed acyclic graph structure, are paid special attention due to the fac…
▽ More
Causally insufficient structures (models with latent or hidden variables, or with confounding etc.) of joint probability distributions have been subject of intense study not only in statistics, but also in various AI systems. In AI, belief networks, being representations of joint probability distribution with an underlying directed acyclic graph structure, are paid special attention due to the fact that efficient reasoning (uncertainty propagation) methods have been developed for belief network structures. Algorithms have been therefore developed to acquire the belief network structure from data. As artifacts due to variable hiding negatively influence the performance of derived belief networks, models with latent variables have been studied and several algorithms for learning belief network structure under causal insufficiency have also been developed.
Regrettably, some of them are known already to be erroneous (e.g. IC algorithm of [Pearl:Verma:91]. This paper is devoted to another algorithm, the Fast Causal Inference (FCI) Algorithm of [Spirtes:93]. It is proven by a specially constructed example that this algorithm, as it stands in [Spirtes:93], is also erroneous. Fundamental reason for failure of this algorithm is the temporary introduction of non-real links between nodes of the network with the intention of later removal. While for trivial dependency structures these non-real links may be actually removed, this may not be the case for complex ones, e.g. for the case described in this paper. A remedy of this failure is proposed.
△ Less
Submitted 30 May, 2018;
originally announced June 2018.
-
Fast Restricted Causal Inference
Authors:
Mieczysław A. Kłopotek
Abstract:
Hidden variables are well known sources of disturbance when recovering belief networks from data based only on measurable variables. Hence models assuming existence of hidden variables are under development.
This paper presents a new algorithm "accelerating" the known CI algorithm of Spirtes, Glymour and Scheines {Spirtes:93}. We prove that this algorithm does not produces (conditional) independ…
▽ More
Hidden variables are well known sources of disturbance when recovering belief networks from data based only on measurable variables. Hence models assuming existence of hidden variables are under development.
This paper presents a new algorithm "accelerating" the known CI algorithm of Spirtes, Glymour and Scheines {Spirtes:93}. We prove that this algorithm does not produces (conditional) independencies not present in the data if statistical independence test is reliable.
This result is to be considered as non-trivial since e.g. the same claim fails to be true for FCI algorithm, another "accelerator" of CI, developed in {Spirtes:93}.
△ Less
Submitted 13 July, 2017;
originally announced July 2017.
-
On (Anti)Conditional Independence in Dempster-Shafer Theory
Authors:
Mieczysław A. Kłopotek
Abstract:
This paper verifies a result of {Shenoy:94} concerning graphoidal structure of Shenoy's notion of independence for Dempster-Shafer theory of belief functions. Shenoy proved that his notion of independence has graphoidal properties for positive normal valuations.
The requirement of strict positive normal valuations as prerequisite for application of graphoidal properties excludes a wide class of…
▽ More
This paper verifies a result of {Shenoy:94} concerning graphoidal structure of Shenoy's notion of independence for Dempster-Shafer theory of belief functions. Shenoy proved that his notion of independence has graphoidal properties for positive normal valuations.
The requirement of strict positive normal valuations as prerequisite for application of graphoidal properties excludes a wide class of DS belief functions. It excludes especially so-called probabilistic belief functions. It is demonstrated that the requirement of positiveness of valuation may be weakened in that it may be required that commonality function is non-zero for singleton sets instead, and the graphoidal properties for independence of belief function variables are then preserved. This means especially that probabilistic belief functions with all singleton sets as focal points possess graphoidal properties for independence.
△ Less
Submitted 13 July, 2017;
originally announced July 2017.
-
Identification and Interpretation of Belief Structure in Dempster-Shafer Theory
Authors:
Mieczysław A. Kłopotek
Abstract:
Mathematical Theory of Evidence called also Dempster-Shafer Theory (DST) is known as a foundation for reasoning when knowledge is expressed at various levels of detail. Though much research effort has been committed to this theory since its foundation, many questions remain open. One of the most important open questions seems to be the relationship between frequencies and the Mathematical Theory o…
▽ More
Mathematical Theory of Evidence called also Dempster-Shafer Theory (DST) is known as a foundation for reasoning when knowledge is expressed at various levels of detail. Though much research effort has been committed to this theory since its foundation, many questions remain open. One of the most important open questions seems to be the relationship between frequencies and the Mathematical Theory of Evidence. The theory is blamed to leave frequencies outside (or aside of) its framework. The seriousness of this accusation is obvious: (1) no experiment may be run to compare the performance of DST-based models of real world processes against real world data, (2) data may not serve as foundation for construction of an appropriate belief model.
In this paper we develop a frequentist interpretation of the DST bringing to fall the above argument against DST. An immediate consequence of it is the possibility to develop algorithms acquiring automatically DST belief models from data. We propose three such algorithms for various classes of belief model structures: for tree structured belief networks, for poly-tree belief networks and for general type belief networks.
△ Less
Submitted 12 July, 2017;
originally announced July 2017.
-
Independence, Conditionality and Structure of Dempster-Shafer Belief Functions
Authors:
Mieczysław A. Kłopotek
Abstract:
Several approaches of structuring (factorization, decomposition) of Dempster-Shafer joint belief functions from literature are reviewed with special emphasis on their capability to capture independence from the point of view of the claim that belief functions generalize bayes notion of probability.
It is demonstrated that Zhu and Lee's {Zhu:93} logical networks and Smets' {Smets:93} directed acy…
▽ More
Several approaches of structuring (factorization, decomposition) of Dempster-Shafer joint belief functions from literature are reviewed with special emphasis on their capability to capture independence from the point of view of the claim that belief functions generalize bayes notion of probability.
It is demonstrated that Zhu and Lee's {Zhu:93} logical networks and Smets' {Smets:93} directed acyclic graphs are unable to capture statistical dependence/independence of bayesian networks {Pearl:88}. On the other hand, though Shenoy and Shafer's hypergraphs can explicitly represent bayesian network factorization of bayesian belief functions, they disclaim any need for representation of independence of variables in belief functions.
Cano et al. {Cano:93} reject the hypergraph representation of Shenoy and Shafer just on grounds of missing representation of variable independence, but in their frameworks some belief functions factorizable in Shenoy/Shafer framework cannot be factored.
The approach in {Klopotek:93f} on the other hand combines the merits of both Cano et al. and of Shenoy/Shafer approach in that for Shenoy/Shafer approach no simpler factorization than that in {Klopotek:93f} approach exists and on the other hand all independences among variables captured in Cano et al. framework and many more are captured in {Klopotek:93f} approach.%
△ Less
Submitted 12 July, 2017;
originally announced July 2017.
-
Restricted Causal Inference Algorithm
Authors:
Mieczysław A. Kłopotek
Abstract:
This paper proposes a new algorithm for recovery of belief network structure from data handling hidden variables. It consists essentially in an extension of the CI algorithm of Spirtes et al. by restricting the number of conditional dependencies checked up to k variables and in an extension of the original CI by additional steps transforming so called partial including path graph into a belief net…
▽ More
This paper proposes a new algorithm for recovery of belief network structure from data handling hidden variables. It consists essentially in an extension of the CI algorithm of Spirtes et al. by restricting the number of conditional dependencies checked up to k variables and in an extension of the original CI by additional steps transforming so called partial including path graph into a belief network. Its correctness is demonstrated.
△ Less
Submitted 30 June, 2017;
originally announced June 2017.
-
Evidence Against Evidence Theory (?!)
Authors:
Mieczysław A. Kłopotek,
Andrzej Matuszewski
Abstract:
This paper is concerned with the apparent greatest weakness of the Mathematical Theory of Evidence (MTE) of Shafer \cite{Shafer:76}, which has been strongly criticized by Wasserman \cite{Wasserman:92ijar} - the relationship to frequencies.
Weaknesses of various proposals of probabilistic interpretation of MTE belief functions are demonstrated.
A new frequency-based interpretation is presented…
▽ More
This paper is concerned with the apparent greatest weakness of the Mathematical Theory of Evidence (MTE) of Shafer \cite{Shafer:76}, which has been strongly criticized by Wasserman \cite{Wasserman:92ijar} - the relationship to frequencies.
Weaknesses of various proposals of probabilistic interpretation of MTE belief functions are demonstrated.
A new frequency-based interpretation is presented overcoming various drawbacks of earlier interpretations.
△ Less
Submitted 8 June, 2017;
originally announced June 2017.
-
What Does a Belief Function Believe In ?
Authors:
Andrzej Matuszewski,
Mieczysław A. Kłopotek
Abstract:
The conditioning in the Dempster-Shafer Theory of Evidence has been defined (by Shafer \cite{Shafer:90} as combination of a belief function and of an "event" via Dempster rule.
On the other hand Shafer \cite{Shafer:90} gives a "probabilistic" interpretation of a belief function (hence indirectly its derivation from a sample). Given the fact that conditional probability distribution of a sample-d…
▽ More
The conditioning in the Dempster-Shafer Theory of Evidence has been defined (by Shafer \cite{Shafer:90} as combination of a belief function and of an "event" via Dempster rule.
On the other hand Shafer \cite{Shafer:90} gives a "probabilistic" interpretation of a belief function (hence indirectly its derivation from a sample). Given the fact that conditional probability distribution of a sample-derived probability distribution is a probability distribution derived from a subsample (selected on the grounds of a conditioning event), the paper investigates the empirical nature of the Dempster- rule of combination.
It is demonstrated that the so-called "conditional" belief function is not a belief function given an event but rather a belief function given manipulation of original empirical data.\\ Given this, an interpretation of belief function different from that of Shafer is proposed. Algorithms for construction of belief networks from data are derived for this interpretation.
△ Less
Submitted 8 June, 2017;
originally announced June 2017.
-
Network Capacity Bound for Personalized PageRank in Multimodal Networks
Authors:
M. A. Kłopotek,
S. T. Wierzchoń,
R. A. Kłopotek
Abstract:
In a former paper the concept of Bipartite PageRank was introduced and a theorem on the limit of authority flowing between nodes for personalized PageRank has been generalized. In this paper we want to extend those results to multimodal networks. In particular we deal with a hypergraph type that may be used for describing multimodal network where a hyperlink connects nodes from each of the modalit…
▽ More
In a former paper the concept of Bipartite PageRank was introduced and a theorem on the limit of authority flowing between nodes for personalized PageRank has been generalized. In this paper we want to extend those results to multimodal networks. In particular we deal with a hypergraph type that may be used for describing multimodal network where a hyperlink connects nodes from each of the modalities. We introduce a generalisation of PageRank for such graphs and define the respective random walk model that can be used for computations. We state and prove theorems on the limit of outflow of authority for cases where individual modalities have identical and distinct dam** factors.
△ Less
Submitted 27 June, 2023; v1 submitted 1 June, 2017;
originally announced June 2017.
-
Knowledge Acquisition, Representation \& Manipulation in Decision Support Systems
Authors:
M. Michalewicz,
S. T. Wierzchoń,
M. A. Kłopotek
Abstract:
In this paper we present a methodology and discuss some implementation issues for a project on statistical/expert approach to data analysis and knowledge acquisition. We discuss some general assumptions underlying the project. Further, the requirements for a user-friendly computer assistant are specified along with the nature of tools aiding the researcher. Next we show some aspects of belief netw…
▽ More
In this paper we present a methodology and discuss some implementation issues for a project on statistical/expert approach to data analysis and knowledge acquisition. We discuss some general assumptions underlying the project. Further, the requirements for a user-friendly computer assistant are specified along with the nature of tools aiding the researcher. Next we show some aspects of belief network approach and Dempster-Shafer (DST) methodology introduced in practice to system SEAD. Specifically we present the application of DS methodology to belief revision problem. Further a concept of an interface to probabilistic and DS belief networks enabling a user to understand the communication with a belief network based reasoning system is presented
△ Less
Submitted 23 May, 2017;
originally announced May 2017.
-
Distribution of degrees of freedom over structure and motion of rigid bodies
Authors:
Mieczysław A. Kłopotek
Abstract:
This paper is concerned with recovery of motion and structure parameters from multiframes under orthogonal projection when only points are traced. The main question is how many points and/or how many frames are necessary for the task. It is demonstrated that 3 frames and 3 points are the absolute minimum. Closed-form solution is presented. Furthermore, it is shown that the task may be linearized i…
▽ More
This paper is concerned with recovery of motion and structure parameters from multiframes under orthogonal projection when only points are traced. The main question is how many points and/or how many frames are necessary for the task. It is demonstrated that 3 frames and 3 points are the absolute minimum. Closed-form solution is presented. Furthermore, it is shown that the task may be linearized if either four points or four frames are available. It is demonstrated that no increase in the number of points may lead to recovery of structure and motion parameters from two frames only. It is shown that instead the increase in the number of points may support the task of tracing the points from frame to frame.
△ Less
Submitted 10 May, 2017;
originally announced May 2017.
-
An Aposteriorical Clusterability Criterion for $k$-Means++ and Simplicity of Clustering
Authors:
Mieczysław A. Kłopotek
Abstract:
We define the notion of a well-clusterable data set combining the point of view of the objective of $k$-means clustering algorithm (minimising the centric spread of data elements) and common sense (clusters shall be separated by gaps). We identify conditions under which the optimum of $k$-means objective coincides with a clustering under which the data is separated by predefined gaps.
We investi…
▽ More
We define the notion of a well-clusterable data set combining the point of view of the objective of $k$-means clustering algorithm (minimising the centric spread of data elements) and common sense (clusters shall be separated by gaps). We identify conditions under which the optimum of $k$-means objective coincides with a clustering under which the data is separated by predefined gaps.
We investigate two cases: when the whole clusters are separated by some gap and when only the cores of the clusters meet some separation condition.
We overcome a major obstacle in using clusterability criteria due to the fact that known approaches to clusterability checking had the disadvantage that they are related to the optimal clustering which is NP hard to identify.
Compared to other approaches to clusterability, the novelty consists in the possibility of an a posteriori (after running $k$-means) check if the data set is well-clusterable or not. As the $k$-means algorithm applied for this purpose has polynomial complexity so does therefore the appropriate check. Additionally, if $k$-means++ fails to identify a clustering that meets clusterability criteria, with high probability the data is not well-clusterable.
△ Less
Submitted 30 June, 2018; v1 submitted 24 April, 2017;
originally announced April 2017.
-
A Comment on "Analysis of Video Image Sequences Using Point and Line Correspondences"
Authors:
Mieczysław A. Kłopotek
Abstract:
In this paper we would like to deny the results of Wang et al. raising two fundamental claims:
* A line does not contribute anything to recognition of motion parameters from two images
* Four traceable points are not sufficient to recover motion parameters from two perspective
To be constructive, however, we show that four traceable points are sufficient to recover motion parameters from two…
▽ More
In this paper we would like to deny the results of Wang et al. raising two fundamental claims:
* A line does not contribute anything to recognition of motion parameters from two images
* Four traceable points are not sufficient to recover motion parameters from two perspective
To be constructive, however, we show that four traceable points are sufficient to recover motion parameters from two frames under orthogonal projection and that five points are sufficient to simplify the solution of the two-frame problem under orthogonal projection to solving a linear equation system.
△ Less
Submitted 18 April, 2017;
originally announced April 2017.
-
Beliefs in Markov Trees - From Local Computations to Local Valuation
Authors:
Mieczysław A. Kłopotek
Abstract:
This paper is devoted to expressiveness of hypergraphs for which uncertainty propagation by local computations via Shenoy/Shafer method applies. It is demonstrated that for this propagation method for a given joint belief distribution no valuation of hyperedges of a hypergraph may provide with simpler hypergraph structure than valuation of hyperedges by conditional distributions. This has vital im…
▽ More
This paper is devoted to expressiveness of hypergraphs for which uncertainty propagation by local computations via Shenoy/Shafer method applies. It is demonstrated that for this propagation method for a given joint belief distribution no valuation of hyperedges of a hypergraph may provide with simpler hypergraph structure than valuation of hyperedges by conditional distributions. This has vital implication that methods recovering belief networks from data have no better alternative for finding the simplest hypergraph structure for belief propagation. A method for recovery tree-structured belief networks has been developed and specialized for Dempster-Shafer belief functions
△ Less
Submitted 12 April, 2017;
originally announced April 2017.
-
Reconstruction of~3-D Rigid Smooth Curves Moving Free when Two Traceable Points Only are Available
Authors:
Mieczysław A. Kłopotek
Abstract:
This paper extends previous research in that sense that for orthogonal projections of rigid smooth (true-3D) curves moving totally free it reduces the number of required traceable points to two only (the best results known so far to the author are 3 points from free motion and 2 for motion restricted to rotation around a fixed direction and and 2 for motion restricted to influence of a homogeneous…
▽ More
This paper extends previous research in that sense that for orthogonal projections of rigid smooth (true-3D) curves moving totally free it reduces the number of required traceable points to two only (the best results known so far to the author are 3 points from free motion and 2 for motion restricted to rotation around a fixed direction and and 2 for motion restricted to influence of a homogeneous force field). The method used is exploitation of information on tangential projections. It discusses also possibility of simplification of reconstruction of flat curves moving free for prospective projections.
△ Less
Submitted 11 April, 2017;
originally announced April 2017.
-
Beliefs and Probability in Bacchus' l.p. Logic: A~3-Valued Logic Solution to Apparent Counter-intuition
Authors:
Mieczysław A. Kłopotek
Abstract:
Fundamental discrepancy between first order logic and statistical inference (global versus local properties of universe) is shown to be the obstacle for integration of logic and probability in L.p. logic of Bacchus. To overcome the counterintuitiveness of L.p. behaviour, a 3-valued logic is proposed.
Fundamental discrepancy between first order logic and statistical inference (global versus local properties of universe) is shown to be the obstacle for integration of logic and probability in L.p. logic of Bacchus. To overcome the counterintuitiveness of L.p. behaviour, a 3-valued logic is proposed.
△ Less
Submitted 11 April, 2017;
originally announced April 2017.
-
Basic Formal Properties of A Relational Model of The Mathematical Theory of Evidence
Authors:
Mieczysław A. Kłopotek,
Sławomir T. Wierzchoń
Abstract:
The paper presents a novel view of the Dempster-Shafer belief function as a measure of diversity in relational data bases. It is demonstrated that under the interpretation The Dempster rule of evidence combination corresponds to the join operator of the relational database theory. This rough-set based interpretation is qualitative in nature and can represent a number of belief function operators.…
▽ More
The paper presents a novel view of the Dempster-Shafer belief function as a measure of diversity in relational data bases. It is demonstrated that under the interpretation The Dempster rule of evidence combination corresponds to the join operator of the relational database theory. This rough-set based interpretation is qualitative in nature and can represent a number of belief function operators.
The interpretation has the property that Given a definition of the belief measure of objects in the interpretation domain we can perform operations in this domain and the measure of the resulting object is derivable from measures of component objects via belief operator. We demonstrated this property for Dempster rule of combination, marginalization, Shafer's conditioning, independent variables, Shenoy's notion of conditional independence of variables.
The interpretation is based on rough sets (in connection with decision tables), but differs from previous interpretations of this type in that it counts the diversity rather than frequencies in a decision table.
△ Less
Submitted 8 April, 2017;
originally announced April 2017.
-
Machine Learning Friendly Set Version of Johnson-Lindenstrauss Lemma
Authors:
Mieczysław A. Kłopotek
Abstract:
In this paper we make a novel use of the Johnson-Lindenstrauss Lemma. The Lemma has an existential form saying that there exists a JL transformation $f$ of the data points into lower dimensional space such that all of them fall into predefined error range $δ$.
We formulate in this paper a theorem stating that we can choose the target dimensionality in a random projection type JL linear transform…
▽ More
In this paper we make a novel use of the Johnson-Lindenstrauss Lemma. The Lemma has an existential form saying that there exists a JL transformation $f$ of the data points into lower dimensional space such that all of them fall into predefined error range $δ$.
We formulate in this paper a theorem stating that we can choose the target dimensionality in a random projection type JL linear transformation in such a way that with probability $1-ε$ all of them fall into predefined error range $δ$ for any user-predefined failure probability $ε$.
This result is important for applications such a data clustering where we want to have a priori dimensionality reducing transformation instead of trying out a (large) number of them, as with traditional Johnson-Lindenstrauss Lemma. In particular, we take a closer look at the $k$-means algorithm and prove that a good solution in the projected space is also a good solution in the original space. Furthermore, under proper assumptions local optima in the original space are also ones in the projected space. We define also conditions for which clusterability property of the original space is transmitted to the projected space, so that special case algorithms for the original space are also applicable in the projected space.
△ Less
Submitted 9 November, 2017; v1 submitted 4 March, 2017;
originally announced March 2017.
-
On the Consistency of $k$-means++ algorithm
Authors:
Mieczysław A. Kłopotek
Abstract:
We prove in this paper that the expected value of the objective function of the $k$-means++ algorithm for samples converges to population expected value. As $k$-means++, for samples, provides with constant factor approximation for $k$-means objectives, such an approximation can be achieved for the population with increase of the sample size.
This result is of potential practical relevance when o…
▽ More
We prove in this paper that the expected value of the objective function of the $k$-means++ algorithm for samples converges to population expected value. As $k$-means++, for samples, provides with constant factor approximation for $k$-means objectives, such an approximation can be achieved for the population with increase of the sample size.
This result is of potential practical relevance when one is considering using subsampling when clustering large data sets (large data bases).
△ Less
Submitted 20 February, 2017;
originally announced February 2017.
-
Traditional PageRank versus Network Capacity Bound
Authors:
Mieczysław A. Kłopotek,
Sławomir T. Wierzchom,
Robert A. Kłopotek,
Elżbieta A. Kłopotek
Abstract:
In a former paper we simplified the proof of a theorem on personalized random walk that is fundamental to graph nodes clustering and generalized it to bipartite graphs for a specific case where the proobability of random jump was proprtional to the number of links of "personally prefereed" nodes. In this paper we turn to the more complex issue of graphs in which the random jump follows uniform dis…
▽ More
In a former paper we simplified the proof of a theorem on personalized random walk that is fundamental to graph nodes clustering and generalized it to bipartite graphs for a specific case where the proobability of random jump was proprtional to the number of links of "personally prefereed" nodes. In this paper we turn to the more complex issue of graphs in which the random jump follows uniform distribution.
△ Less
Submitted 13 February, 2017;
originally announced February 2017.
-
Validity of Clusters Produced By kernel-$k$-means With Kernel-Trick
Authors:
Mieczysław A. Kłopotek
Abstract:
This paper corrects the proof of the Theorem 2 from the Gower's paper \cite[page 5]{Gower:1982} as well as corrects the Theorem 7 from Gower's paper \cite{Gower:1986}. The first correction is needed in order to establish the existence of the kernel function used commonly in the kernel trick e.g. for $k$-means clustering algorithm, on the grounds of distance matrix. The correction encompasses the m…
▽ More
This paper corrects the proof of the Theorem 2 from the Gower's paper \cite[page 5]{Gower:1982} as well as corrects the Theorem 7 from Gower's paper \cite{Gower:1986}. The first correction is needed in order to establish the existence of the kernel function used commonly in the kernel trick e.g. for $k$-means clustering algorithm, on the grounds of distance matrix. The correction encompasses the missing if-part proof and drop** unnecessary conditions. The second correction deals with transformation of the kernel matrix into a one embeddable in Euclidean space.
△ Less
Submitted 21 December, 2018; v1 submitted 19 January, 2017;
originally announced January 2017.
-
Semantic classifier approach to document classification
Authors:
Piotr Borkowski,
Krzysztof Ciesielski,
Mieczysław A. Kłopotek
Abstract:
In this paper we propose a new document classification method, bridging discrepancies (so-called semantic gap) between the training set and the application sets of textual data. We demonstrate its superiority over classical text classification approaches, including traditional classifier ensembles. The method consists in combining a document categorization technique with a single classifier or a c…
▽ More
In this paper we propose a new document classification method, bridging discrepancies (so-called semantic gap) between the training set and the application sets of textual data. We demonstrate its superiority over classical text classification approaches, including traditional classifier ensembles. The method consists in combining a document categorization technique with a single classifier or a classifier ensemble (SEMCOM algorithm - Committee with Semantic Categorizer).
△ Less
Submitted 16 January, 2017;
originally announced January 2017.
-
Grammatical Case Based IS-A Relation Extraction with Boosting for Polish
Authors:
Paweł Łoziński,
Dariusz Czerski,
Mieczysław A. Kłopotek
Abstract:
Pattern-based methods of IS-A relation extraction rely heavily on so called Hearst patterns. These are ways of expressing instance enumerations of a class in natural language. While these lexico-syntactic patterns prove quite useful, they may not capture all taxonomical relations expressed in text. Therefore in this paper we describe a novel method of IS-A relation extraction from patterns, which…
▽ More
Pattern-based methods of IS-A relation extraction rely heavily on so called Hearst patterns. These are ways of expressing instance enumerations of a class in natural language. While these lexico-syntactic patterns prove quite useful, they may not capture all taxonomical relations expressed in text. Therefore in this paper we describe a novel method of IS-A relation extraction from patterns, which uses morpho-syntactical annotations along with grammatical case of noun phrases that constitute entities participating in IS-A relation. We also describe a method for increasing the number of extracted relations that we call pseudo-subclass boosting which has potential application in any pattern-based relation extraction method. Experiments were conducted on a corpus of about 0.5 billion web documents in Polish language.
△ Less
Submitted 10 May, 2016;
originally announced May 2016.