Search | arXiv e-print repository

Modularity Based Community Detection in Hypergraphs

Authors: Bogumił Kamiński, Paweł Misiorek, Paweł Prałat, François Théberge

Abstract: In this paper, we propose a scalable community detection algorithm using hypergraph modularity function, h-Louvain. It is an adaptation of the classical Louvain algorithm in the context of hypergraphs. We observe that a direct application of the Louvain algorithm to optimize the hypergraph modularity function often fails to find meaningful communities. We propose a solution to this issue by adjust… ▽ More In this paper, we propose a scalable community detection algorithm using hypergraph modularity function, h-Louvain. It is an adaptation of the classical Louvain algorithm in the context of hypergraphs. We observe that a direct application of the Louvain algorithm to optimize the hypergraph modularity function often fails to find meaningful communities. We propose a solution to this issue by adjusting the initial stage of the algorithm via carefully and dynamically tuned linear combination of the graph modularity function of the corresponding two-section graph and the desired hypergraph modularity function. The process is guided by Bayesian optimization of the hyper-parameters of the proposed procedure. Various experiments on synthetic as well as real-world networks are performed showing that this process yields improved results in various regimes. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 21 pages, 8 figures, 4 tables

ACM Class: I.6.5; G.4

arXiv:2404.05097 [pdf, other]

Quantitative Weakest Hyper Pre: Unifying Correctness and Incorrectness Hyperproperties via Predicate Transformers

Authors: Linpeng Zhang, Noam Zilberstein, Benjamin Lucien Kaminski, Alexandra Silva

Abstract: We present a novel \emph{weakest pre calculus} for \emph{reasoning about quantitative hyperproperties} over \emph{nondeterministic and probabilistic} programs. Whereas existing calculi allow reasoning about the expected value that a quantity assumes after program termination from a \emph{single initial state}, we do so for \emph{initial sets of states} or \emph{initial probability distributions}.… ▽ More We present a novel \emph{weakest pre calculus} for \emph{reasoning about quantitative hyperproperties} over \emph{nondeterministic and probabilistic} programs. Whereas existing calculi allow reasoning about the expected value that a quantity assumes after program termination from a \emph{single initial state}, we do so for \emph{initial sets of states} or \emph{initial probability distributions}. We thus (i)~obtain a weakest pre calculus for hyper Hoare logic and (ii)~enable reasoning about so-called \emph{hyperquantities} which include expected values but also quantities (e.g. variance) out of scope of previous work. As a byproduct, we obtain a novel strongest post for weighted programs that extends both existing strongest and strongest liberal post calculi. Our framework reveals novel dualities between forward and backward transformers, correctness and incorrectness, as well as nontermination and unreachability. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2312.09662 [pdf, ps, other]

Hoare-Like Triples and Kleene Algebras with Top and Tests: Towards a Holistic Perspective on Hoare Logic, Incorrectness Logic, and Beyond

Authors: Lena Verscht, Benjamin Kaminski

Abstract: We aim at a holistic perspective on program logics, including Hoare and incorrectness logics. To this end, we study different classes of properties arising from the generalization of the aforementioned logics. We compare our results with the properties expressible in the language of Kleene algebra with top and tests. We aim at a holistic perspective on program logics, including Hoare and incorrectness logics. To this end, we study different classes of properties arising from the generalization of the aforementioned logics. We compare our results with the properties expressible in the language of Kleene algebra with top and tests. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: Preprint of extended abstract presented at the workshop on Formal Methods for Incorrectness 2024 at POPL 2024

arXiv:2312.00238 [pdf, other]

Self-similarity of Communities of the ABCD Model

Authors: Jordan Barrett, Bogumil Kaminski, Pawel Pralat, Francois Theberge

Abstract: The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs similar to the well-known LFR model but it is faster and can be investigated analytically. In this paper, we show that the ABCD model exhibits some interesting self-similar behaviour, namely, the… ▽ More The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs similar to the well-known LFR model but it is faster and can be investigated analytically. In this paper, we show that the ABCD model exhibits some interesting self-similar behaviour, namely, the degree distribution of ground-truth communities is asymptotically the same as the degree distribution of the whole graph (appropriately normalized based on their sizes). As a result, we can not only estimate the number of edges induced by each community but also the number of self-loops and multi-edges generated during the process. Understanding these quantities is important as (a) rewiring self-loops and multi-edges to keep the graph simple is an expensive part of the algorithm, and (b) every rewiring causes the underlying configuration models to deviate slightly from uniform simple graphs on their corresponding degree sequences. △ Less

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.04730 [pdf, other]

Predicting Properties of Nodes via Community-Aware Features

Authors: Bogumił Kamiński, Paweł Prałat, François Théberge, Sebastian Zając

Abstract: This paper shows how information about the network's community structure can be used to define node features with high predictive power for classification tasks. To do so, we define a family of community-aware node features and investigate their properties. Those features are designed to ensure that they can be efficiently computed even for large graphs. We show that community-aware node features… ▽ More This paper shows how information about the network's community structure can be used to define node features with high predictive power for classification tasks. To do so, we define a family of community-aware node features and investigate their properties. Those features are designed to ensure that they can be efficiently computed even for large graphs. We show that community-aware node features contain information that cannot be completely recovered by classical node features or node embeddings (both classical and structural) and bring value in node classification tasks. This is verified for various classification tasks on synthetic and real-life networks. △ Less

Submitted 26 April, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: 21 pages, 3 figures, 7 tables

ACM Class: I.6.5; G.4

arXiv:2309.07781 [pdf, other]

doi 10.1145/3622870

A Deductive Verification Infrastructure for Probabilistic Programs (Extended Version)

Authors: Philipp Schröer, Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja

Abstract: This paper presents a quantitative program verification infrastructure for discrete probabilistic programs. Our infrastructure can be viewed as the probabilistic analogue of Boogie: its central components are an intermediate verification language (IVL) together with a real-valued logic. Our IVL provides a programming-language-style for expressing verification conditions whose validity implies the… ▽ More This paper presents a quantitative program verification infrastructure for discrete probabilistic programs. Our infrastructure can be viewed as the probabilistic analogue of Boogie: its central components are an intermediate verification language (IVL) together with a real-valued logic. Our IVL provides a programming-language-style for expressing verification conditions whose validity implies the correctness of a program under investigation. As our focus is on verifying quantitative properties such as bounds on expected outcomes, expected run-times, or termination probabilities, off-the-shelf IVLs based on Boolean first-order logic do not suffice. Instead, a paradigm shift from the standard Boolean to a real-valued domain is required. Our IVL features quantitative generalizations of standard verification constructs such as assume- and assert-statements. Verification conditions are generated by a weakest-precondition-style semantics, based on our real-valued logic. We show that our verification infrastructure supports natural encodings of numerous verification techniques from the literature. With our SMT-based implementation, we automatically verify a variety of benchmarks. To the best of our knowledge, this establishes the first deductive verification infrastructure for expectation-based reasoning about probabilistic programs. △ Less

Submitted 15 November, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: This is the extended version of the the publication at OOPSLA 2023 (https://doi.org/10.1145/3622870)

arXiv:2308.09536 [pdf, other]

The 19th International Workshop on Termination (WST 2023): Preface, Invited Talk Abstract, and Tool Descriptions

Authors: Akihisa Yamada, Benjamin Lucien Kaminski, Dieter Hofbauer, Fred Mesnard, Étienne Payet

Abstract: This report contains the proceedings of the 19th International Workshop on Termination (WST 2023), which was held in Obergurgl during August 24--25 as part of Obergurgl Summer on Rewriting (OSR 2023). This report contains the proceedings of the 19th International Workshop on Termination (WST 2023), which was held in Obergurgl during August 24--25 as part of Obergurgl Summer on Rewriting (OSR 2023). △ Less

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2306.10770 [pdf, other]

Unsupervised Framework for Evaluating and Explaining Structural Node Embeddings of Graphs

Authors: Ashkan Dehghan, Kinga Siuta, Agata Skorupka, Andrei Betlen, David Miller, Bogumil Kaminski, Pawel Pralat

Abstract: An embedding is a map** from a set of nodes of a network into a real vector space. Embeddings can have various aims like capturing the underlying graph topology and structure, node-to-node relationship, or other relevant information about the graph, its subgraphs or nodes themselves. A practical challenge with using embeddings is that there are many available variants to choose from. Selecting a… ▽ More An embedding is a map** from a set of nodes of a network into a real vector space. Embeddings can have various aims like capturing the underlying graph topology and structure, node-to-node relationship, or other relevant information about the graph, its subgraphs or nodes themselves. A practical challenge with using embeddings is that there are many available variants to choose from. Selecting a small set of most promising embeddings from the long list of possible options for a given task is challenging and often requires domain expertise. Embeddings can be categorized into two main types: classical embeddings and structural embeddings. Classical embeddings focus on learning both local and global proximity of nodes, while structural embeddings learn information specifically about the local structure of nodes' neighbourhood. For classical node embeddings there exists a framework which helps data scientists to identify (in an unsupervised way) a few embeddings that are worth further investigation. Unfortunately, no such framework exists for structural embeddings. In this paper we propose a framework for unsupervised ranking of structural graph embeddings. The proposed framework, apart from assigning an aggregate quality score for a structural embedding, additionally gives a data scientist insights into properties of this embedding. It produces information which predefined node features the embedding learns, how well it learns them, and which dimensions in the embedded space represent the predefined node features. Using this information the user gets a level of explainability to an otherwise complex black-box embedding algorithm. △ Less

Submitted 19 June, 2023; originally announced June 2023.

Comments: 21 pages, 9 figures

MSC Class: 68T30

arXiv:2302.06082 [pdf, other]

Lower Bounds for Possibly Divergent Probabilistic Programs

Authors: Shenghua Feng, Mingshuai Chen, Han Su, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Naijun Zhan

Abstract: We present a new proof rule for verifying lower bounds on quantities of probabilistic programs. Our proof rule is not confined to almost-surely terminating programs -- as is the case for existing rules -- and can be used to establish non-trivial lower bounds on, e.g., termination probabilities and expected values, for possibly divergent probabilistic loops, e.g., the well-known three-dimensional r… ▽ More We present a new proof rule for verifying lower bounds on quantities of probabilistic programs. Our proof rule is not confined to almost-surely terminating programs -- as is the case for existing rules -- and can be used to establish non-trivial lower bounds on, e.g., termination probabilities and expected values, for possibly divergent probabilistic loops, e.g., the well-known three-dimensional random walk on a lattice. △ Less

Submitted 12 February, 2023; originally announced February 2023.

Comments: Paper conditionally accepted by OOPSLA 2023

arXiv:2301.05749 [pdf, other]

doi 10.1007/s41109-023-00552-9

Artificial Benchmark for Community Detection with Outliers (ABCD+o)

Authors: Bogumił Kamiński, Paweł Prałat, François Théberge

Abstract: The Artificial Benchmark for Community Detection graph (ABCD) is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs with similar properties as the well-known LFR one, and its main parameter $ξ$ can be tuned to mimic its counterpart in the LFR model, the mixing parameter $μ$. In this paper, we extend the ABCD mod… ▽ More The Artificial Benchmark for Community Detection graph (ABCD) is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs with similar properties as the well-known LFR one, and its main parameter $ξ$ can be tuned to mimic its counterpart in the LFR model, the mixing parameter $μ$. In this paper, we extend the ABCD model to include potential outliers. We perform some exploratory experiments on both the new ABCD+o model as well as a real-world network to show that outliers possess some desired, distinguishable properties. △ Less

Submitted 12 June, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

Comments: 19 pages, 13 figures

ACM Class: I.6.5; G.4

Journal ref: Kamiński, B., Prałat, P. & Théberge, F. Artificial benchmark for community detection with outliers (ABCD+o). Appl Netw Sci 8, 25 (2023)

arXiv:2211.12923 [pdf, ps, other]

A Calculus for Amortized Expected Runtimes

Authors: Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, Lena Verscht

Abstract: We develop a weakest-precondition-style calculus à la Dijkstra for reasoning about amortized expected runtimes of randomized algorithms with access to dynamic memory - the $\textsf{aert}$ calculus. Our calculus is truly quantitative, i.e. instead of Boolean valued predicates, it manipulates real-valued functions. En route to the $\textsf{aert}$ calculus, we study the $\textsf{ert}$ calculus for… ▽ More We develop a weakest-precondition-style calculus à la Dijkstra for reasoning about amortized expected runtimes of randomized algorithms with access to dynamic memory - the $\textsf{aert}$ calculus. Our calculus is truly quantitative, i.e. instead of Boolean valued predicates, it manipulates real-valued functions. En route to the $\textsf{aert}$ calculus, we study the $\textsf{ert}$ calculus for reasoning about expected runtimes of Kaminski et al. [2018] extended by capabilities for handling dynamic memory, thus enabling compositional and local reasoning about randomized data structures. This extension employs runtime separation logic, which has been foreshadowed by Matheja [2020] and then implemented in Isabelle/HOL by Haslbeck [2021]. In addition to Haslbeck's results, we further prove soundness of the so-extended $\textsf{ert}$ calculus with respect to an operational Markov decision process model featuring countably-branching nondeterminism, provide intuitive explanations, and provide proof rules enabling separation logic-style verification for upper bounds on expected runtimes. Finally, we build the so-called potential method for amortized analysis into the $\textsf{ert}$ calculus, thus obtaining the $\textsf{aert}$ calculus. Since one needs to be able to handle changes in potential which can be negative, the $\textsf{aert}$ calculus needs to be capable of handling signed random variables. A particularly pleasing feature of our solution is that, unlike e.g. Kozen [1985], we obtain a loop rule for our signed random variables, and furthermore, unlike e.g. Kaminski and Katoen [2017], the $\textsf{aert}$ calculus makes do without the need for involved technical machinery kee** track of the integrability of the random variables. Finally, we present case studies, including a formal analysis of a randomized delete-insert-find-any set data structure [Brodal et al. 1996]. △ Less

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2210.15009 [pdf, other]

Hypergraph Artificial Benchmark for Community Detection (h-ABCD)

Authors: Bogumił Kamiński, Paweł Prałat, François Théberge

Abstract: The Artificial Benchmark for Community Detection (ABCD) graph is a recently introduced random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs with similar properties as the well-known LFR one, and its main parameter can be tuned to mimic its counterpart in the LFR model, the mixing parameter. In this paper, we introdu… ▽ More The Artificial Benchmark for Community Detection (ABCD) graph is a recently introduced random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs with similar properties as the well-known LFR one, and its main parameter can be tuned to mimic its counterpart in the LFR model, the mixing parameter. In this paper, we introduce hypergraph counterpart of the ABCD model, h-ABCD, which produces random hypergraph with distributions of ground-truth community sizes and degrees following power-law. As in the original ABCD, the new model h-ABCD can produce hypergraphs with various levels of noise. More importantly, the model is flexible and can mimic any desired level of homogeneity of hyperedges that fall into one community. As a result, it can be used as a suitable, synthetic playground for analyzing and tuning hypergraph community detection algorithms. △ Less

Submitted 12 June, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: 23 pages, 6 figures, 7 tables

ACM Class: I.6.5; G.4

arXiv:2205.06152 [pdf, other]

Probabilistic Program Verification via Inductive Synthesis of Inductive Invariants

Authors: Kevin Batz, Mingshuai Chen, Sebastian Junges, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja

Abstract: Essential tasks for the verification of probabilistic programs include bounding expected outcomes and proving termination in finite expected runtime. We contribute a simple yet effective inductive synthesis approach for proving such quantitative reachability properties by generating inductive invariants on source-code level. Our implementation shows promise: It finds invariants for (in)finite-stat… ▽ More Essential tasks for the verification of probabilistic programs include bounding expected outcomes and proving termination in finite expected runtime. We contribute a simple yet effective inductive synthesis approach for proving such quantitative reachability properties by generating inductive invariants on source-code level. Our implementation shows promise: It finds invariants for (in)finite-state programs, can beat state-of-the-art probabilistic model checkers, and is competitive with modern tools dedicated to invariant synthesis and expected runtime reasoning. △ Less

Submitted 8 February, 2023; v1 submitted 12 May, 2022; originally announced May 2022.

arXiv:2203.14899 [pdf, ps, other]

Properties and Performance of the ABCDe Random Graph Model with Community Structure

Authors: Bogumił Kamiński, Tomasz Olczak, Bartosz Pankratz, Paweł Prałat, François Théberge

Abstract: In this paper, we investigate properties and performance of synthetic random graph models with a built-in community structure. Such models are important for evaluating and tuning community detection algorithms that are unsupervised by nature. We propose ABCDe, a multi-threaded implementation of the ABCD (Artificial Benchmark for Community Detection) graph generator. We discuss the implementation d… ▽ More In this paper, we investigate properties and performance of synthetic random graph models with a built-in community structure. Such models are important for evaluating and tuning community detection algorithms that are unsupervised by nature. We propose ABCDe, a multi-threaded implementation of the ABCD (Artificial Benchmark for Community Detection) graph generator. We discuss the implementation details of the algorithm and compare it with both the previously available sequential version of the ABCD model and with the parallel implementation of the standard and extensively used LFR (Lancichinetti--Fortunato--Radicchi) generator. We show that ABCDe is more than ten times faster and scales better than the parallel implementation of LFR provided in NetworKit. Moreover, the algorithm is not only faster but random graphs generated by ABCD have similar properties to the ones generated by the original LFR algorithm, while the parallelized NetworKit implementation of LFR produces graphs that have noticeably different characteristics. △ Less

Submitted 16 September, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

Comments: 15 pages, 10 figures, 1 table

ACM Class: I.6.5; G.4

arXiv:2203.01480 [pdf, other]

Modularity of the ABCD Random Graph Model with Community Structure

Authors: Bogumil Kaminski, Bartosz Pankratz, Pawel Pralat, Francois Theberge

Abstract: The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs with similar properties as the well-known LFR one, and its main parameter $ξ$ can be tuned to mimic its counterpart in the LFR model, the mixing parameter $μ$. In this paper, we investigate vario… ▽ More The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs with similar properties as the well-known LFR one, and its main parameter $ξ$ can be tuned to mimic its counterpart in the LFR model, the mixing parameter $μ$. In this paper, we investigate various theoretical asymptotic properties of the ABCD model. In particular, we analyze the modularity function, arguably, the most important graph property of networks in the context of community detection. Indeed, the modularity function is often used to measure the presence of community structure in networks. It is also used as a quality function in many community detection algorithms, including the widely used Louvain algorithm. △ Less

Submitted 2 March, 2022; originally announced March 2022.

Comments: 43 pages

arXiv:2202.07577 [pdf, ps, other]

Weighted Programming

Authors: Kevin Batz, Adrian Gallus, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Tobias Winkler

Abstract: We study weighted programming, a programming paradigm for specifying mathematical models. More specifically, the weighted programs we investigate are like usual imperative programs with two additional features: (1) nondeterministic branching and (2) weighting execution traces. Weights can be numbers but also other objects like words from an alphabet, polynomials, formal power series, or cardinal n… ▽ More We study weighted programming, a programming paradigm for specifying mathematical models. More specifically, the weighted programs we investigate are like usual imperative programs with two additional features: (1) nondeterministic branching and (2) weighting execution traces. Weights can be numbers but also other objects like words from an alphabet, polynomials, formal power series, or cardinal numbers. We argue that weighted programming as a paradigm can be used to specify mathematical models beyond probability distributions (as is done in probabilistic programming). We develop weakest-precondition- and weakest-liberal-precondition-style calculi à la Dijkstra for reasoning about mathematical models specified by weighted programs. We present several case studies. For instance, we use weighted programming to model the ski rental problem - an optimization problem. We model not only the optimization problem itself, but also the best deterministic online algorithm for solving this problem as weighted programs. By means of weakest-precondition-style reasoning, we can determine the competitive ratio of the online algorithm on source code level. △ Less

Submitted 30 March, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: 71 pages

ACM Class: F.3.2

arXiv:2202.06765 [pdf, other]

Quantitative Strongest Post

Authors: Linpeng Zhang, Benjamin Lucien Kaminski

Abstract: We present a novel strongest-postcondition-style calculus for quantitative reasoning about non-deterministic programs with loops. Whereas existing quantitative weakest pre allows reasoning about the value of a quantity after a program terminates on a given initial state, quantitative strongest post allows reasoning about the value that a quantity had before the program was executed and reached a g… ▽ More We present a novel strongest-postcondition-style calculus for quantitative reasoning about non-deterministic programs with loops. Whereas existing quantitative weakest pre allows reasoning about the value of a quantity after a program terminates on a given initial state, quantitative strongest post allows reasoning about the value that a quantity had before the program was executed and reached a given final state. We show how strongest post enables reasoning about the flow of quantitative information through programs. Similarly to weakest liberal preconditions, we also develop a quantitative strongest liberal post. As a byproduct, we obtain the entirely unexplored notion of strongest liberal postconditions and show how these foreshadow a potential new program logic - partial incorrectness logic - which would be a more liberal version of O'Hearn's recent incorrectness logic. △ Less

Submitted 14 February, 2022; originally announced February 2022.

Comments: Full version of a conference paper presented at OOPSLA 2022

arXiv:2112.00075 [pdf, other]

A Multi-purposed Unsupervised Framework for Comparing Embeddings of Undirected and Directed Graphs

Authors: Bogumił Kamiński, Łukasz Kraiński, Paweł Prałat, François Théberge

Abstract: Graph embedding is a transformation of nodes of a network into a set of vectors. A good embedding should capture the underlying graph topology and structure, node-to-node relationship, and other relevant information about the graph, its subgraphs, and nodes themselves. If these objectives are achieved, an embedding is a meaningful, understandable, and often compressed representation of a network.… ▽ More Graph embedding is a transformation of nodes of a network into a set of vectors. A good embedding should capture the underlying graph topology and structure, node-to-node relationship, and other relevant information about the graph, its subgraphs, and nodes themselves. If these objectives are achieved, an embedding is a meaningful, understandable, and often compressed representation of a network. Unfortunately, selecting the best embedding is a challenging task and very often requires domain experts. In this paper, we extend the framework for evaluating graph embeddings that was recently introduced by the authors. Now, the framework assigns two scores, local and global, to each embedding that measure the quality of an evaluated embedding for tasks that require good representation of local and, respectively, global properties of the network. The best embedding, if needed, can be selected in an unsupervised way, or the framework can identify a few embeddings that are worth further investigation. The framework is flexible, scalable, and can deal with undirected/directed, weighted/unweighted graphs. △ Less

Submitted 30 November, 2021; originally announced December 2021.

Comments: 32 pages, 15 figures

arXiv:2105.14100 [pdf, ps, other]

Latticed $k$-Induction with an Application to Probabilistic Programs

Authors: Kevin Batz, Mingshuai Chen, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, Philipp Schröer

Abstract: We revisit two well-established verification techniques, $k$-induction and bounded model checking (BMC), in the more general setting of fixed point theory over complete lattices. Our main theoretical contribution is latticed $k$-induction, which (i) generalizes classical $k$-induction for verifying transition systems, (ii) generalizes Park induction for bounding fixed points of monotonic maps on c… ▽ More We revisit two well-established verification techniques, $k$-induction and bounded model checking (BMC), in the more general setting of fixed point theory over complete lattices. Our main theoretical contribution is latticed $k$-induction, which (i) generalizes classical $k$-induction for verifying transition systems, (ii) generalizes Park induction for bounding fixed points of monotonic maps on complete lattices, and (iii) extends from naturals $k$ to transfinite ordinals $κ$, thus yielding $κ$-induction. The lattice-theoretic understanding of $k$-induction and BMC enables us to apply both techniques to the fully automatic verification of infinite-state probabilistic programs. Our prototypical implementation manages to automatically verify non-trivial specifications for probabilistic programs taken from the literature that - using existing techniques - cannot be verified without synthesizing a stronger inductive invariant first. △ Less

Submitted 28 May, 2021; originally announced May 2021.

Comments: to be published in: CAV (2021)

arXiv:2102.08275 [pdf, other]

Evaluating Node Embeddings of Complex Networks

Authors: Arash Dehghan-Kooshkghazi, Bogumił Kamiński, Łukasz Kraiński, Paweł Prałat, François Théberge

Abstract: Graph embedding is a transformation of nodes of a graph into a set of vectors. A~good embedding should capture the graph topology, node-to-node relationship, and other relevant information about the graph, its subgraphs, and nodes. If these objectives are achieved, an embedding is a meaningful, understandable, compressed representations of a network that can be used for other machine learning tool… ▽ More Graph embedding is a transformation of nodes of a graph into a set of vectors. A~good embedding should capture the graph topology, node-to-node relationship, and other relevant information about the graph, its subgraphs, and nodes. If these objectives are achieved, an embedding is a meaningful, understandable, compressed representations of a network that can be used for other machine learning tools such as node classification, community detection, or link prediction. The main challenge is that one needs to make sure that embeddings describe the properties of the graphs well. As a result, selecting the best embedding is a challenging task and very often requires domain experts. In this paper, we do a series of extensive experiments with selected graph embedding algorithms, both on real-world networks as well as artificially generated ones. Based on those experiments we formulate two general conclusions. First, if one needs to pick one embedding algorithm before running the experiments, then node2vec is the best choice as it performed best in our tests. Having said that, there is no single winner in all tests and, additionally, most embedding algorithms have hyperparameters that should be tuned and are randomized. Therefore, our main recommendation for practitioners is, if possible, to generate several embeddings for a problem at hand and then use a general framework that provides a tool for an unsupervised graph embedding comparison. This framework (introduced recently in the literature and easily available on GitHub repository) assigns the divergence score to embeddings to help distinguish good ones from bad ones. △ Less

Submitted 17 June, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: 29 pages, 19 figures

arXiv:2101.12289 [pdf, ps, other]

Probabilistic Data with Continuous Distributions

Authors: Martin Grohe, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Peter Lindner

Abstract: Statistical models of real world data typically involve continuous probability distributions such as normal, Laplace, or exponential distributions. Such distributions are supported by many probabilistic modelling formalisms, including probabilistic database systems. Yet, the traditional theoretical framework of probabilistic databases focusses entirely on finite probabilistic databases. Only rec… ▽ More Statistical models of real world data typically involve continuous probability distributions such as normal, Laplace, or exponential distributions. Such distributions are supported by many probabilistic modelling formalisms, including probabilistic database systems. Yet, the traditional theoretical framework of probabilistic databases focusses entirely on finite probabilistic databases. Only recently, we set out to develop the mathematical theory of infinite probabilistic databases. The present paper is an exposition of two recent papers which are cornerstones of this theory. In (Grohe, Lindner; ICDT 2020) we propose a very general framework for probabilistic databases, possibly involving continuous probability distributions, and show that queries have a well-defined semantics in this framework. In (Grohe, Kaminski, Katoen, Lindner; PODS 2020) we extend the declarative probabilistic programming language Generative Datalog, proposed by (Bárány et al.~2017) for discrete probability distributions, to continuous probability distributions and show that such programs yield generative models of continuous probabilistic databases. △ Less

Submitted 5 March, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

arXiv:2010.14548 [pdf, other]

Relatively Complete Verification of Probabilistic Programs

Authors: Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja

Abstract: We study a syntax for specifying quantitative "assertions" - functions map** program states to numbers - for probabilistic program verification. We prove that our syntax is expressive in the following sense: Given any probabilistic program $C$, if a function $f$ is expressible in our syntax, then the function map** each initial state $σ$ to the expected value of $f$ evaluated in the final stat… ▽ More We study a syntax for specifying quantitative "assertions" - functions map** program states to numbers - for probabilistic program verification. We prove that our syntax is expressive in the following sense: Given any probabilistic program $C$, if a function $f$ is expressible in our syntax, then the function map** each initial state $σ$ to the expected value of $f$ evaluated in the final states reached after termination of $C$ on $σ$ (also called the weakest preexpectation $\textit{wp} [C](f)$) is also expressible in our syntax. As a consequence, we obtain a relatively complete verification system for reasoning about expected values and probabilities in the sense of Cook: Apart from proving a single inequality between two functions given by syntactic expressions in our language, given $f$, $g$, and $C$, we can check whether $g \preceq \textit{wp} [C] (f)$. △ Less

Submitted 28 January, 2022; v1 submitted 27 October, 2020; originally announced October 2020.

arXiv:2007.06327 [pdf, ps, other]

Generating Functions for Probabilistic Programs

Authors: Lutz Klinkenberg, Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Joshua Moerman, Tobias Winkler

Abstract: This paper investigates the usage of generating functions (GFs) encoding measures over the program variables for reasoning about discrete probabilistic programs. To that end, we define a denotational GF-transformer semantics for probabilistic while-programs, and show that it instantiates Kozen's seminal distribution transformer semantics. We then study the effective usage of GFs for program analys… ▽ More This paper investigates the usage of generating functions (GFs) encoding measures over the program variables for reasoning about discrete probabilistic programs. To that end, we define a denotational GF-transformer semantics for probabilistic while-programs, and show that it instantiates Kozen's seminal distribution transformer semantics. We then study the effective usage of GFs for program analysis. We show that finitely expressible GFs enable checking super-invariants by means of computer algebra tools, and that they can be used to determine termination probabilities. The paper concludes by characterizing a class of -- possibly infinite-state -- programs whose semantics is a rational GF encoding a discrete phase-type distribution. △ Less

Submitted 13 July, 2020; originally announced July 2020.

arXiv:2004.14835 [pdf, ps, other]

PrIC3: Property Directed Reachability for MDPs

Authors: Kevin Batz, Sebastian Junges, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, Philipp Schröer

Abstract: IC3 has been a leap forward in symbolic model checking. This paper proposes PrIC3 (pronounced pricy-three), a conservative extension of IC3 to symbolic model checking of MDPs. Our main focus is to develop the theory underlying PrIC3. Alongside, we present a first implementation of PrIC3 including the key ingredients from IC3 such as generalization, repushing, and propagation. IC3 has been a leap forward in symbolic model checking. This paper proposes PrIC3 (pronounced pricy-three), a conservative extension of IC3 to symbolic model checking of MDPs. Our main focus is to develop the theory underlying PrIC3. Alongside, we present a first implementation of PrIC3 including the key ingredients from IC3 such as generalization, repushing, and propagation. △ Less

Submitted 18 May, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

arXiv:2003.08501 [pdf, other]

On Broadcasting Time in the Model of Travelling Agents

Authors: Reaz Huq, Bogumil Kaminski, Atefeh Mashatan, Pawel Pralat, Przemyslaw Szufel

Abstract: Consider the following broadcasting process run on a connected graph $G=(V,E)$. Suppose that $k \ge 2$ agents start on vertices selected from $V$ uniformly and independently at random. One of the agents has a message that she wants to communicate to the other agents. All agents perform independent random walks on $G$, with the message being passed when an agent that knows the message meets an agen… ▽ More Consider the following broadcasting process run on a connected graph $G=(V,E)$. Suppose that $k \ge 2$ agents start on vertices selected from $V$ uniformly and independently at random. One of the agents has a message that she wants to communicate to the other agents. All agents perform independent random walks on $G$, with the message being passed when an agent that knows the message meets an agent that does not know the message. The broadcasting time $ξ(G,k)$ is the time it takes to spread the message to all agents. Our ultimate goal is to gain a better understanding of the broadcasting process run on real-world networks of roads of large cities that might shed some light on the behaviour of future autonomous and connected vehicles. Due to the complexity of road networks, such phenomena have to be studied using simulation in practical applications. In this paper, we study the process on the simplest scenario, i.e., the family of complete graphs, as in this case the problem is analytically tractable. We provide tight bounds for $ξ(K_n,k)$ that hold asymptotically almost surely for the whole range of the parameter $k$. These theoretical results reveal interesting relationships and, at the same time, are also helpful to understand and explain the behaviour we observe in more realistic networks. △ Less

Submitted 18 March, 2020; originally announced March 2020.

arXiv:2002.04654 [pdf, other]

doi 10.24166/im.01.2020

Analyzing, Exploring, and Visualizing Complex Networks via Hypergraphs using SimpleHypergraphs.jl

Authors: Alessia Antelmi, Gennaro Cordasco, Bogumił Kamiński, Paweł Prałat, Vittorio Scarano, Carmine Spagnuolo, Przemyslaw Szufel

Abstract: Real-world complex networks are usually being modeled as graphs. The concept of graphs assumes that the relations within the network are binary (for instance, between pairs of nodes); however, this is not always true for many real-life scenarios, such as peer-to-peer communication schemes, paper co-authorship, or social network interactions. For such scenarios, it is often the case that the underl… ▽ More Real-world complex networks are usually being modeled as graphs. The concept of graphs assumes that the relations within the network are binary (for instance, between pairs of nodes); however, this is not always true for many real-life scenarios, such as peer-to-peer communication schemes, paper co-authorship, or social network interactions. For such scenarios, it is often the case that the underlying network is better and more naturally modeled by hypergraphs. A hypergraph is a generalization of a graph in which a single (hyper)edge can connect any number of vertices. Hypergraphs allow modelers to have a complete representation of multi-relational (many-to-many) networks; hence, they are extremely suitable for analyzing and discovering more subtle dependencies in such data structures. Working with hypergraphs requires new software libraries that make it possible to perform operations on them, from basic algorithms (such as searching or traversing the network) to computing significant hypergraph measures, to including more challenging algorithms (such as community detection). In this paper, we present a new software library, SimpleHypergraphs.jl, written in the Julia language and designed for high-performance computing on hypergraphs and propose two new algorithms for analyzing their properties: s-betweenness and modified label propagation. We also present various approaches for hypergraph visualization integrated into our tool. In order to demonstrate how to exploit the library in practice, we discuss two case studies based on the 2019 Yelp Challenge dataset and the collaboration network built upon the Game of Thrones TV series. The results are promising and they confirm the ability of hypergraphs to provide more insight than standard graph-based approaches. △ Less

Submitted 28 March, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

Comments: 32 pages, 10 figures, 7 tables, submitted to Internet Mathematics Journal

Report number: Vol. 1, Issue 1, 2020 ACM Class: G.2.2; G.4

Journal ref: Internet Mathematics March 31, 2020

arXiv:2002.00843 [pdf, other]

doi 10.1017/nws.2020.45

Artificial Benchmark for Community Detection (ABCD): Fast Random Graph Model with Community Structure

Authors: Bogumił Kamiński, Paweł Prałat, François Théberge

Abstract: Most of the current complex networks that are of interest to practitioners possess a certain community structure that plays an important role in understanding the properties of these networks. Moreover, many machine learning algorithms and tools that are developed for complex networks try to take advantage of the existence of communities to improve their performance or speed. As a result, there ar… ▽ More Most of the current complex networks that are of interest to practitioners possess a certain community structure that plays an important role in understanding the properties of these networks. Moreover, many machine learning algorithms and tools that are developed for complex networks try to take advantage of the existence of communities to improve their performance or speed. As a result, there are many competing algorithms for detecting communities in large networks. Unfortunately, these algorithms are often quite sensitive and so they cannot be fine-tuned for a given, but a constantly changing, real-world network at hand. It is therefore important to test these algorithms for various scenarios that can only be done using synthetic graphs that have built-in community structure, power-law degree distribution, and other typical properties observed in complex networks. The standard and extensively used method for generating artificial networks is the LFR graph generator. Unfortunately, this model has some scalability limitations and it is challenging to analyze it theoretically. Finally, the mixing parameter $μ$, the main parameter of the model guiding the strength of the communities, has a non-obvious interpretation and so can lead to unnaturally-defined networks. In this paper, we provide an alternative random graph model with community structure and power-law distribution for both degrees and community sizes, the Artificial Benchmark for Community Detection (ABCD). We show that the new model solves the three issues identified above and more. The conclusion is that these models produce comparable graphs but ABCD is fast, simple, and can be easily tuned to allow the user to make a smooth transition between the two extremes: pure (independent) communities and random graph with no community structure. △ Less

Submitted 14 January, 2020; originally announced February 2020.

Comments: 22 pages, 4 figures

MSC Class: 68U20 ACM Class: I.6.5; G.4

arXiv:2001.06358 [pdf, ps, other]

Generative Datalog with Continuous Distributions

Authors: Martin Grohe, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Peter Lindner

Abstract: Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a "purely declarative probabilistic programming language." We revisit this language and propose a more principled approach towards defining its semantics based on stochastic kernels and Markov processes - standard notions from probability… ▽ More Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a "purely declarative probabilistic programming language." We revisit this language and propose a more principled approach towards defining its semantics based on stochastic kernels and Markov processes - standard notions from probability theory. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Bárány et al. We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, ICDT 2020), and show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database. △ Less

Submitted 16 February, 2022; v1 submitted 17 January, 2020; originally announced January 2020.

Comments: Extended Version

arXiv:1910.01100 [pdf, ps, other]

Optimistic Value Iteration

Authors: Arnd Hartmanns, Benjamin Lucien Kaminski

Abstract: Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides a lower bound on unbounded probabilities or reward values. Two "sound" variations, which also deliver an upper bound, have recently appeared. In this paper, we present optim… ▽ More Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides a lower bound on unbounded probabilities or reward values. Two "sound" variations, which also deliver an upper bound, have recently appeared. In this paper, we present optimistic value iteration, a new sound approach that leverages value iteration's ability to usually deliver tight lower bounds: we obtain a lower bound via standard value iteration, use the result to "guess" an upper bound, and prove the latter's correctness. Optimistic value iteration is easy to implement, does not require extra precomputations or a priori state space transformations, and works for computing reachability probabilities as well as expected rewards. It is also fast, as we show via an extensive experimental evaluation using our publicly available implementation within the Modest Toolset. △ Less

Submitted 17 October, 2019; v1 submitted 2 October, 2019; originally announced October 2019.

arXiv:1907.11445 [pdf, other]

Protocol for Asynchronous, Reliable, Secure and Efficient Consensus (PARSEC) Version 2.0

Authors: Pierre Chevalier, Bartlomiej Kaminski, Fraser Hutchison, Qi Ma, Spandan Sharma, Andreas Fackler, William J Buchanan

Abstract: In this paper we present an open source, fully asynchronous, leaderless algorithm for reaching consensus in the presence of Byzantine faults in an asynchronous network. We prove the algorithm's correctness provided that less than a third of participating nodes are faulty. We also present a way of applying the algorithm to a network with dynamic membership, i.e. a network in which nodes can join an… ▽ More In this paper we present an open source, fully asynchronous, leaderless algorithm for reaching consensus in the presence of Byzantine faults in an asynchronous network. We prove the algorithm's correctness provided that less than a third of participating nodes are faulty. We also present a way of applying the algorithm to a network with dynamic membership, i.e. a network in which nodes can join and leave at will. The core contribution of this paper is an optimal model in the definition of an asynchronous BFT protocol, and which is resilient to 1/3 byzantine nodes. This model matches an agreement with probability one (unlike some probabilistic methods), and where a common coin is used as a source of randomization so that it respects the FLP impossibility result. △ Less

Submitted 26 July, 2019; originally announced July 2019.

arXiv:1906.04562 [pdf, other]

doi 10.1093/comnet/cnz043

An Unsupervised Framework for Comparing Graph Embeddings

Authors: Bogumil Kaminski, Pawel Pralat, Francois Theberge

Abstract: Graph embedding is a transformation of vertices of a graph into set of vectors. Good embeddings should capture the graph topology, vertex-to-vertex relationship, and other relevant information about graphs, subgraphs, and vertices. If these objectives are achieved, they are meaningful, understandable, and compressed representations of networks. They also provide more options and tools for data sci… ▽ More Graph embedding is a transformation of vertices of a graph into set of vectors. Good embeddings should capture the graph topology, vertex-to-vertex relationship, and other relevant information about graphs, subgraphs, and vertices. If these objectives are achieved, they are meaningful, understandable, and compressed representations of networks. They also provide more options and tools for data scientists as machine learning on graphs is still quite limited. Finally, vector operations are simpler and faster than comparable operations on graphs. The main challenge is that one needs to make sure that embeddings well describe the properties of the graphs. In particular, the decision has to be made on the embedding dimensionality which highly impacts the quality of an embedding. As a result, selecting the best embedding is a challenging task and very often requires domain experts. In this paper, we propose a ``divergence score'' that can be assign to various embeddings to distinguish good ones from bad ones. This general framework provides a tool for an unsupervised graph embedding comparison. In order to achieve it, we needed to generalize the well-known Chung-Lu model to incorporate geometry which is interesting on its own rights. In order to test our framework, we did a number of experiments with synthetic networks as well as real-world networks, and various embedding algorithms. △ Less

Submitted 29 May, 2019; originally announced June 2019.

arXiv:1904.01117 [pdf, ps, other]

doi 10.1145/3371105

Aiming Low Is Harder -- Induction for Lower Bounds in Probabilistic Program Verification

Authors: Marcel Hark, Benjamin Lucien Kaminski, Jürgen Giesl, Joost-Pieter Katoen

Abstract: We present a new inductive rule for verifying lower bounds on expected values of random variables after execution of probabilistic loops as well as on their expected runtimes. Our rule is simple in the sense that loop body semantics need to be applied only finitely often in order to verify that the candidates are indeed lower bounds. In particular, it is not necessary to find the limit of a sequen… ▽ More We present a new inductive rule for verifying lower bounds on expected values of random variables after execution of probabilistic loops as well as on their expected runtimes. Our rule is simple in the sense that loop body semantics need to be applied only finitely often in order to verify that the candidates are indeed lower bounds. In particular, it is not necessary to find the limit of a sequence as in many previous rules. △ Less

Submitted 11 August, 2021; v1 submitted 1 April, 2019; originally announced April 2019.

arXiv:1901.06540 [pdf, other]

A Pre-Expectation Calculus for Probabilistic Sensitivity

Authors: Alejandro Aguirre, Gilles Barthe, Justin Hsu, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja

Abstract: Sensitivity properties describe how changes to the input of a program affect the output, typically by upper bounding the distance between the outputs of two runs by a monotone function of the distance between the corresponding inputs. When programs are probabilistic, the distance between outputs is a distance between distributions. The Kantorovich lifting provides a general way of defining a dista… ▽ More Sensitivity properties describe how changes to the input of a program affect the output, typically by upper bounding the distance between the outputs of two runs by a monotone function of the distance between the corresponding inputs. When programs are probabilistic, the distance between outputs is a distance between distributions. The Kantorovich lifting provides a general way of defining a distance between distributions by lifting the distance of the underlying sample space; by choosing an appropriate distance on the base space, one can recover other usual probabilistic distances, such as the Total Variation distance. We develop a relational pre-expectation calculus to upper bound the Kantorovich distance between two executions of a probabilistic program. We illustrate our methods by proving algorithmic stability of a machine learning algorithm, convergence of a reinforcement learning algorithm, and fast mixing for card shuffling algorithms. We also consider some extensions: proving lower bounds on the Total Variation distance and convergence to the uniform distribution. Finally, we describe an asynchronous extension of our calculus to reason about pairs of program executions with different control flow. △ Less

Submitted 10 August, 2020; v1 submitted 19 January, 2019; originally announced January 2019.

Comments: Major revision

arXiv:1802.10467 [pdf, other]

doi 10.1145/3290347

Quantitative Separation Logic - A Logic for Reasoning about Probabilistic Programs

Authors: Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, Thomas Noll

Abstract: We present quantitative separation logic ($\mathsf{QSL}$). In contrast to classical separation logic, $\mathsf{QSL}$ employs quantities which evaluate to real numbers instead of predicates which evaluate to Boolean values. The connectives of classical separation logic, separating conjunction and separating implication, are lifted from predicates to quantities. This extension is conservative: Both… ▽ More We present quantitative separation logic ($\mathsf{QSL}$). In contrast to classical separation logic, $\mathsf{QSL}$ employs quantities which evaluate to real numbers instead of predicates which evaluate to Boolean values. The connectives of classical separation logic, separating conjunction and separating implication, are lifted from predicates to quantities. This extension is conservative: Both connectives are backward compatible to their classical analogs and obey the same laws, e.g. modus ponens, adjointness, etc. Furthermore, we develop a weakest precondition calculus for quantitative reasoning about probabilistic pointer programs in $\mathsf{QSL}$. This calculus is a conservative extension of both Reynolds' separation logic for heap-manipulating programs and Kozen's / McIver and Morgan's weakest preexpectations for probabilistic programs. Soundness is proven with respect to an operational semantics based on Markov decision processes. Our calculus preserves O'Hearn's frame rule, which enables local reasoning. We demonstrate that our calculus enables reasoning about quantities such as the probability of terminating with an empty heap, the probability of reaching a certain array permutation, or the expected length of a list. △ Less

Submitted 26 November, 2018; v1 submitted 28 February, 2018; originally announced February 2018.

Journal ref: Proc. ACM Program. Lang. 3, POPL, Article 34 (January 2019)

arXiv:1802.10433 [pdf, ps, other]

How long, O Bayesian network, will I sample thee? A program analysis perspective on expected sampling times

Authors: Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja

Abstract: Bayesian networks (BNs) are probabilistic graphical models for describing complex joint probability distributions. The main problem for BNs is inference: Determine the probability of an event given observed evidence. Since exact inference is often infeasible for large BNs, popular approximate inference methods rely on sampling. We study the problem of determining the expected time to obtain a si… ▽ More Bayesian networks (BNs) are probabilistic graphical models for describing complex joint probability distributions. The main problem for BNs is inference: Determine the probability of an event given observed evidence. Since exact inference is often infeasible for large BNs, popular approximate inference methods rely on sampling. We study the problem of determining the expected time to obtain a single valid sample from a BN. To this end, we translate the BN together with observations into a probabilistic program. We provide proof rules that yield the exact expected runtime of this program in a fully automated fashion. We implemented our approach and successfully analyzed various real-world BNs taken from the Bayesian network repository. △ Less

Submitted 28 February, 2018; originally announced February 2018.

arXiv:1802.05127 [pdf, other]

Clustering Properties of Spatial Preferential Attachment Model

Authors: Lenar Iskhakov, Bogumil Kaminski, Maksim Mironov, Liudmila Ostroumova Prokhorenkova, Pawel Pralat

Abstract: In this paper, we study the clustering properties of the Spatial Preferential Attachment (SPA) model introduced by Aiello et al. in 2009. This model naturally combines geometry and preferential attachment using the notion of spheres of influence. It was previously shown in several research papers that graphs generated by the SPA model are similar to real-world networks in many aspects. For example… ▽ More In this paper, we study the clustering properties of the Spatial Preferential Attachment (SPA) model introduced by Aiello et al. in 2009. This model naturally combines geometry and preferential attachment using the notion of spheres of influence. It was previously shown in several research papers that graphs generated by the SPA model are similar to real-world networks in many aspects. For example, the vertex degree distribution was shown to follow a power law. In the current paper, we study the behaviour of C(d), which is the average local clustering coefficient for the vertices of degree d. This characteristic was not previously analyzed in the SPA model. However, it was empirically shown that in real-world networks C(d) usually decreases as d^{-a} for some a>0 and it was often observed that a=1. We prove that in the SPA model C(d) decreases as 1/d. Furthermore, we are also able to prove that not only the average but the individual local clustering coefficient of a vertex v of degree d behaves as 1/d if d is large enough. The obtained results are illustrated by numerous experiments with simulated graphs. △ Less

Submitted 13 February, 2018; originally announced February 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1711.06846

arXiv:1711.03588 [pdf, other]

A New Proof Rule for Almost-Sure Termination

Authors: Annabelle McIver, Carroll Morgan, Benjamin Lucien Kaminski, Joost-Pieter Katoen

Abstract: An important question for a probabilistic program is whether the probability mass of all its diverging runs is zero, that is that it terminates "almost surely". Proving that can be hard, and this paper presents a new method for doing so; it is expressed in a program logic, and so applies directly to source code. The programs may contain both probabilistic- and demonic choice, and the probabilistic… ▽ More An important question for a probabilistic program is whether the probability mass of all its diverging runs is zero, that is that it terminates "almost surely". Proving that can be hard, and this paper presents a new method for doing so; it is expressed in a program logic, and so applies directly to source code. The programs may contain both probabilistic- and demonic choice, and the probabilistic choices may depend on the current state. As do other researchers, we use variant functions (a.k.a. "super-martingales") that are real-valued and probabilistically might decrease on each loop iteration; but our key innovation is that the amount as well as the probability of the decrease are parametric. We prove the soundness of the new rule, indicate where its applicability goes beyond existing rules, and explain its connection to classical results on denumerable (non-demonic) Markov chains. △ Less

Submitted 25 December, 2017; v1 submitted 9 November, 2017; originally announced November 2017.

Comments: V1 to appear in PoPL18. This version collects some existing text into new example subsection 5.5 and adds a new example 5.6 and makes further remarks about uncountable branching. The new example 5.6 relates to work on lexicographic termination methods, also to appear in PoPL18 [Agrawal et al, 2018]

arXiv:1703.07682 [pdf, ps, other]

A Weakest Pre-Expectation Semantics for Mixed-Sign Expectations

Authors: Benjamin Lucien Kaminski, Joost-Pieter Katoen

Abstract: We present a weakest-precondition-style calculus for reasoning about the expected values (pre-expectations) of \emph{mixed-sign unbounded} random variables after execution of a probabilistic program. The semantics of a while-loop is well-defined as the limit of iteratively applying a functional to a zero-element just as in the traditional weakest pre-expectation calculus, even though a standard le… ▽ More We present a weakest-precondition-style calculus for reasoning about the expected values (pre-expectations) of \emph{mixed-sign unbounded} random variables after execution of a probabilistic program. The semantics of a while-loop is well-defined as the limit of iteratively applying a functional to a zero-element just as in the traditional weakest pre-expectation calculus, even though a standard least fixed point argument is not applicable in this context. A striking feature of our semantics is that it is always well-defined, even if the expected values do not exist. We show that the calculus is sound, allows for compositional reasoning, and present an invariant-based approach for reasoning about pre-expectations of loops. △ Less

Submitted 18 April, 2017; v1 submitted 22 March, 2017; originally announced March 2017.

ACM Class: D.2.4; D.3.1; F.1.2; F.3.1; F.3.2

arXiv:1606.08280 [pdf, ps, other]

Inferring Covariances for Probabilistic Programs

Authors: Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja

Abstract: We study weakest precondition reasoning about the (co)variance of outcomes and the variance of run-times of probabilistic programs with conditioning. For outcomes, we show that approximating (co)variances is computationally more difficult than approximating expected values. In particular, we prove that computing both lower and upper bounds for (co)variances is $Σ^{0}_{2}$-complete. As a consequenc… ▽ More We study weakest precondition reasoning about the (co)variance of outcomes and the variance of run-times of probabilistic programs with conditioning. For outcomes, we show that approximating (co)variances is computationally more difficult than approximating expected values. In particular, we prove that computing both lower and upper bounds for (co)variances is $Σ^{0}_{2}$-complete. As a consequence, neither lower nor upper bounds are computably enumerable. We therefore present invariant-based techniques that do enable enumeration of both upper and lower bounds, once appropriate invariants are found. Finally, we extend this approach to reasoning about run-time variances. △ Less

Submitted 27 June, 2016; originally announced June 2016.

arXiv:1605.04477 [pdf, other]

Bounded Model Checking for Probabilistic Programs

Authors: Nils Jansen, Christian Dehnert, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Lukas Westhofen

Abstract: In this paper we investigate the applicability of standard model checking approaches to verifying properties in probabilistic programming. As the operational model for a standard probabilistic program is a potentially infinite parametric Markov decision process, no direct adaption of existing techniques is possible. Therefore, we propose an on-the-fly approach where the operational model is succes… ▽ More In this paper we investigate the applicability of standard model checking approaches to verifying properties in probabilistic programming. As the operational model for a standard probabilistic program is a potentially infinite parametric Markov decision process, no direct adaption of existing techniques is possible. Therefore, we propose an on-the-fly approach where the operational model is successively created and verified via a step-wise execution of the program. This approach enables to take key features of many probabilistic programs into account: nondeterminism and conditioning. We discuss the restrictions and demonstrate the scalability on several benchmarks. △ Less

Submitted 26 July, 2016; v1 submitted 14 May, 2016; originally announced May 2016.

arXiv:1603.02922 [pdf, ps, other]

Reasoning about Recursive Probabilistic Programs

Authors: Federico Olmedo, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja

Abstract: This paper presents a wp-style calculus for obtaining expectations on the outcomes of (mutually) recursive probabilistic programs. We provide several proof rules to derive one-- and two--sided bounds for such expectations, and show the soundness of our wp-calculus with respect to a probabilistic pushdown automaton semantics. We also give a wp-style calculus for obtaining bounds on the expected run… ▽ More This paper presents a wp-style calculus for obtaining expectations on the outcomes of (mutually) recursive probabilistic programs. We provide several proof rules to derive one-- and two--sided bounds for such expectations, and show the soundness of our wp-calculus with respect to a probabilistic pushdown automaton semantics. We also give a wp-style calculus for obtaining bounds on the expected runtime of recursive programs that can be used to determine the (possibly infinite) time until termination of such programs. △ Less

Submitted 9 March, 2016; originally announced March 2016.

arXiv:1601.01001 [pdf, ps, other]

Weakest Precondition Reasoning for Expected Run-Times of Probabilistic Programs

Authors: Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, Federico Olmedo

Abstract: This paper presents a wp-style calculus for obtaining bounds on the expected run-time of probabilistic programs. Its application includes determining the (possibly infinite) expected termination time of a probabilistic program and proving positive almost-sure termination - does a program terminate with probability one in finite expected time? We provide several proof rules for bounding the run-tim… ▽ More This paper presents a wp-style calculus for obtaining bounds on the expected run-time of probabilistic programs. Its application includes determining the (possibly infinite) expected termination time of a probabilistic program and proving positive almost-sure termination - does a program terminate with probability one in finite expected time? We provide several proof rules for bounding the run-time of loops, and prove the soundness of the approach with respect to a simple operational model. We show that our approach is a conservative extension of Nielson's approach for reasoning about the run-time of deterministic programs. We analyze the expected run-time of some example programs including a one-dimensional random walk and the coupon collector problem. △ Less

Submitted 28 November, 2017; v1 submitted 5 January, 2016; originally announced January 2016.

arXiv:1506.01930 [pdf, ps, other]

On the Hardness of Almost-Sure Termination

Authors: Benjamin Lucien Kaminski, Joost-Pieter Katoen

Abstract: This paper considers the computational hardness of computing expected outcomes and deciding (universal) (positive) almost-sure termination of probabilistic programs. It is shown that computing lower and upper bounds of expected outcomes is $Σ_1^0$- and $Σ_2^0$-complete, respectively. Deciding (universal) almost-sure termination as well as deciding whether the expected outcome of a program equals a… ▽ More This paper considers the computational hardness of computing expected outcomes and deciding (universal) (positive) almost-sure termination of probabilistic programs. It is shown that computing lower and upper bounds of expected outcomes is $Σ_1^0$- and $Σ_2^0$-complete, respectively. Deciding (universal) almost-sure termination as well as deciding whether the expected outcome of a program equals a given rational value is shown to be $Π^0_2$-complete. Finally, it is shown that deciding (universal) positive almost-sure termination is $Σ_2^0$-complete ($Π_3^0$-complete). △ Less

Submitted 5 June, 2015; originally announced June 2015.

Comments: MFCS 2015. arXiv admin note: text overlap with arXiv:1410.7225

ACM Class: F.1.2; F.1.3; F.4.1

arXiv:1504.00198 [pdf, ps, other]

Conditioning in Probabilistic Programming

Authors: Friedrich Gretz, Nils Jansen, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Annabelle McIver, Federico Olmedo

Abstract: We investigate the semantic intricacies of conditioning, a main feature in probabilistic programming. We provide a weakest (liberal) pre-condition (w(l)p) semantics for the elementary probabilistic programming language pGCL extended with conditioning. We prove that quantitative weakest (liberal) pre-conditions coincide with conditional (liberal) expected rewards in Markov chains and show that sema… ▽ More We investigate the semantic intricacies of conditioning, a main feature in probabilistic programming. We provide a weakest (liberal) pre-condition (w(l)p) semantics for the elementary probabilistic programming language pGCL extended with conditioning. We prove that quantitative weakest (liberal) pre-conditions coincide with conditional (liberal) expected rewards in Markov chains and show that semantically conditioning is a truly conservative extension. We present two program transformations which entirely eliminate conditioning from any program and prove their correctness using the w(l)p-semantics. Finally, we show how the w(l)p-semantics can be used to determine conditional probabilities in a parametric anonymity protocol and show that an inductive w(l)p-semantics for conditioning in non-deterministic probabilistic programs cannot exist. △ Less

Submitted 1 April, 2015; originally announced April 2015.

arXiv:1410.7225 [pdf, ps, other]

Analyzing Expected Outcomes and Almost-Sure Termination of Probabilistic Programs is Hard

Authors: Benjamin Lucien Kaminski, Joost-Pieter Katoen

Abstract: This paper considers the computational hardness of computing expected outcomes and deciding almost-sure termination of probabilistic programs. We show that deciding almost-sure termination and deciding whether the expected outcome of a program equals a given rational value is $Π^0_2$-complete. Computing lower and upper bounds on the expected outcome is shown to be recursively enumerable and… ▽ More This paper considers the computational hardness of computing expected outcomes and deciding almost-sure termination of probabilistic programs. We show that deciding almost-sure termination and deciding whether the expected outcome of a program equals a given rational value is $Π^0_2$-complete. Computing lower and upper bounds on the expected outcome is shown to be recursively enumerable and $Σ^0_2$-complete, respectively. △ Less

Submitted 27 October, 2014; originally announced October 2014.

MSC Class: 68Q87 ACM Class: F.1.2; F.1.3; F.4.1

Showing 1–45 of 45 results for author: Kamiński, B