Search | arXiv e-print repository

A Nominal Approach to Probabilistic Separation Logic

Authors: John M. Li, Jon Aytac, Philip Johnson-Freyd, Amal Ahmed, Steven Holtzen

Abstract: Currently, there is a gap between the tools used by probability theorists and those used in formal reasoning about probabilistic programs. On the one hand, a probability theorist decomposes probabilistic state along the simple and natural product of probability spaces. On the other hand, recently developed probabilistic separation logics decompose state via relatively unfamiliar measure-theoretic… ▽ More Currently, there is a gap between the tools used by probability theorists and those used in formal reasoning about probabilistic programs. On the one hand, a probability theorist decomposes probabilistic state along the simple and natural product of probability spaces. On the other hand, recently developed probabilistic separation logics decompose state via relatively unfamiliar measure-theoretic constructions for computing unions of sigma-algebras and probability measures. We bridge the gap between these two perspectives by showing that these two methods of decomposition are equivalent up to a suitable equivalence of categories. Our main result is a probabilistic analog of the classic equivalence between the category of nominal sets and the Schanuel topos. Through this equivalence, we validate design decisions in prior work on probabilistic separation logic and create new connections to nominal-set-like models of probability. △ Less

Submitted 28 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

arXiv:2402.16982 [pdf, other]

Synthesizing Tight Privacy and Accuracy Bounds via Weighted Model Counting

Authors: Lisa Oakley, Steven Holtzen, Alina Oprea

Abstract: Programmatically generating tight differential privacy (DP) bounds is a hard problem. Two core challenges are (1) finding expressive, compact, and efficient encodings of the distributions of DP algorithms, and (2) state space explosion stemming from the multiple quantifiers and relational properties of the DP definition. We address the first challenge by develo** a method for tight privacy and… ▽ More Programmatically generating tight differential privacy (DP) bounds is a hard problem. Two core challenges are (1) finding expressive, compact, and efficient encodings of the distributions of DP algorithms, and (2) state space explosion stemming from the multiple quantifiers and relational properties of the DP definition. We address the first challenge by develo** a method for tight privacy and accuracy bound synthesis using weighted model counting on binary decision diagrams, a state of the art technique from the artificial intelligence and automated reasoning communities for exactly computing probability distributions. We address the second challenge by develo** a framework for leveraging inherent symmetries in DP algorithms. Our solution benefits from ongoing research in probabilistic programming languages, allowing us to succinctly and expressively represent different DP algorithms with approachable language syntax that can be used by non-experts. We provide a detailed case study of our solution on the binary randomized response algorithm. We also evaluate an implementation of our solution using the Dice probabilistic programming language for the randomized response and truncated geometric above threshold algorithms. We compare to prior work on exact DP verification using Markov chain probabilistic model checking. Very few existing works consider mechanized analysis of accuracy guarantees for DP algorithms. We additionally provide a detailed analysis using our technique for finding tight accuracy bounds for DP algorithms. △ Less

Submitted 29 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2312.05706 [pdf, other]

doi 10.1145/3656412

Bit Blasting Probabilistic Programs

Authors: Poorva Garg, Steven Holtzen, Guy Van den Broeck, Todd Millstein

Abstract: Probabilistic programming languages (PPLs) are expressive means for creating and reasoning about probabilistic models. Unfortunately hybrid probabilistic programs, involving both continuous and discrete structures, are not well supported by today's PPLs. In this paper we develop a new approximate inference algorithm for hybrid probabilistic programs that first discretizes the continuous distributi… ▽ More Probabilistic programming languages (PPLs) are expressive means for creating and reasoning about probabilistic models. Unfortunately hybrid probabilistic programs, involving both continuous and discrete structures, are not well supported by today's PPLs. In this paper we develop a new approximate inference algorithm for hybrid probabilistic programs that first discretizes the continuous distributions and then performs discrete inference on the resulting program. The key novelty is a form of discretization that we call bit blasting, which uses a binary representation of numbers such that a domain of $2^b$ discretized points can be succinctly represented as a discrete probabilistic program over poly($b$) Boolean random variables. Surprisingly, we prove that many common continuous distributions can be bit blasted in a manner that incurs no loss of accuracy over an explicit discretization and supports efficient probabilistic inference. We have built a probabilistic programming system for hybrid programs called HyBit, which employs bit blasting followed by discrete probabilistic inference. We empirically demonstrate the benefits of our approach over existing sampling-based and symbolic inference approaches. △ Less

Submitted 29 April, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

ACM Class: G.3

arXiv:2307.13837 [pdf, other]

Scaling Integer Arithmetic in Probabilistic Programs

Authors: William X. Cao, Poorva Garg, Ryan Tjoa, Steven Holtzen, Todd Millstein, Guy Van den Broeck

Abstract: Distributions on integers are ubiquitous in probabilistic modeling but remain challenging for many of today's probabilistic programming languages (PPLs). The core challenge comes from discrete structure: many of today's PPL inference strategies rely on enumeration, sampling, or differentiation in order to scale, which fail for high-dimensional complex discrete distributions involving integers. Our… ▽ More Distributions on integers are ubiquitous in probabilistic modeling but remain challenging for many of today's probabilistic programming languages (PPLs). The core challenge comes from discrete structure: many of today's PPL inference strategies rely on enumeration, sampling, or differentiation in order to scale, which fail for high-dimensional complex discrete distributions involving integers. Our insight is that there is structure in arithmetic that these approaches are not using. We present a binary encoding strategy for discrete distributions that exploits the rich logical structure of integer operations like summation and comparison. We leverage this structured encoding with knowledge compilation to perform exact probabilistic inference, and show that this approach scales to much larger integer distributions with arithmetic. △ Less

Submitted 25 July, 2023; originally announced July 2023.

Comments: Accepted to UAI 2023

arXiv:2305.17145 [pdf, other]

Type Prediction With Program Decomposition and Fill-in-the-Type Training

Authors: Federico Cassano, Ming-Ho Yee, Noah Shinn, Arjun Guha, Steven Holtzen

Abstract: TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain. This has motivated automated type prediction: given an untyped program, produce a well-typed output program. Large language models (LLMs) are promising for type prediction, but there are challenges: fill-in-the-middle performs poorly, programs may not… ▽ More TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain. This has motivated automated type prediction: given an untyped program, produce a well-typed output program. Large language models (LLMs) are promising for type prediction, but there are challenges: fill-in-the-middle performs poorly, programs may not fit into the context window, generated types may not type check, and it is difficult to measure how well-typed the output program is. We address these challenges by building OpenTau, a search-based approach for type prediction that leverages large language models. We propose a new metric for type prediction quality, give a tree-based program decomposition that searches a space of generated types, and present fill-in-the-type fine-tuning for LLMs. We evaluate our work with a new dataset for TypeScript type prediction, and show that 47.4% of files type check (14.5% absolute improvement) with an overall rate of 3.3 type errors per file. All code, data, and models are available at: https://github.com/GammaTauAI/opentau. △ Less

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2304.01339 [pdf, other]

Lilac: A Modal Separation Logic for Conditional Probability

Authors: John M. Li, Amal Ahmed, Steven Holtzen

Abstract: We present Lilac, a separation logic for reasoning about probabilistic programs where separating conjunction captures probabilistic independence. Inspired by an analogy with mutable state where sampling corresponds to dynamic allocation, we show how probability spaces over a fixed, ambient sample space appear to be the natural analogue of heap fragments, and present a new combining operation on th… ▽ More We present Lilac, a separation logic for reasoning about probabilistic programs where separating conjunction captures probabilistic independence. Inspired by an analogy with mutable state where sampling corresponds to dynamic allocation, we show how probability spaces over a fixed, ambient sample space appear to be the natural analogue of heap fragments, and present a new combining operation on them such that probability spaces behave like heaps and measurability of random variables behaves like ownership. This combining operation forms the basis for our model of separation, and produces a logic with many pleasant properties. In particular, Lilac has a frame rule identical to the ordinary one, and naturally accommodates advanced features like continuous random variables and reasoning about quantitative properties of programs. Then we propose a new modality based on disintegration theory for reasoning about conditional probability. We show how the resulting modal logic validates examples from prior work, and give a formal verification of an intricate weighted sampling algorithm whose correctness depends crucially on conditional independence structure. △ Less

Submitted 25 May, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: Accepted to PLDI 2023

ACM Class: F.3.1; F.3.2

arXiv:2110.10284 [pdf, other]

flip-hoisting: Exploiting Repeated Parameters in Discrete Probabilistic Programs

Authors: Ellie Y. Cheng, Todd Millstein, Guy Van den Broeck, Steven Holtzen

Abstract: Many of today's probabilistic programming languages (PPLs) have brittle inference performance: the performance of the underlying inference algorithm is very sensitive to the precise way in which the probabilistic program is written. A standard way of addressing this challenge in traditional programming languages is via program optimizations, which seek to unburden the programmer from writing low-l… ▽ More Many of today's probabilistic programming languages (PPLs) have brittle inference performance: the performance of the underlying inference algorithm is very sensitive to the precise way in which the probabilistic program is written. A standard way of addressing this challenge in traditional programming languages is via program optimizations, which seek to unburden the programmer from writing low-level performant code, freeing them to work at a higher-level of abstraction. The arsenal of applicable program optimizations for PPLs to choose from is scarce in comparison to traditional programs; few of today's PPLs offer significant forms of automated program optimization. In this work we develop a new family of program optimizations specific to discrete-valued knowledge compilation based PPLs. We identify a particular form of program structure unique to these PPLs that tangibly affects exact inference performance in these programs: redundant random variables -- variables with repeated parameters and inconsistent path conditions. We develop a new program analysis and associated optimization called flip-hoisting that identifies these redundancies and optimizes them into a single random variable. We show that flip-hoisting yields inference speedups of up to 60% on applications of probabilistic programs such as Bayesian networks and probabilistic verification. △ Less

Submitted 20 February, 2023; v1 submitted 19 October, 2021; originally announced October 2021.

arXiv:2105.12326 [pdf, other]

Model Checking Finite-Horizon Markov Chains with Probabilistic Inference

Authors: Steven Holtzen, Sebastian Junges, Marcell Vazquez-Chanlatte, Todd Millstein, Sanjit A. Seshia, Guy Van Den Broeck

Abstract: We revisit the symbolic verification of Markov chains with respect to finite horizon reachability properties. The prevalent approach iteratively computes step-bounded state reachability probabilities. By contrast, recent advances in probabilistic inference suggest symbolically representing all horizon-length paths through the Markov chain. We ask whether this perspective advances the state-of-the-… ▽ More We revisit the symbolic verification of Markov chains with respect to finite horizon reachability properties. The prevalent approach iteratively computes step-bounded state reachability probabilities. By contrast, recent advances in probabilistic inference suggest symbolically representing all horizon-length paths through the Markov chain. We ask whether this perspective advances the state-of-the-art in probabilistic model checking. First, we formally describe both approaches in order to highlight their key differences. Then, using these insights we develop Rubicon, a tool that transpiles Prism models to the probabilistic inference tool Dice. Finally, we demonstrate better scalability compared to probabilistic model checkers on selected benchmarks. All together, our results suggest that probabilistic inference is a valuable addition to the probabilistic model checking portfolio -- with Rubicon as a first step towards integrating both perspectives. △ Less

Submitted 30 June, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

Comments: Technical Report. Accepted at CAV 2021

arXiv:2103.17226 [pdf, other]

Logical Abstractions for Noisy Variational Quantum Algorithm Simulation

Authors: Yipeng Huang, Steven Holtzen, Todd Millstein, Guy Van den Broeck, Margaret Martonosi

Abstract: Due to the unreliability and limited capacity of existing quantum computer prototypes, quantum circuit simulation continues to be a vital tool for validating next generation quantum computers and for studying variational quantum algorithms, which are among the leading candidates for useful quantum computation. Existing quantum circuit simulators do not address the common traits of variational algo… ▽ More Due to the unreliability and limited capacity of existing quantum computer prototypes, quantum circuit simulation continues to be a vital tool for validating next generation quantum computers and for studying variational quantum algorithms, which are among the leading candidates for useful quantum computation. Existing quantum circuit simulators do not address the common traits of variational algorithms, namely: 1) their ability to work with noisy qubits and operations, 2) their repeated execution of the same circuits but with different parameters, and 3) the fact that they sample from circuit final wavefunctions to drive a classical optimization routine. We present a quantum circuit simulation toolchain based on logical abstractions targeted for simulating variational algorithms. Our proposed toolchain encodes quantum amplitudes and noise probabilities in a probabilistic graphical model, and it compiles the circuits to logical formulas that support efficient repeated simulation of and sampling from quantum circuits for different parameters. Compared to state-of-the-art state vector and density matrix quantum circuit simulators, our simulation approach offers greater performance when sampling from noisy circuits with at least eight to 20 qubits and with around 12 operations on each qubit, making the approach ideal for simulating near-term variational quantum algorithms. And for simulating noise-free shallow quantum circuits with 32 qubits, our simulation approach offers a $66\times$ reduction in sampling cost versus quantum circuit simulation techniques based on tensor network contraction. △ Less

Submitted 31 March, 2021; originally announced March 2021.

Comments: ASPLOS '21, April 19-23, 2021, Virtual, USA

arXiv:2006.15233 [pdf, other]

On the Relationship Between Probabilistic Circuits and Determinantal Point Processes

Authors: Honghua Zhang, Steven Holtzen, Guy Van den Broeck

Abstract: Scaling probabilistic models to large realistic problems and datasets is a key challenge in machine learning. Central to this effort is the development of tractable probabilistic models (TPMs): models whose structure guarantees efficient probabilistic inference algorithms. The current landscape of TPMs is fragmented: there exist various kinds of TPMs with different strengths and weaknesses. Two of… ▽ More Scaling probabilistic models to large realistic problems and datasets is a key challenge in machine learning. Central to this effort is the development of tractable probabilistic models (TPMs): models whose structure guarantees efficient probabilistic inference algorithms. The current landscape of TPMs is fragmented: there exist various kinds of TPMs with different strengths and weaknesses. Two of the most prominent classes of TPMs are determinantal point processes (DPPs) and probabilistic circuits (PCs). This paper provides the first systematic study of their relationship. We propose a unified analysis and shared language for discussing DPPs and PCs. Then we establish theoretical barriers for the unification of these two families, and prove that there are cases where DPPs have no compact representation as a class of PCs. We close with a perspective on the central problem of unifying these tractable models. △ Less

Submitted 26 June, 2020; originally announced June 2020.

arXiv:2005.09089 [pdf, other]

doi 10.1145/3428208

Scaling Exact Inference for Discrete Probabilistic Programs

Authors: Steven Holtzen, Guy Van den Broeck, Todd Millstein

Abstract: Probabilistic programming languages (PPLs) are an expressive means of representing and reasoning about probabilistic models. The computational challenge of probabilistic inference remains the primary roadblock for applying PPLs in practice. Inference is fundamentally hard, so there is no one-size-fits all solution. In this work, we target scalable inference for an important class of probabilistic… ▽ More Probabilistic programming languages (PPLs) are an expressive means of representing and reasoning about probabilistic models. The computational challenge of probabilistic inference remains the primary roadblock for applying PPLs in practice. Inference is fundamentally hard, so there is no one-size-fits all solution. In this work, we target scalable inference for an important class of probabilistic programs: those whose probability distributions are discrete. Discrete distributions are common in many fields, including text analysis, network verification, artificial intelligence, and graph analysis, but they prove to be challenging for existing PPLs. We develop a domain-specific probabilistic programming language called Dice that features a new approach to exact discrete probabilistic program inference. Dice exploits program structure in order to factorize inference, enabling us to perform exact inference on probabilistic programs with hundreds of thousands of random variables. Our key technical contribution is a new reduction from discrete probabilistic programs to weighted model counting (WMC). This reduction separates the structure of the distribution from its parameters, enabling logical reasoning tools to exploit that structure for probabilistic inference. We (1) show how to compositionally reduce Dice inference to WMC, (2) prove this compilation correct with respect to a denotational semantics, (3) empirically demonstrate the performance benefits over prior approaches, and (4) analyze the types of structure that allow Dice to scale to large probabilistic programs. △ Less

Submitted 16 October, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

Journal ref: Proc. ACM Program. Lang. 4, OOPSLA (2020)

arXiv:1904.02079 [pdf, ps, other]

Symbolic Exact Inference for Discrete Probabilistic Programs

Authors: Steven Holtzen, Todd Millstein, Guy Van den Broeck

Abstract: The computational burden of probabilistic inference remains a hurdle for applying probabilistic programming languages to practical problems of interest. In this work, we provide a semantic and algorithmic foundation for efficient exact inference on discrete-valued finite-domain imperative probabilistic programs. We leverage and generalize efficient inference procedures for Bayesian networks, which… ▽ More The computational burden of probabilistic inference remains a hurdle for applying probabilistic programming languages to practical problems of interest. In this work, we provide a semantic and algorithmic foundation for efficient exact inference on discrete-valued finite-domain imperative probabilistic programs. We leverage and generalize efficient inference procedures for Bayesian networks, which exploit the structure of the network to decompose the inference task, thereby avoiding full path enumeration. To do this, we first compile probabilistic programs to a symbolic representation. Then we adapt techniques from the probabilistic logic programming and artificial intelligence communities in order to perform inference on the symbolic representation. We formalize our approach, prove it sound, and experimentally validate it against existing exact and approximate inference techniques. We show that our inference approach is competitive with inference procedures specialized for Bayesian networks, thereby expanding the class of probabilistic programs that can be practically analyzed. △ Less

Submitted 30 June, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

arXiv:1903.04672 [pdf, ps, other]

Generating and Sampling Orbits for Lifted Probabilistic Inference

Authors: Steven Holtzen, Todd Millstein, Guy Van den Broeck

Abstract: A key goal in the design of probabilistic inference algorithms is identifying and exploiting properties of the distribution that make inference tractable. Lifted inference algorithms identify symmetry as a property that enables efficient inference and seek to scale with the degree of symmetry of a probability model. A limitation of existing exact lifted inference techniques is that they do not app… ▽ More A key goal in the design of probabilistic inference algorithms is identifying and exploiting properties of the distribution that make inference tractable. Lifted inference algorithms identify symmetry as a property that enables efficient inference and seek to scale with the degree of symmetry of a probability model. A limitation of existing exact lifted inference techniques is that they do not apply to non-relational representations like factor graphs. In this work we provide the first example of an exact lifted inference algorithm for arbitrary discrete factor graphs. In addition we describe a lifted Markov-Chain Monte-Carlo algorithm that provably mixes rapidly in the degree of symmetry of the distribution. △ Less

Submitted 30 June, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

arXiv:1705.09970 [pdf, ps, other]

Probabilistic Program Abstractions

Authors: Steven Holtzen, Todd Millstein, Guy Van den Broeck

Abstract: Abstraction is a fundamental tool for reasoning about complex systems. Program abstraction has been utilized to great effect for analyzing deterministic programs. At the heart of program abstraction is the relationship between a concrete program, which is difficult to analyze, and an abstract program, which is more tractable. Program abstractions, however, are typically not probabilistic. We gener… ▽ More Abstraction is a fundamental tool for reasoning about complex systems. Program abstraction has been utilized to great effect for analyzing deterministic programs. At the heart of program abstraction is the relationship between a concrete program, which is difficult to analyze, and an abstract program, which is more tractable. Program abstractions, however, are typically not probabilistic. We generalize non-deterministic program abstractions to probabilistic program abstractions by explicitly quantifying the non-deterministic choices. Our framework upgrades key definitions and properties of abstractions to the probabilistic context. We also discuss preliminary ideas for performing inference on probabilistic abstractions and general probabilistic programs. △ Less

Submitted 14 July, 2017; v1 submitted 28 May, 2017; originally announced May 2017.

Showing 1–14 of 14 results for author: Holtzen, S