Compositional imprecise probability
Abstract
Imprecise probability is concerned with uncertainty about which probability distributions to use. It has applications in robust statistics and elsewhere. Imprecise probability can be modelled in various ways, including by convex sets of probability distributions.
We look at programming language models for imprecise probability. Our desiderata are that we would like our model to support all kinds of composition, categorical and monoidal, in other words, guided by dataflow diagrams. Another equivalent perspective is that we would like a model of synthetic probability in the sense of Markov categories.
There is already a fairly popular monad-based approach to imprecise probability, but it is not fully compositional because the monad involved is not commutative, which means that we do not have a proper monoidal structure. In this work, we provide a new fully compositional account. The key idea is to name the non-deterministic choices. To manage the renamings and disjointness of names, we use graded monads. We show that the resulting compositional model is maximal. We relate with the earlier monad approach, showing that we obtain tighter bounds on the uncertainty.
1 Overview
This paper is about using programming language notations to give compositional descriptions of imprecise probability. For illustration, consider a situation with three outcomes: red (), green () and blue (). A precise probability distribution can be understood as a point in the triangle: the corner () represents 100% certainty of red; the points on the edge between and represent the probability distributions where is impossible (Figure 1a).
An imprecise probability on three outcomes is a convex region of the triangle (Figure 1b–1d). One interpretation is that if a probability distribution describes a bet, as in the foundations of Bayesianism, then a convex region is a collection of bets that would be reasonable given the current imprecise knowledge. Imprecise probability has a long history in terms of statistical robustness (e.g. [86, 33]), recently considered as part of infrabayesianism (e.g. [49, 3, 4]) and the foundations of safe AI [20].
There is already a body of work on semantics models of programming languages with imprecise probability [64, 65, 63, 70, 30, 34, 2, 28, 83, 29, 44, 43, 66, 84, 62]. Our contribution is to investigate new models that support our compositional desiderata (§1.1) by naming the non-deterministic choices (§1.2). We show that this compares favourably with earlier work (Thm. 1, §1.3) and that it is a maximal approach (Thm. 2).
1.1 Desiderata: a language for imprecise probability with compositional reasoning
A first language for describing imprecise probability is a first-order functional language without recursion. Rather, we have if/then/else statements, sequencing with immutable variable assignment (like [67, 56]), and the following two commands, which both return a boolean value:
-
•
bernoulli: a fair Bernoulli choice [6] which draws a ball from some urn containing two balls labelled ‘true’ and ‘false’, and replaces it;
-
•
knight: a Knightian choice [46] which draws a ball from a fresh urn containing balls labelled ‘true’ and ‘false’, where the number and ratio of balls are unknown and we have no priors on their distribution, except to know that the urn is not empty. (These ‘Knightian urns’ are fresh each time, i.e. they can each be used only once. For example, we are not interested in here in using multiple draws and frequencies to predict their contents.)
For example, consider the following two programs.
Example 1.1.
Consider the following program, which we argue describes the convex region in Figure 1b:
We draw two boolean values, x and z, respectively with Knightian uncertainty and from a fair Bernoulli trial. We then combine these two boolean values using the logic on the second and third lines of the program.
Example 1.2.
The following program describes the convex region in Figure 1c:
This time, we draw three boolean values, x, y and z, where y is with Knightian uncertainty too. We then combine these three boolean values using the logic on the second and third lines of the program, which is almost the same except for the use of y when z is false. Decoupling the Knightian uncertainties increases the region of imprecise probability because it allows new outcomes (such as an equal chance between and when x is true and y is false) that were impossible in Example 1.1.
Our desiderata for a compositional account of a first-order language are the following. We are inspired by recent compositional accounts of probability theory (e.g. [48, 23, 35]), statistics (e.g. [53, 38, 10]), and probabilistic programming (e.g. [17, 39, 76]), and the connections between them (e.g. [79]). These desiderata are formalized in Section 2.
Desideratum 1.
The language should be commutative:
x t ; y u ; v | (if is not free in and not free in ) | ||||
and affine: | |||||
(x t ; u) | (if is not free in ). |
This means that we can regard composition graphically, as a data flow graph. For instance, the notation
is not ambiguous.
Although this requirement does not hold generally in the presence of memory side effects and mutable variables, we do not have mutable variables here, and it is desirable in a declarative language. For example, we would like to notate the program from Example 1.1 as
Desideratum 2.
The standard equational reasoning about if/then/else should apply, and in particular the following hoisting equation should be allowed:
where is not free in .
One earlier approach to a semantic study of a language like this is provided by a convex powerset of distributions monad (e.g. [44, 64, 7, 8, 65, 30, 34]). This does not satisfy the desiderata for compositional reasoning. In fact, no semantic model satisfying the desiderata can allow Examples 1.1 and 1.2 to be distinguished, as we show in Figure 2. The key issue is with the third program in Figure 2:
This program draws a boolean value with Knightian uncertainty on each of the branches of the if statement. The paradox arises in whether each choice comes from different urns or the same urn. Perhaps there is one Knightian urn that is used in both branches. Or perhaps we draw a boolean value from a new Knightian urn on the second branch. Our proposed solution is to make this distinction explicit.
1.2 Resolution: named Knightian choices
To satisfy both desiderata, our proposal is to name each Knightian choice (Section 3). To do this, we rewrite Example 1.1 by annotating the only Knightian choice with the name :
We think of this program as giving rise to the convex set in Figure 1(b). This is then the same as the program where y describes the outcome of the same Knightian choice, i.e. one with the same name:
but it is different from the program where y describes a different Knightian choice, i.e. one with a different name:
which is intuitively what Example 1.2 describes, and gives rise to the convex set in Figure 1(c). Now when we try to follow the same equational derivation as in Figure 2, the third program becomes:
which conveys the same Knightian value is used on each of the branches of the if statement. This can no longer derive the program:
which explicitly uses a different Knightian choice on the else branch.
The idea of naming non-deterministic choices appears in work outside probability (e.g. proved transitions, [9]) and probabilistic choices are often named in practical probabilistic programming [82, §6.2] which has already been explored using graded monads [57]. More generally, intensionality in non-determinism is known to be a profitable perspective (e.g. [52, 12]).
1.2.1 Named Knightian choices via a reader monad
The set-up with named Knightian choices is consistent with Desiderata 1 and 2, which we can show by building a monad (e.g. [67]), namely the reader transformer (e.g. [58]) of the finite distributions monad (e.g. [37, Ch. 2]):
(1) |
where is the set of outcomes, is the set of names required, and is the finite distributions monad. Then the Knightian choices are interpreted by reading, and the Bernoulli choices use the distributions monad. This combined monad is well known to be commutative and affine. Thus both desiderata are satisfied.
We can recover a convex set of probability distributions from any by pushing forward all the possible probability distributions on . Formally, we can express this using the monadic bind (, Kleisli composition) of :
1.2.2 Grading to account for renamings
A remaining concern with named Knightian choices is that we ought to take seriously name-space issues in composition. When composing programs with named Knightian choices, we may wish to avoid name clashes. This is dependent on how we interpret the set in (1).
We resolve this issue by regarding the monad (1) as a graded monad [42, 68, 41]. This is closely related to the ‘para’ construction (e.g. [22, 31]). Some of the crucial steps are as follows:
-
•
Any injection induces a renaming of programs using names to programs using names , and indeed a natural map ;
-
•
We can regard monadic bind (Kleisli composition) in as operating on disjoint sets of names:
Thus a computation using names is sequenced with a computation using names to build a computation that involves names .
-
•
This monad is graded-monoidal too, via a map
which juxtaposes computations using names and to give a computation using .
-
•
The induced convex set of distributions is invariant under renaming: .
The crucial element is that the injective renaming induces a surjection between the spaces of Knightian choices. We abstract and generalize by allowing arbitrary surjections , further by allowing sets other than , and further still by allowing surjective stochastic maps rather than surjections.
As an aside, we note that probability monads too can often be regarded as sort-of reader monads (e.g. [80, 17, 81, 5, 74, 75]), since probability distributions can be described by random variables , for some base probability sample space . Thus we could regard our monad as a quotient of
where is a sample space for Bernoulli probability and is a sample space for Knightian uncertainty. In this work, we will quotient by the ‘law’ of random variables in , so that the usual equational reasoning about Bernoulli probability is valid.
1.3 Results about quotienting our theory
The names for the Knightian choices in our language appear to be additional intensional information, and the reader monad does not quotient this away. For this reason we show two results about the equational theory. First, we connect our approach to the convex powerset of distributions monad, showing that our bounds are tighter. Second, we show it is maximal — no further quotient is possible.
Theorem 1 (§4): Improved bounds on uncertainty
In our resulting language, every closed term describes a convex set of distributions. We thus establish a connection to the non-compositional approach that uses the Kleisli category of the convex powerset of distributions monad (e.g. [44, 64, 7, 8, 65, 30, 34]). We have an ‘op-lax’ functor
from our locally graded category (§3.2) to the Kleisli category of the convex powerset of distributions monad . Being an op-lax functor means that
i.e. composition in our category gives a tighter bound on the Knightian uncertainty than the composition using the Kleisli category of the convex powerset of distributions monad (see also the Example in §4.4).
Note that this could not be a proper functor because we would then have a quotient theory in violation of the maximality theorem (Theorem 2). But an op-lax functor is beneficial as an interpretation of giving a tighter bound.
Theorem 2 (§5): Maximality.
Our language also gives rise to a compositional theory of equality. We prove our equational theory is maximal in that we can add no further equations on open terms without equating different convex sets of distributions or compromising the compositional structure. (See Theorem 2 for a precise statement.)
Further detail: two quotients
In slightly more detail, we consider two candidates from the literature for quotients of a graded monad. They are general methods, but appear in our language as follows. Notice that an open term in our language contains both names and variables: names for Knightian choices, and free variables standing for ordinary values that might be substituted later. There are two ways to quotient the names away:
- Quotient A
- Quotient B
-
Following [31], we could equate two open terms if there is a renaming such that for every valuation of the free variables they are made equal.
For closed terms with no free variables, the two approaches are the same and give rise to a convex set of distributions (Prop. 4.3).
Quotient A does not directly give a compositional theory in our setting: the criteria of [24] are not satisfied. Nonetheless, the construction of [24] can be adjusted, giving the op-lax functor of Theorem 1 rather than a monad morphism.
Quotient B does not satisfy Desideratum 2 (commuting if-then-else). Informally, it would allow us to rename on the ‘then’ branch but not the ‘else’ branch, which is inconsistent with Desideratum 2. Nonetheless, it could be a useful approach in a metalanguage for combining models that do not need a general if-then-else construction. For this reason, Quotient B is not a counterexample to Theorem 2.
1.4 Notes about context
Other situations combining probability and non-determinism.
Our focus here is on imprecise probability. This is one form of non-determinism, but there is a broader body of general interest in non-determinism and its combination with probability. Of course, non-determinism can arise in many different semantic situations, beyond the motivation from imprecise probability. One motivation is in program abstraction and refinement: there, one describes a problem by writing a non-deterministic program that solves it; one then solves the problem by refining that non-deterministic program. The mathematical analysis is similar, and for instance illustrations essentially the same as Figure 1 appear in work on refinement of probabilistic programs [61, Fig. 6.4.2, Fig. 6.5.1]. But the motivation is different, and the desiderata (§1.1) may be less relevant in program refinement.
An arguably different kind of non-determinism appears where there are many appropriate results that we want to collect. In this sense, for instance, database queries are non-deterministic if they return multiple results, and Prolog is non-deterministic. When combined with probability, this leads more naturally to random sets or random bags, which are in contrast to the sets of distributions shown in Figure 1. Random bags do arise in probabilistic databases and in point process theory. We looked at these applications from a monad perspective here [19, 18], and the monads have long been discussed (e.g. [15, 44, 16, 50, 84, 36]).
Work on distributive laws.
There is a large literature on finding elegant explanations for combining existing monads for probability and non-determinism, exploring distributive laws (e.g. [70, 30, 13, 84, 8, 15, 50]). Although the reader monad transformer is a distributive law, our emphasis here is on the commutativity desiderata (§1.1) rather than distributivity issues. Even when there is a distributive law between commutative monads, the resulting composite monad need not be commutative. Indeed both the random bags monad (e.g. [16]) and the powerdomain of indexed valuations monad [84] arise from distributive laws between commutative monads, but neither composite monads are commutative.
Algebraic perspective.
Some of this work on distributive laws takes the perspective of algebraic theories. Our desiderata can be viewed from the point of view of algebraic theories, via algebraic effects (e.g. [72]), which we now briefly explore. We define two binary operations:
Regarding Desideratum 1, Commutativity means that each operation is a homomorphism for the other:
and Affinity says that and . Desideratum 2 is always assumed in algebraic effects. From these five axioms, the derivation of Figure 2 can be made algebraically:
If we regard as a Minkowski sum (see §4.1) then the left hand side appears to be Fig. 1(b) and the right hand side is Fig. 1(c). especially since the latter can be further rearranged just using the commutativity and affinity laws to
which enumerates the four extreme points of Fig. 1(c). From this algebraic perpective, our proposal is to label the Knightian branching, with a family of binary operators , , …, and then we still have that
yet it is consistent to assume
with different names on the right hand side, corresponding to the difference between Figures 1(b) and 1(c).
Our intuition is that branches left or right depending on the Knightian draw . Here is describing the unique draw from that urn. We note that a labelled binary choice already appears in the probabilistic setting in [77], where it has a different meaning: there, stands for an urn but not a specific draw, and denotes sampling from urn according to Polya’s scheme (replace with two copies of what was drawn).
We note that it is already known that Desiderata 1 is incompatible with symmetry laws ( and ), since we can use those to deduce
(2) |
In our graded semantics (), we do have exactly. We do not have , although there is a reindexing in the grading that corresponds to this symmetry. One view is that we have side-stepped this Eckmann-Hilton-like obstacle (2) by putting the symmetry into the grading structure.
2 Rudiments of graded Markov categories and graded monads
We now recall notions of Markov category (§2.1) and relative monads (§ 2.2), recasting them in the locally graded setting. We show how to pass between the concepts (Prop. 2.5, 2.6) and we relate them to notions from enriched category theory (§2.3). For brevity, in this section we focus on definitions and in the next section (§3) we focus on examples, rather than interleaving them.
Definition 2.1.
A monoidal category is a category equipped with a functor and an object together with associativity and unitor isomorphisms (e.g. ) that satisfy coherence conditions (e.g. [59]). It is strict if the isomorphisms are equalities. A symmetric monoidal category is moreover equipped with isomorphisms such that and satisfying coherence conditions.
A semicartesian category is a symmetric monoidal category in which the monoidal unit is a terminal object. That is, there is exactly one morphism for all .
A semicartesian category has projections , but it is weaker than a full categorical product because there need not be a natural diagonal .
2.1 Graded Markov categories
Definition 2.2.
[23] A Markov category is a semicartesian category such that every object is equipped with a commutative comonoid structure, that is, a map that is symmetric and associative and has the terminal map as a unit.
A morphism in a Markov category is deterministic if it commutes with the copy map ().
A distributive Markov category [1] is a Markov category that has coproducts such that the canonical maps are isomorphisms and the coproduct injections are deterministic.
A typical example of a distributive Markov category is the category of stochastic matrices (Def. 3.2).
An ordinary distributive category [14, 11] is a distributive Markov category in which every morphism is deterministic. A typical example is the category of finite sets.
Programming syntax.
We can use programming language syntax for composition in a distributive Markov category (see also e.g. [79]). The objects of the category are regarded as types, with bool regarded as the object . If then a morphism is regarded as a term . We notate
(t,u) | for | |||
x t ; u | for | |||
if t then u else v | for |
In this way, given interpretations of bernoulli and knight, we can interpret the programs from Examples 1.1 and 1.2.
Definition 2.3.
Let be a semicartesian category. A graded distributive Markov category is given by
-
•
a distributive Markov category , but moreover,
-
•
for each pair of objects and each grade a set of morphisms, agreeing with when ;
-
•
for each morphism , a function ;
-
•
a family of maps ;
-
•
a family
all such that composition is natural and associative up to the associativity of (see e.g. [87, §1.2], [55], [27, App. B]), monoidal product of morphisms is also natural and has associators and symmetric braidings up-to the structure of , and such that the induced function is a bijection (e.g. [87, p. 36]).
See Proposition 3.3 for our example of a graded Markov category. We note that since is semicartesian, there are canonical projections , and so we can regard any morphism at grade as a morphism at grade .
2.2 Monads and graded relative affine monads
It is well-established that notions of computation can be modelled by monads [67], including probability and non-determinism [40, 65, 64]. In this section, we introduce the flavours of monads relevant to this work and establish their correspondence to Markov categories. We present them in the Kleisli triple setting because this is more conducive to their use in programming languages.
Definition 2.4.
A strong monad over a cartesian closed category is for each an object and a morphism
and a family of morphisms
such that for generalised elements and of and , the following equations hold:
(3) | ||||
The left-strength is induced by the canonical action [47].
The monad is commutative if the following diagram commutes, where is the induced right strength from the symmetry of the cartesian product.
It is affine if the unique map is an isomorphism. A typical example is , and is the finite probability distribution monad [37].
Let be a category with finite products and consider a finite product preserving functor . A relative strong monad on is a functor , along with a -relative unit
natural in , and a family of -relative Kleisli extensions
natural in , and such that (3) holds for and generalised elements of and . A typical example is , and , with the evident embedding, and .
Let be a monoidal category. A graded strong monad is a functor , with unit
natural in , and a family of Kleisli extensions
natural in , and such that for and generalised elements of and , the following equations hold:
(4) | ||||
where , , and are the left unitor, right unitor, and associator of , respectively.
A graded relative strong monad on is a functor , along with a -relative unit
natural in , and a family of -relative Kleisli extensions
natural in , and such that (4) holds for and generalised elements of and . The graded left-strength is induced by the action . The monad is commutative if is symmetric monoidal and the following diagram commutes, where is the symmetric coherence isomorphism of .
It is affine if the unique map is an isomorphism.
Proposition 2.5.
Proposition 2.6.
Any graded Markov category induces a graded commutative affine relative monad, by
-
•
, the distributive category of -graded deterministic maps
-
•
The underlying category is , the finite product-preserving contravariant presheaves on ;
-
•
is the Yoneda embedding;
-
•
is given by
Proof.
2.3 Connection with enriched categories
To show that the concepts in this section are canonical, we connect with the theory of enriched categories. Let be a symmetric monoidal closed category with limits and colimits, that is moreover semicartesian. Recall (e.g. [45]) that a -enriched category is given by a collection of objects, and for each pair of objects of , a ‘hom-object’ in . Composition is a morphism in . We can also define -enriched monoidal categories, by requiring the functor to be a monoidal category. And -enriched coproducts require a natural isomorphism
between objects of . Any enriched category has an underlying ordinary category , which has the same objects but with a hom-set given by . This ordinary category inherits monoidal, limit and colimit structure from .
Definition 2.7 (e.g. [69]).
A -enriched Markov category is a -enriched symmetric monoidal category such that the monoidal unit is terminal: , and such that the underlying symmetric monoidal category is equipped with the structure of a Markov category (i.e. a comonoid structure in the underlying ordinary category).
A -enriched Markov category is moreover distributive if it has -coproducts that distribute over the monoidal structure, and such that the coproduct injections are deterministic, in the sense of the underlying ordinary category.
For any semicartesian category , recall the category of functors . This extends to a good ‘cosmos’ for enrichment since
-
•
embeds fully and faithfully (i.e. essentially as a full subcategory), via the Yoneda embedding .
-
•
has all limits and colimits, computed pointwise.
-
•
has a semicartesian structure such that the Yoneda embedding is a symmetric monoidal functor. This is given by Day convolution [21], and has the following universal property: for functors , to give a natural transformation is to give a natural family of functions .
-
•
is moreover monoidal closed.
Proposition 2.8.
To give an -enriched distributive Markov category is to give a -graded distributive Markov category.
Notes.
The correspondence between graded monads and enriched monads is also well understood (e.g. [60]).
2.4 Discussion
Recent work by Perrone [69] has considered enriched Markov categories to obtain an abstract view of the distance between probabilities, which allows for an abstract development of entropy. We note that the enriching category in [69] is indeed semicartesian. The full theory of enriched Markov categories perhaps deserves a more thorough analysis.
3 A graded Markov category for probability and non-determinism
We recall ordinary Markov categories for finite probability (§3.1). We then consider a generic construction for graded Markov categories, and instantiate it in our setting, obtaining the graded Markov category (for ‘Imprecise Probability’, §3.2). We conclude this section with a worked example (§3.3). In the subsequent sections (§4–5) we relate this graded Markov category with convex sets of distributions.
3.1 Ordinary Markov categories for probability
Definition 3.1.
A probability vector is a sequence of non-negative numbers that sum to . We write for the set of probability vectors of length .
The set is always a convex set: for any and , the convex combination is again a probability vector in . We write as shorthand for such a convex combination. Every probability vector in arises via convex combinations of the Dirac vectors , for , where , and so on.
A matrix of real numbers is called stochastic if each column is a probability vector. This is equivalent to requiring that as a linear map, it preserves the property of being a probability vector, i.e. if then . In fact, every function that preserves convex structure arises from a stochastic matrix in this way. We call such a function a convex map.
Definition 3.2 (e.g. [23], Ex. 2.5).
The category of finite stochastic matrices has as objects natural numbers, and morphisms stochastic matrices in . Composition is matrix multiplication, and the identity morphism is the unit diagonal matrix.
This can be made into a symmetric monoidal category, with monoidal structure on objects given by multiplication of numbers, and on morphisms by the Kronecker product of matrices. It is semicartesian where the terminal object is , and there is a unique stochastic matrix with one row. This is moreover a Markov category, with given by the three-dimensional diagonal (in ).
The Markov category moreover has a distributive structure. The coproduct of objects is given by addition, and the coproduct of morphisms by block matrices (concatenating the columns).
The monad view on is as follows. First, we consider the embedding . We then regard (Def. 3.1) as a -relative monad , which is affine and commutative. In fact there is an ordinary monad on , comprising finitely supported probability distributions (e.g. [37, Ch. 2]), and . The distributive Markov category can then be regarded as the Kleisli category for this relative monad.
3.2 The graded Markov category
We first introduce a general construction for graded Markov categories. This is a variation on the Para construction [22], also called monoidal indeterminates [31]. Via the connections between Markov categories and commutative affine relative monads (§2.2), it is equivalently a graded version of the reader monad transformer [58].
Proposition 3.3.
Let be a semicartesian subcategory of a distributive Markov category . There is a graded distributive Markov category with the same objects as and with the hom-sets
The reindexing is given by composition: if and
For the composition of and ,
The monoidal product of and is given by
Definition 3.4.
A stochastic matrix is surjective if for every there exists such that is the Dirac distribution at . In other words, the induced function is surjective. Let be the category of natural numbers and surjective stochastic matrices. This is a semicartesian monoidal subcategory of .
Definition 3.5.
The graded distributive Markov category is the -graded version of , according to Proposition 3.3.
This graded distributive Markov category supports both finite probability and finite non-determinism.
-
•
For binary probabilistic choice with bias , we consider the morphism in given by the column vector .
-
•
For a binary non-deterministic (i.e. Knightian) choice, we consider the morphism in given by the unit diagonal matrix.
We can extend the above notions of probabilistic and non-deterministic choice between elements of a finite set by considering probability vectors (in ) and unit diagonal matrices (in ) respectively.
Remark:
We could have considered a subcategory of as the grading. One example is finite sets and (deterministic) surjective functions. Another example is the subcategory where the objects are of the form and where we only consider the surjections induced by injections (connecting with nominal sets [71] and the notation for grading considered in Section 1.2; in this case the semicartesian monoidal structure amounts to disjoint union, .). We leave for future work the question of to what extent the following results depend on this particular choice of grading.
3.3 Example calculation with
Example 3.6.
Consider the scenarios from Examples 1.1 and 1.2 where we draw boolean values with Knightian uncertainty and from fair Bernoulli trials and combine them using different program logic. We denote outcomes as probability vectors of length three, representing the chance of , , and , respectively. Example 1.1 is the morphism
where denotes the conditional on the fair Bernoulli trial, and and are the conditionals on the Knightian choices in each branch, respectively.
On the other hand, Example 1.2 is the morphism
where , , and denote the same conditional statements, but now we lift the grading of and to via the projections to account for the decoupling of their Knightian uncertainties.
4 Convex sets of distributions and an op-lax functor
In this section we recall the properties of convex powersets of distributions (see also [64, 65, 44, 7, 34] and elsewhere). We then connect our category (§3) with convex powersets, via the Kan extension method of [24] (§4.2) and then show that this yields an op-lax functor (§4.3), which means that our category is more conservative with uncertainty bounds. We begin by recalling some basic properties of convex sets of distributions.
Definition 4.1.
A subset of is convex if it is closed under convex combinations: if then for any we have .
A convex subset of is finitely generated if there is a finite sequence such that every element of can be achieved by convex combinations of the ’s. In other words, , with the ’s regarded as column vectors and regarded as a row vector.
Lemma 4.2.
For any convex map between the sets of probability vectors, the image of is a convex subset of .
Moreover, such convex subsets of are finitely generated, and every finitely generated convex set arises in this way.
Proof.
Suppose , and let . So we must have such that and . Then
the last step because is a convex map, and so we see that .
The set is generated by for . Conversely if a set is generated by , regarded as column vectors, then the matrix determines a map such that . ∎
4.1 Convex powersets of distributions
We write for the set of convex subsets of , and for the finitely generated convex subsets of . Both support convex combinations: if and then
There is moreover an ordering given by subset, and the join is a convex closure of the union:
Proposition 4.3.
There is a family of functions , that takes to its image , and the family is natural in and .
Proof.
First, the fact that the image of is convex is Lemma 4.2. For naturality in , suppose . Then naturality in amounts to the fact that
which is true since is surjective. For naturality in , suppose . Then naturality amounts to the fact that
which is true because taking an image of after postcomposition with is the same as a pointwise application of to the image of . ∎
4.2 Connection to Kan extensions
Fritz and Perrone [24, 25] propose a method to extract a monad from a graded monad, by taking the left Kan extension. They provide criteria for when this process works and induces a monad morphism. This process cannot work entirely for our situation, for the following reason. First we note that we can interpret bernoulli and knight as the following elements of :
-
•
bernoulli is ;
-
•
knight is .
This construction extends to a monad on [34]. Therefore, we can follow through the derivation of Figure 2 to see that the graded monad cannot be commutative (apparently contradicting [34, Lemma 5.2]) since the convex sets in Fig. 1(b)–(c) are different. (For another argument, note that contains two binary idempotent symmetric operations, and , and see e.g. [85].)
By contrast, (§3.2) does satisfy our desiderata (§1.1). So there cannot be a monad morphism between them. Nonetheless, the Kan extension of our graded Markov category , regarded as a graded monad via Proposition 2.6, does give the finitely-generated convex powerset monad as a functor, just not as a monad.
Proposition 4.4.
The family exhibits as the Kan extension of
along the unique functor .
Proof.
Kan extensions in can be computed pointwise, and for any the Kan extension of along is simply the colimit of the functor. Thus it suffices to show that the canonical function
(induced by ) is a bijection. This function is given by . To see that it is surjective we recall that every finitely generated convex set is the image of some convex function (Lemma 4.2). To see that it is injective we suppose that , for and . We need to show that in the colimit. It suffices to find with and surjections and :
The finitely generated convex set must have a unique convex hull, and we let be the number of extremal points of the convex hull, which are uniquely determined. We construct by noting that must be a convex combination from the extremal points, and so we let be the probability vector corresponding to that combination. We construct from similarly. To see that is surjective we note that since is surjective onto its image we must have points in that map onto the extremal points, and hence onto all the points of via . Similarly, is surjective. ∎
4.3 An op-lax functor
Definition 4.5.
The construction extends to a relative monad. The unit morphism picks out the singleton set containing the Dirac vector, . The Kleisli extension takes a function to a function given by
where takes the extreme points of the finitely generated convex subset.
From this structure, we build a Kleisli category as usual.
-
•
The objects of are natural numbers.
-
•
The morphisms are functions .
-
•
The identity morphism is the unit . Composition of and is given by .
In fact, this category is order-enriched. That is to say, the hom-sets have a natural partial order structure given by if for all , . Composition is thus monotone.
We now extend the quotient of Proposition 4.3 to an identity-on-objects op-lax functor .
Theorem 1.
Consider the assignment of a morphism to given by . This defines an op-lax functor
Proof notes..
It is straightforward that . It remains to show that . Since we will show that preserves finite coproducts, it is sufficient to first suppose that the domain of is . So consider and . So . We must show that for all , the probability vector is in
To show this, we note that the grade of is , but we can also consider an alternative kind of composite with a bigger grade . This is given by
where the middle arrow is the evident function between sets regarded as a stochastic matrix. Contrast with
The function that copies is an injection and exhibits
Moreover, we have that
The intuitive point is that in , for each possible intermediate we are allowed to use different choices of , but in , each possible intermediate will use the same choices of .
To see that preserves coproducts we note that on objects it is immediate, and expanding the definitions shows that the coproduct injections and copairings are exactly preserved by . ∎
(Here, we are regarding with discrete order enrichment but non-trivial local grading, and with non-trivial order enrichment but trivial local grading. It is possible that there interesting ways to unify the two different enrichments.)
4.4 Discussion and example
Example 4.6.
We once again revisit the scenarios from Examples 1.1 and 1.2 where boolean values are drawn with Knightian uncertainty or from fair Bernoulli trials and combined using different program logic. Consider the morphism denoting a fair Bernoulli trial ( from Example 3.6),
and a morphism that employs Knightian uncertainty on each of its inputs ( from Example 3.6),
Then maps the singleton set to
maps the two-element set to
and, following Example 3.6, maps the singleton set to
This is the convex subset in Figure 1(b) if we consider the probability vectors as giving the corresponding chances of outcomes , , and .
On the other hand, by composing with after map** them into , we lose the ability to distinguish which outcomes were related via the same Knightian choices. So the morphism maps the singleton element to
which is the convex subset given in Figure 1(c). Thus, .
Therefore, by accounting for corresponding choices of Knightian uncertainty within morphism compositions, our category obtains tighter bounds on the imprecise probabilities.
5 Maximality as a commutative theory
In Proposition 4.3 we gave a family of maps that convert our compositional imprecise probability into convex sets of probability distributions. These maps are not injective, and in this sense the model of is intensional. This raises a question of whether we could have made a less intensional model than while still maintaining Desiderata 1 and 2 and maintaining the connection to convex sets of distributions. In Theorem 2 we answer this question negatively, in the following sense: We cannot quotient the hom-sets of without either losing the connection with convex sets (and hence statistics) or losing the monoidal or distributive structure (and hence the compositionality desiderata of §1.1). In this sense, is maximal.
Definition 5.1.
Let be a semicartesian category. Let and be -graded distributive Markov categories. A graded distributive Markov functor is given by a map** from the objects of to the objects of and a family of map**s from , strictly preserving the composition, monoidal and coproduct structure, and the copy maps.
Note
In view of §2.1, we note that a graded distributive Markov functor is the same thing as the existing notion of strict distributive monoidal functor between distributive monoidal enriched categories (e.g. [45]), together with the requirement that the copy maps are preserved, which is in common with the ordinary Markov category literature [23]. We could also formulate this in terms of monad morphisms, following Section 2.2.
Theorem 2.
Let be -graded distributive Markov category with a graded distributive Markov functor and a natural family of functions
such that
(Proposition 4.3) factors through . Then is faithful: if in then also in .
Proof.
Since preserves finite coproducts, it is sufficient to suppose the domain of and is . That is, let and suppose factors as
Let be the evident tuple of Diracs. Define and as the lifting of the injections via postcomposition with the unit of . Since is a graded distributive Markov functor and ,
where for , we define . Applying gives
(5) |
Now, for all , are independent because they each use a different dimension. They are all extremal vertices on the convex hull . Moreover, they must be the same vertices as for respective because the convex hulls are the same (5). Therefore,
We can recover and as for any ,
So . ∎
6 Summary and outlook
We have shown that by taking a graded perspective and naming Knightian choices we can obtain a compositional account of Bernoulli and Knightian uncertainty together. The account gives a refined bound on the uncertainty (Theorem 1) and is maximal among the compositional accounts (Theorem 2).
There are a number of future directions. A first question is how to accommodate iteration. The convex sets considered in this article are all finitely generated, but if we allow iterative programs that have an unbounded number of Knightian choices, this leads to a more general class of convex sets.
The concerns about iteration hold even if we restrict to finite outcome spaces, and thus far we have focused on this for simplicity. Much work on programming semantics for imprecise probability has focused beyond finite outcome spaces, and it will be interesting to revisit this from our perspective: this includes domain theoretic structures (e.g. [44, 84, 43, 29]) and metric structures (e.g. [65, 64]).
It would be interesting to compare to another recent compositional framework combining unknowns with probability by Stein and Samuelson, currently focusing on Gaussians [78].
Our approach is based on random elements, and so is the quasi-Borel-space probability monad (e.g. [32, 81]), so this might be a good approach to accommodating function spaces.
On the more practical side, an open question is how to perform statistical inference in a probabilistic programming language with imprecise probability.
Going beyond statistics, it is possible that there are other scenarios where this approach is useful: making a theory compositional by using a graded theory (for a first purely speculative example, the issues with amb outlined in [54]).
Acknowledgements
It has been helpful to talk to many people about this work. The 2023/2024 Aria workshops emphasised aspects of imprecise probability as discussion topics (and so it was helpful to discuss with davidad, Dylan Braithwaite, Elena Di Lavore, Alex Lew, Owen Lynch, Sean Moss, Zenna Tavares, and others). It has been helpful to discuss with colleagues in Oxford (particularly Paolo Perrone, who is running an adjoint school project on uncertainty in Markov categories, and some time ago Kwok Ho Cheung regarding his thesis work [13]), and at the 2024 Bellairs workshop. We also received helpful feedback from Hugo Paquet, Dario Stein, and MFPS 2024 reviewers, where this material was submitted as an ‘early announcement’.
Research supported by Clarendon Scholarship, ERC Consolidator Grant BLAST, and AFOSR Project FA9550-21-1-0038.
References
- [1] N. Ackerman, C. Freer, Y. Kaddar, J. Karwowski, S. Moss, D. Roy, S. Staton, and H. Yang. Probabilistic programming interfaces for random graphs: Markov categories, graphons, and nominal sets. In Proc. POPL 2024, 2024.
- [2] A. Adjé, O. Bouissou, J. Goubault-Larrecq, E. Goubault, and S. Putot. Static analysis of programs with imprecise probabilistic inputs. In Proc. VSTTE 2013, pages 22–47, 2013.
- [3] A. Appel and V. Kosoy. Basic inframeasure theory. LessWrong, August 2020. https://www.lesswrong.com/posts/YAa4qcMyoucRS2Ykr/basic-inframeasure-theory.
- [4] A. Appel and V. Kosoy. Inframeasures and domain theory. LessWrong, March 2021. https://www.lesswrong.com/posts/vrbidMiczaoHBhZGp/inframeasures-and-domain-theory.
- [5] T. Barker. A monad for randomized algorithms. In Proc. MFPS 2016, 2016.
- [6] J. Bernoulli. Ars conjectandi, opus posthumum. Thurneysen, 1713.
- [7] F. Bonchi, A. Sokolova, and V. Vignudelli. The theory of traces for systems with nondeterminism and probability. In Proc. LICS 2019, 2019.
- [8] F. Bonchi, A. Sokolova, and V. Vignudelli. Presenting convex sets of probability distributions by convex semilattices and unique bases. In Proc. CALCO 2021, CALCO 2021.
- [9] G. Boudol and I. Castellani. Flow models of distributed computations: three equivalent semantics for CCS. Inform. Comput., 114:247–314, 1994.
- [10] D. Braithwaite, J. Hedges, and T. St Clere Smithe. The compositional structure of Bayesian inference. In Proc. MFCS 2023, 2023.
- [11] A. Carboni, S. Lack, and R. F. C. Walters. Introduction to extensive and distributive categories. J. Pure Appl. Algebra, 84(2):145–158, 1993.
- [12] G. L. Cattani, M. P. Fiore, and G. Winskel. A theory of recursive domains with applications to concurrency. In Proc. LICS 1998, 1998.
- [13] K.-H. Cheung. Distributive Interaction of Algebraic Effects. PhD thesis, University of Oxford Department of Computer Science, 2017.
- [14] J. R. B. Cockett. Introduction to distributive categories. Math. Struct. Comput. Sci., 3(3):277–307, 1993.
- [15] F. Dahlqvist, L. Parlant, and A. Silva. Layer by layer: composing monads. In Proc. ICTAC 2018, 2018.
- [16] S. Dash. A Monadic Theory of Point Processes. PhD thesis, University of Oxford, 2024.
- [17] S. Dash, Y. Kaddar, H. Paquet, and S. Staton. Affine monads and lazy structures for Bayesian programming. In Proc. POPL 2023, 2023.
- [18] S. Dash and S. Staton. A monad for probabilistic point processes. In Proc. ACT 2020, 2020.
- [19] S. Dash and S. Staton. Monads for measurable queries in probabilistic databases. In Proc. MFPS 2021, 2021.
- [20] D. ‘davidad’ Dalrymple. Safeguarded AI: constructing safety by design. ARIA Programme Thesis, January 2024. https://www.aria.org.uk/wp-content/uploads/2024/01/ARIA-Safeguarded-AI-Programme-Thesis-V1.pdf.
- [21] B. Day. On closed categories of functors. In Proc. Midwest Category Seminar IV, volume 137 of Lect. Notes Math. Springer, 1969.
- [22] B. Fong, D. Spivak, and R. Tuyéras. Backprop as functor: A compositional perspective on supervised learning. In Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’19. IEEE Press, 2021.
- [23] T. Fritz. A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Adv. Math., 370, 2020.
- [24] T. Fritz and P. Perrone. A criterion for Kan extensions of lax monoidal functors. arxiv:1809.10481, 2018.
- [25] T. Fritz and P. Perrone. A probability monad as the colimit of spaces of finite samples. Theory and Applications of Categories, 34, 2019.
- [26] M. Gaboardi, S. Katsumata, D. Orchard, and T. Sato. Graded hoare logic and its categorical semantics. In Proc. ESOP 2021, 2021.
- [27] B. Gavranovic. Fundamental Components of Deep Learning: A category-theoretic approach. PhD thesis, Strathclyde University, 2024. arXiv:2403.13001.
- [28] J. Goubault-Larrecq. Continuous previsions. In Proc. CSL 2007, 2007.
- [29] J. Goubault-Larrecq. Prevision domains and convex powercones. In Proc. FOSSACS 2008, 2008.
- [30] A. Goy and D. Petrisan. Combining probabilistic and non-deterministic choice via weak distributive laws. In LICS 2020, 2020.
- [31] C. Hermida and R. Tennent. Monoidal indeterminates and categories of possible worlds. Theoretical Computer Science, 430, 2012.
- [32] C. Heunen, O. Kammar, S. Staton, and H. Yang. A convenient category for higher-order probability theory. In Proc. LICS 2017, 2017.
- [33] P. J. Huber. Robust statistics. Wiley, 1981.
- [34] B. Jacobs. Coalgebraic trace semantics for combined possibilitistic and probabilistic systems. In Proc. CMCS 2008, 2008.
- [35] B. Jacobs. From probability monads to commutative effectuses. J. Log. Algebr. Methods Program., 94:200–237, 2018.
- [36] B. Jacobs. From multisets over distributions to distributions over multisets. In Proc. LICS 2021, 2021.
- [37] B. Jacobs. Structured probabilistic reasoning. Available from the author’s homepage, July 2023. Draft book.
- [38] B. Jacobs, A. Kissinger, and F. Zanasi. Causal inference by string diagram surgery. In Proc. FOSSACS 2019, 2019.
- [39] X. Jia, B. Lindenhovius, M. W. Mislove, and V. Zamdzhiev. Commutative monads for probabilistic programming languages. In Proc. LICS 2021, pages 1–14, 2021.
- [40] C. Jones and G. D. Plotkin. A probabilistic powerdomain of evaluations. In Fourth Annual Symposium on Logic in Computer Science, pages 186–195, June 1989.
- [41] O. Kammar and G. D. Plotkin. Algebraic foundations for effect-dependent optimisations. In Proc. POPL 2012, pages 349–360, 2012.
- [42] S. Katsumata. Parametric effect monads and semantics of effect systems. In Proc. POPL 2014, 2014.
- [43] K. Keimel. Topological cones: Foundations for a domain theoretical semantics combining probability and nondeterminism. In MFPS 2005, 2005.
- [44] K. Keimel and G. D. Plotkin. Mixed powerdomains for probability and nondeterminism. Log. Methods Comput. Sci., 13, 2017.
- [45] G. M. Kelly. Basic Concepts of Enriched Category Theory. CUP, 1982.
- [46] F. H. Knight. Risk, uncertainty and profit. Houghton Mifflin, 1921.
- [47] A. Kock. Strong functors and monoidal monads. Arch. Math, 23(1):113–120, Dec. 1972.
- [48] A. Kock. Commutative monads as a theory of distributions. Theory Appl. Categ., 26(4):97–131, 2012.
- [49] V. Kosoy and A. Appel. Infra-Bayesian physicalism: A formal theory of naturalized induction. AI Alignment Forum, November 2021. Available at https://www.alignmentforum.org/posts/gHgs2e2J5azvGFatb/infra-bayesian-physicalism-a-formal-theory-of-naturalized.
- [50] D. Kozen and A. Silva. Multisets and distributions. arxiv:2301.10812, 2023.
- [51] S. Kura. Graded algebraic theories. In Proc. FOSSACS 2020, 2020.
- [52] J. Laird, G. Manzonetto, G. McCusker, and M. Pagani. Weighted relational models of typed lambda-calculi. In Proc. LICS 2013, 2013.
- [53] E. D. Lavore and M. Román. Evidential decision theory via partial Markov categories. In Proc. LICS 2023, 2023.
- [54] P. B. Levy. Amb breaks well-pointedness, ground amb doesn’t. In Proc. MFPS 2007, 2007.
- [55] P. B. Levy. Locally graded categories. Slides available from the author’s webpage, 2019.
- [56] P. B. Levy, J. Power, and H. Thielecke. Modelling environments in call-by-value programming languages. Inform. Comput., 185, 2003.
- [57] A. K. Lew, M. F. Cusumano-Towner, B. Sherman, M. Carbin, and V. K. Maninghka. Trace types and denotational semantics for sound programmable inference in probabilistic languages. In Proc. POPL 2020, 2020.
- [58] S. Liang, P. Hudak, and M. Jones. Monad transformers and modular interpreters. In Proc. POPL 1995, 1995.
- [59] S. Mac Lane. Categories for the Working Mathematician. Springer, 1998.
- [60] D. McDermott and T. Uustalu. Flexibly graded monads and graded algebras. In Proc. MPC 2022, 2022.
- [61] A. McIver and C. Morgan. Abstraction, Refinement and Proof for Probabilistic Systems. Springer, 2005.
- [62] A. K. McIver and C. Morgan. Partial correctness for probabilistic demonic programs. Theoret. Comput. Sci., 266:513–541, 2001.
- [63] M. Mio. Upper-expectation bisimilarity and Łukasiewicz mu-calculus. In Proc. FoSSaCS 2014, 2014.
- [64] M. Mio, R. Sarkis, and V. Vignudelli. Combining nondeterminism, probability, and termination: Equational and metric reasoning. In Proc. LICS 2021, 2021.
- [65] M. Mio and V. Vignudelli. Monads and quantitative equational theories for nondeterminism and probability. In Proc. CONCUR 2020, 2020.
- [66] M. W. Mislove, J. Ouaknine, and J. Worrell. Axioms for probability and nondeterminism. In Proc. EXPRESS 2003, 2003.
- [67] E. Moggi. Notions of computation and monads. Information and Computation, 1991.
- [68] D. Orchard, P. Wadler, and H. E. III. Unifying graded and parameterised monads. In Proc. MSFP 2020, 2020.
- [69] P. Perrone. Markov categories and entropy. IEEE Transactions on Information Theory, 2023.
- [70] D. Petrisan and R. Sarkis. Semialgebras and weak distributive laws. In Proc. MFPS 2021, 2021.
- [71] A. M. Pitts. Nominal Sets: names and symmetry in computer science. CUP, 2013.
- [72] G. D. Plotkin and J. Power. Notions of computation determine monads. In Proc. FOSSACS 2002, 2002.
- [73] J. Power. Generic models for computational effects. Theor. Comput. Sci., 364(2):254–269, 2006.
- [74] D. Shiebler. Categorical Stochastic Processes and Likelihood. Compositionality, 3, Apr. 2021.
- [75] A. Simpson. Probability sheaves and the Giry monad. In Proc. CALCO 2017, 2017.
- [76] S. Staton. Commutative semantics for probabilistic programming. In Proc. ESOP 2017, pages 855–879, 2017.
- [77] S. Staton, D. Stein, H. Yang, N. L. Ackerman, C. E. Freer, and D. M. Roy. The Beta-Bernoulli process and algebraic effects. In Proc. ICALP 2018, 2018.
- [78] D. Stein and R. Samuelson. A categorical treatment of open linear systems, March 2024. arxiv:2403.03934.
- [79] D. Stein and S. Staton. Compositional semantics for probabilistic programs with exact conditioning. In Proc. LICS 2021, 2021.
- [80] Z. Tavares, J. Koppel, X. Zhang, R. Das, and A. Solar-Lezama. A language for counterfactual generative models. In Proc. ICML 2021, pages 10173–10182, 2021.
- [81] M. Vákár, O. Kammar, and S. Staton. A domain theory for statistical probabilistic programming. Proc. ACM Program. Lang., 3(POPL):36:1–36:29, 2019.
- [82] J.-W. van de Meent, B. Paige, H. Yang, and F. Wood. An introduction to probabilistic programming. arXiv e-print 1809.10756, 2018.
- [83] G. van Heerdt, J. Hsu, J. Ouaknine, and A. Silva. Convex language semantics for nondeterministic probabilistic automata. In Proc. ICTAC 2018, 2018.
- [84] D. Varacca and G. Winskel. Distributing probability over non-determinism. Math. Structures Comput. Sci., pages 87–113, 2006.
- [85] Various authors. Eckmann-Hilton argument. nLab, April 2024. https://ncatlab.org/nlab/show/Eckmann-Hilton+argument#variation.
- [86] P. Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, 1991.
- [87] R. Wood. Indical methods for relative categories. PhD thesis, Dalhousie University, 1976.