Compositional imprecise probability

Jack Liell-Cock and Sam Staton
University of Oxford
(Draft, May 2024)
Abstract

Imprecise probability is concerned with uncertainty about which probability distributions to use. It has applications in robust statistics and elsewhere. Imprecise probability can be modelled in various ways, including by convex sets of probability distributions.

We look at programming language models for imprecise probability. Our desiderata are that we would like our model to support all kinds of composition, categorical and monoidal, in other words, guided by dataflow diagrams. Another equivalent perspective is that we would like a model of synthetic probability in the sense of Markov categories.

There is already a fairly popular monad-based approach to imprecise probability, but it is not fully compositional because the monad involved is not commutative, which means that we do not have a proper monoidal structure. In this work, we provide a new fully compositional account. The key idea is to name the non-deterministic choices. To manage the renamings and disjointness of names, we use graded monads. We show that the resulting compositional model is maximal. We relate with the earlier monad approach, showing that we obtain tighter bounds on the uncertainty.

1 Overview

This paper is about using programming language notations to give compositional descriptions of imprecise probability. For illustration, consider a situation with three outcomes: red (r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r), green (g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g) and blue (b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b). A precise probability distribution can be understood as a point in the triangle: the corner (r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r) represents 100% certainty of red; the points on the edge between g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g and b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b represent the probability distributions where r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r is impossible (Figure 1a).

(a)p𝑝pitalic_pq𝑞qitalic_qg𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_gb𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_br𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r
(b)p𝑝pitalic_pg𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_gb𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_br𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r
(c)p𝑝pitalic_pg𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_gb𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_br𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r
(d)g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_gb𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_br𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r  
Figure 1: (a) Five probabilities over the three-point set {r,g,b}𝑟𝑔𝑏\{{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r},{\color% [rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g},{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0,1}b}\}{ italic_r , italic_g , italic_b } illustrated as points in the triangle: the three extreme points are the corners; p𝑝pitalic_p is the equal odds chance between b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b and g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g; q𝑞qitalic_q is the equal odds chance between all three points. (b) A line indicating a convex region between r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r and p𝑝pitalic_p, which includes q𝑞qitalic_q. (c) A convex region which is the convex hull of four points, including r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r, p𝑝pitalic_p and also the equal odds chance between r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r and b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b and between r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r and g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g. (d) A different convex region, considered in [86, Ex. 7.3].

An imprecise probability on three outcomes is a convex region of the triangle (Figure 1b–1d). One interpretation is that if a probability distribution describes a bet, as in the foundations of Bayesianism, then a convex region is a collection of bets that would be reasonable given the current imprecise knowledge. Imprecise probability has a long history in terms of statistical robustness (e.g. [86, 33]), recently considered as part of infrabayesianism (e.g. [49, 3, 4]) and the foundations of safe AI [20].

There is already a body of work on semantics models of programming languages with imprecise probability [64, 65, 63, 70, 30, 34, 2, 28, 83, 29, 44, 43, 66, 84, 62]. Our contribution is to investigate new models that support our compositional desiderata (§1.1) by naming the non-deterministic choices (§1.2). We show that this compares favourably with earlier work (Thm. 1, §1.3) and that it is a maximal approach (Thm. 2).

1.1 Desiderata: a language for imprecise probability with compositional reasoning

A first language for describing imprecise probability is a first-order functional language without recursion. Rather, we have if/then/else statements, sequencing with immutable variable assignment (like [67, 56]), and the following two commands, which both return a boolean value:

  • bernoulli: a fair Bernoulli choice [6] which draws a ball from some urn containing two balls labelled ‘true’ and ‘false’, and replaces it;

  • knight: a Knightian choice [46] which draws a ball from a fresh urn containing balls labelled ‘true’ and ‘false’, where the number and ratio of balls are unknown and we have no priors on their distribution, except to know that the urn is not empty. (These ‘Knightian urns’ are fresh each time, i.e. they can each be used only once. For example, we are not interested in here in using multiple draws and frequencies to predict their contents.)

For example, consider the following two programs.

Example 1.1.

Consider the following program, which we argue describes the convex region in Figure 1b:

x \leftarrow\; knight ; z \leftarrow\; bernoulli ;
if z then (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g)
else (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)

We draw two boolean values, x and z, respectively with Knightian uncertainty and from a fair Bernoulli trial. We then combine these two boolean values using the logic on the second and third lines of the program.

Example 1.2.

The following program describes the convex region in Figure 1c:

x \leftarrow\; knight ; y \leftarrow\; knight ; z \leftarrow\; bernoulli ;
if z then (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g)
else (if y then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)

This time, we draw three boolean values, x, y and z, where y is with Knightian uncertainty too. We then combine these three boolean values using the logic on the second and third lines of the program, which is almost the same except for the use of y when z is false. Decoupling the Knightian uncertainties increases the region of imprecise probability because it allows new outcomes (such as an equal chance between r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r and b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b when x is true and y is false) that were impossible in Example 1.1.

Our desiderata for a compositional account of a first-order language are the following. We are inspired by recent compositional accounts of probability theory (e.g. [48, 23, 35]), statistics (e.g. [53, 38, 10]), and probabilistic programming (e.g. [17, 39, 76]), and the connections between them (e.g. [79]). These desiderata are formalized in Section 2.

Desideratum 1.

The language should be commutative:

x \leftarrow\; t ; y \leftarrow\; u ; v =y  u ; x  t ; vy  u ; x  t ; v{\displaystyle=\quad\mbox{\leavevmode\lstinline{{\lst@@@set@language% \lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language{% \@listingGroup{ltx_lst_identifier}{y}}{\@listingGroup{ltx_lst_space}{ }}% \@listingLiterate{$\leftarrow\;$}{\@listingGroup{ltx_lst_space}{ }}{% \@listingGroup{ltx_lst_identifier}{u}}{\@listingGroup{ltx_lst_space}{ }};{% \@listingGroup{ltx_lst_space}{ }}{\@listingGroup{ltx_lst_identifier}{x}}{% \@listingGroup{ltx_lst_space}{ }}\@listingLiterate{$\leftarrow\;$}{% \@listingGroup{ltx_lst_space}{ }}{\@listingGroup{ltx_lst_identifier}{t}}{% \@listingGroup{ltx_lst_space}{ }};{\@listingGroup{ltx_lst_space}{ }}{% \@listingGroup{ltx_lst_identifier}{v}}}}}}= italic_y italic_← italic_u sansserif_; italic_x italic_← italic_t italic_; italic_v (if x𝑥xitalic_x is not free in u𝑢uitalic_u and y𝑦yitalic_y not free in t𝑡titalic_t)
and affine:
(x \leftarrow\; t ; u) =uu{\displaystyle=\quad\mbox{\leavevmode\lstinline{{\lst@@@set@language% \lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language{% \@listingGroup{ltx_lst_identifier}{u}}}}}}= u (if x𝑥xitalic_x is not free in u𝑢uitalic_u).

This means that we can regard composition graphically, as a data flow graph. For instance, the notation

u𝑢uitalic_u\vdotst𝑡titalic_t\vdotsv𝑣vitalic_vx𝑥xitalic_xy𝑦yitalic_y\vdots===      u𝑢uitalic_u\vdotst𝑡titalic_t\vdotsv𝑣vitalic_vx𝑥xitalic_xy𝑦yitalic_y\vdots===      u𝑢uitalic_u\vdotst𝑡titalic_t\vdotsv𝑣vitalic_vx𝑥xitalic_xy𝑦yitalic_y\vdots

is not ambiguous.

Although this requirement does not hold generally in the presence of memory side effects and mutable variables, we do not have mutable variables here, and it is desirable in a declarative language. For example, we would like to notate the program from Example 1.1 as

bernoulliknightif z then x𝑥xitalic_xz𝑧zitalic_z
Desideratum 2.

The standard equational reasoning about if/then/else should apply, and in particular the following hoisting equation should be allowed:

if b then (x  t ; u) else (x  t ; v)=x  t ; if b then u else vif b then (x  t ; u) else (x  t ; v)x  t ; if b then u else v{{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers% \lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language{\@listingGroup{% ltx_lst_keyword}{if}}{\@listingGroup{ltx_lst_space}{ }}{\@listingGroup{% ltx_lst_identifier}{b}}{\@listingGroup{ltx_lst_space}{ }}{\@listingGroup{% ltx_lst_keyword}{then}}{\@listingGroup{ltx_lst_space}{ }}({\@listingGroup{% ltx_lst_identifier}{x}}{\@listingGroup{ltx_lst_space}{ }}\@listingLiterate{$% \leftarrow\;$}{\@listingGroup{ltx_lst_space}{ }}{\@listingGroup{% ltx_lst_identifier}{t}}{\@listingGroup{ltx_lst_space}{ }};{\@listingGroup{% ltx_lst_space}{ }}{\@listingGroup{ltx_lst_identifier}{u}}){\@listingGroup{% ltx_lst_space}{ }}{\@listingGroup{ltx_lst_keyword}{else}}{\@listingGroup{% ltx_lst_space}{ }}({\@listingGroup{ltx_lst_identifier}{x}}{\@listingGroup{% ltx_lst_space}{ }}\@listingLiterate{$\leftarrow\;$}{\@listingGroup{% ltx_lst_space}{ }}{\@listingGroup{ltx_lst_identifier}{t}}{\@listingGroup{% ltx_lst_space}{ }};{\@listingGroup{ltx_lst_space}{ }}{\@listingGroup{% ltx_lst_identifier}{v}})}}}}\quad=\quad\mbox{\leavevmode\lstinline{{% \lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor% \lst@@@set@language{\@listingGroup{ltx_lst_identifier}{x}}{\@listingGroup{% ltx_lst_space}{ }}\@listingLiterate{$\leftarrow\;$}{\@listingGroup{% ltx_lst_space}{ }}{\@listingGroup{ltx_lst_identifier}{t}}{\@listingGroup{% ltx_lst_space}{ }};{\@listingGroup{ltx_lst_space}{ }}{\@listingGroup{% ltx_lst_keyword}{if}}{\@listingGroup{ltx_lst_space}{ }}{\@listingGroup{% ltx_lst_identifier}{b}}{\@listingGroup{ltx_lst_space}{ }}{\@listingGroup{% ltx_lst_keyword}{then}}{\@listingGroup{ltx_lst_space}{ }}{\@listingGroup{% ltx_lst_identifier}{u}}{\@listingGroup{ltx_lst_space}{ }}{\@listingGroup{% ltx_lst_keyword}{else}}{\@listingGroup{ltx_lst_space}{ }}{\@listingGroup{% ltx_lst_identifier}{v}}}}}}italic_if italic_b italic_then italic_(x sansserif_← italic_t italic_; italic_u) italic_else italic_(x italic_← italic_t italic_; italic_v) = italic_x italic_← italic_t sansserif_; sansserif_if italic_b italic_then italic_u italic_else italic_v

where x𝑥xitalic_x is not free in b𝑏bitalic_b.

One earlier approach to a semantic study of a language like this is provided by a convex powerset of distributions monad (e.g. [44, 64, 7, 8, 65, 30, 34]). This does not satisfy the desiderata for compositional reasoning. In fact, no semantic model satisfying the desiderata can allow Examples 1.1 and 1.2 to be distinguished, as we show in Figure 2. The key issue is with the third program in Figure 2:

z \leftarrow\; bernoulli ; if z then (x \leftarrow\; knight ; if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g)
else (x \leftarrow\; knight ; if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)

This program draws a boolean value with Knightian uncertainty on each of the branches of the if statement. The paradox arises in whether each choice comes from different urns or the same urn. Perhaps there is one Knightian urn that is used in both branches. Or perhaps we draw a boolean value from a new Knightian urn on the second branch. Our proposed solution is to make this distinction explicit.

x \leftarrow\; knight ; z \leftarrow\; bernoulli ;
if z then (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g) else (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)
= Desideratum 1 (commutativity)
z \leftarrow\; bernoulli ; x \leftarrow\; knight ;
if z then (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g) else (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)
= Desideratum 2
z \leftarrow\; bernoulli ; if z then (x \leftarrow\; knight ; if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g)
else (x \leftarrow\; knight ; if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)
= Alpha renaming
z \leftarrow\; bernoulli ; if z then (x \leftarrow\; knight ; if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g)
else (y \leftarrow\; knight ; if y then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)
= Desideratum 1 (affine)
z \leftarrow\; bernoulli ; if z then (x \leftarrow\; knight ; y \leftarrow\; knight ; if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g)
else (x \leftarrow\; knight ; y \leftarrow\; knight ; if y then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)
= Desideratum 2
z \leftarrow\; bernoulli ; x \leftarrow\; knight ; y \leftarrow\; knight ;
if z then (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g) else (if y then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)
= Desideratum 1 (commutativity)
x \leftarrow\; knight ; y \leftarrow\; knight ; z \leftarrow\; bernoulli ;
if z then (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g) else (if y then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)
Figure 2: An equational derivation that Examples 1.1 and 1.2 must be equal if Desiderata 1 and 2 are satisfied.

1.2 Resolution: named Knightian choices

To satisfy both desiderata, our proposal is to name each Knightian choice (Section 3). To do this, we rewrite Example 1.1 by annotating the only Knightian choice with the name a1subscript𝑎1a_{1}italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT:

x \leftarrow\; knight(a1subscript𝑎1a_{1}italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT) ; z \leftarrow\; bernoulli ;
if z then (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g) else (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)

We think of this program as giving rise to the convex set in Figure 1(b). This is then the same as the program where y describes the outcome of the same Knightian choice, i.e. one with the same name:

x \leftarrow\; knight(a1subscript𝑎1a_{1}italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT) ; y \leftarrow\; knight(a1subscript𝑎1a_{1}italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT) ; z \leftarrow\; bernoulli ;
if z then (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g) else (if y then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)

but it is different from the program where y describes a different Knightian choice, i.e. one with a different name:

x \leftarrow\; knight(a1subscript𝑎1a_{1}italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT) ; y \leftarrow\; knight(a2subscript𝑎2a_{2}italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) ; z \leftarrow\; bernoulli ;
if z then (if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g) else (if y then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)

which is intuitively what Example 1.2 describes, and gives rise to the convex set in Figure 1(c). Now when we try to follow the same equational derivation as in Figure 2, the third program becomes:

z \leftarrow\; bernoulli ; if z then (x \leftarrow\; knight(a1subscript𝑎1a_{1}italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT) ; if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g)
else (x \leftarrow\; knight(a1subscript𝑎1a_{1}italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT) ; if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)

which conveys the same Knightian value is used on each of the branches of the if statement. This can no longer derive the program:

z \leftarrow\; bernoulli ; if z then (x \leftarrow\; knight(a1subscript𝑎1a_{1}italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT) ; if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g)
else (x \leftarrow\; knight(a2subscript𝑎2a_{2}italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) ; if x then return r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r else return b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b)

which explicitly uses a different Knightian choice on the else branch.

The idea of naming non-deterministic choices appears in work outside probability (e.g. proved transitions, [9]) and probabilistic choices are often named in practical probabilistic programming [82, §6.2] which has already been explored using graded monads [57]. More generally, intensionality in non-determinism is known to be a profitable perspective (e.g. [52, 12]).

1.2.1 Named Knightian choices via a reader monad

The set-up with named Knightian choices is consistent with Desiderata 1 and 2, which we can show by building a monad (e.g. [67]), namely the reader transformer (e.g. [58]) of the finite distributions monad (e.g. [37, Ch. 2]):

T2A(X)=[2AD(X)]subscript𝑇superscript2𝐴𝑋delimited-[]superscript2𝐴𝐷𝑋T_{2^{A}}(X)=[2^{A}\Rightarrow D(X)]italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_X ) = [ 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT ⇒ italic_D ( italic_X ) ] (1)

where X𝑋Xitalic_X is the set of outcomes, A𝐴Aitalic_A is the set of names required, and D𝐷Ditalic_D is the finite distributions monad. Then the Knightian choices are interpreted by reading, and the Bernoulli choices use the distributions monad. This combined monad is well known to be commutative and affine. Thus both desiderata are satisfied.

We can recover a convex set of probability distributions from any tT2A(X)𝑡subscript𝑇superscript2𝐴𝑋t\in T_{2^{A}}(X)italic_t ∈ italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_X ) by pushing forward all the possible probability distributions on 2Asuperscript2𝐴2^{A}2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT. Formally, we can express this using the monadic bind (>>=Dsubscriptmuch-greater-thanabsent𝐷\mathrel{\scalebox{0.5}[1.0]{$>\!>=$}}_{D}start_RELOP > > = end_RELOP start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT, Kleisli composition) of D𝐷Ditalic_D:

t2A={p>>=Dt|pD(2A)}D(X).\llbracket t\rrbracket_{2^{A}}=\{p\mathrel{\scalebox{0.5}[1.0]{$>\!>=$}}_{D}t~% {}|~{}p\in D(2^{A})\}\subseteq D(X)\text{.}⟦ italic_t ⟧ start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = { italic_p start_RELOP > > = end_RELOP start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT italic_t | italic_p ∈ italic_D ( 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT ) } ⊆ italic_D ( italic_X ) .

1.2.2 Grading to account for renamings

A remaining concern with named Knightian choices is that we ought to take seriously name-space issues in composition. When composing programs with named Knightian choices, we may wish to avoid name clashes. This is dependent on how we interpret the set A𝐴Aitalic_A in (1).

We resolve this issue by regarding the monad (1) as a graded monad [42, 68, 41]. This is closely related to the ‘para’ construction (e.g. [22, 31]). Some of the crucial steps are as follows:

  • Any injection ι:AB:𝜄𝐴𝐵\iota:A\to Bitalic_ι : italic_A → italic_B induces a renaming of programs using names A𝐴Aitalic_A to programs using names B𝐵Bitalic_B, and indeed a natural map T2ι:T2A(X)T2B(X):subscript𝑇superscript2𝜄subscript𝑇superscript2𝐴𝑋subscript𝑇superscript2𝐵𝑋T_{2^{\iota}}:T_{2^{A}}(X)\to T_{2^{B}}(X)italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_ι end_POSTSUPERSCRIPT end_POSTSUBSCRIPT : italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_X ) → italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_X );

  • We can regard monadic bind (Kleisli composition) in T𝑇Titalic_T as operating on disjoint sets of names:

    >>=T:T2A(X)×(XT2B(Y))T2AB(Y)\mathrel{\scalebox{0.5}[1.0]{$>\!>=$}}_{T}\ :\ T_{2^{A}}(X)\times(X\Rightarrow T% _{2^{B}}(Y))\to T_{2^{A\uplus B}}(Y)start_RELOP > > = end_RELOP start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT : italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_X ) × ( italic_X ⇒ italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_Y ) ) → italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_A ⊎ italic_B end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_Y )

    Thus a computation using names A𝐴Aitalic_A is sequenced with a computation using names B𝐵Bitalic_B to build a computation that involves names (AB)𝐴𝐵(A\uplus B)( italic_A ⊎ italic_B ).

  • This monad is graded-monoidal too, via a map

    T2A(X)×T2B(Y)T2(AB)(X×Y)subscript𝑇superscript2𝐴𝑋subscript𝑇superscript2𝐵𝑌subscript𝑇superscript2𝐴𝐵𝑋𝑌T_{2^{A}}(X)\times T_{2^{B}}(Y)\to T_{2^{(A\uplus B)}}(X\times Y)italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_X ) × italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_Y ) → italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT ( italic_A ⊎ italic_B ) end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_X × italic_Y )

    which juxtaposes computations using names A𝐴Aitalic_A and B𝐵Bitalic_B to give a computation using (AB)𝐴𝐵(A\uplus B)( italic_A ⊎ italic_B ).

  • The induced convex set of distributions is invariant under renaming: t2A=T2ι(t)2B\llbracket t\rrbracket_{2^{A}}=\llbracket T_{2^{\iota}}(t)\rrbracket_{2^{B}}⟦ italic_t ⟧ start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = ⟦ italic_T start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_ι end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_t ) ⟧ start_POSTSUBSCRIPT 2 start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT end_POSTSUBSCRIPT.

The crucial element is that the injective renaming ι𝜄\iotaitalic_ι induces a surjection 2ι:2B2A:superscript2𝜄superscript2𝐵superscript2𝐴2^{\iota}:2^{B}\to 2^{A}2 start_POSTSUPERSCRIPT italic_ι end_POSTSUPERSCRIPT : 2 start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT → 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT between the spaces of Knightian choices. We abstract and generalize by allowing arbitrary surjections 2B2Asuperscript2𝐵superscript2𝐴2^{B}\to 2^{A}2 start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT → 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT, further by allowing sets other than 2Asuperscript2𝐴2^{A}2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT, and further still by allowing surjective stochastic maps rather than surjections.

As an aside, we note that probability monads too can often be regarded as sort-of reader monads (e.g. [80, 17, 81, 5, 74, 75]), since probability distributions D(X)𝐷𝑋D(X)italic_D ( italic_X ) can be described by random variables ΩXΩ𝑋\Omega\to Xroman_Ω → italic_X, for some base probability sample space ΩΩ\Omegaroman_Ω. Thus we could regard our monad T(X)𝑇𝑋T(X)italic_T ( italic_X ) as a quotient of

[(Ω×Ξ)X]delimited-[]ΩΞ𝑋[(\Omega\times\Xi)\Rightarrow X][ ( roman_Ω × roman_Ξ ) ⇒ italic_X ]

where ΩΩ\Omegaroman_Ω is a sample space for Bernoulli probability and Ξ=2AΞsuperscript2𝐴\Xi=2^{A}roman_Ξ = 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT is a sample space for Knightian uncertainty. In this work, we will quotient by the ‘law’ of random variables in ΩΩ\Omegaroman_Ω, so that the usual equational reasoning about Bernoulli probability is valid.

1.3 Results about quotienting our theory

The names for the Knightian choices in our language appear to be additional intensional information, and the reader monad does not quotient this away. For this reason we show two results about the equational theory. First, we connect our approach to the convex powerset of distributions monad, showing that our bounds are tighter. Second, we show it is maximal — no further quotient is possible.

Theorem 14): Improved bounds on uncertainty

In our resulting language, every closed term describes a convex set of distributions. We thus establish a connection to the non-compositional approach that uses the Kleisli category of the convex powerset of distributions monad (e.g. [44, 64, 7, 8, 65, 30, 34]). We have an ‘op-lax’ functor

R:𝐈𝐦𝐏Kl(CP).:𝑅𝐈𝐦𝐏KlCPR:\mathbf{ImP}\to\mathrm{Kl}(\mathrm{CP}).italic_R : bold_ImP → roman_Kl ( roman_CP ) .

from our locally graded category 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP3.2) to the Kleisli category of the convex powerset of distributions monad CPCP\mathrm{CP}roman_CP. Being an op-lax functor means that

R(gf)R(g)R(f),𝑅𝑔𝑓𝑅𝑔𝑅𝑓R(g\circ f)\subseteq R(g)\circ R(f),italic_R ( italic_g ∘ italic_f ) ⊆ italic_R ( italic_g ) ∘ italic_R ( italic_f ) ,

i.e. composition in our category gives a tighter bound on the Knightian uncertainty than the composition using the Kleisli category of the convex powerset of distributions monad (see also the Example in §4.4).

Note that this could not be a proper functor because we would then have a quotient theory in violation of the maximality theorem (Theorem 2). But an op-lax functor is beneficial as an interpretation of giving a tighter bound.

Theorem 25): Maximality.

Our language also gives rise to a compositional theory of equality. We prove our equational theory is maximal in that we can add no further equations on open terms without equating different convex sets of distributions or compromising the compositional structure. (See Theorem 2 for a precise statement.)

Further detail: two quotients

In slightly more detail, we consider two candidates from the literature for quotients of a graded monad. They are general methods, but appear in our language as follows. Notice that an open term in our language contains both names and variables: names for Knightian choices, and free variables standing for ordinary values that might be substituted later. There are two ways to quotient the names away:

Quotient A

Following [25, 24] we could equate two open terms if, for every valuation of the free variables, there is a renaming that equates them.

Quotient B

Following [31], we could equate two open terms if there is a renaming such that for every valuation of the free variables they are made equal.

For closed terms with no free variables, the two approaches are the same and give rise to a convex set of distributions (Prop. 4.3).

Quotient A does not directly give a compositional theory in our setting: the criteria of [24] are not satisfied. Nonetheless, the construction of [24] can be adjusted, giving the op-lax functor of Theorem 1 rather than a monad morphism.

Quotient B does not satisfy Desideratum 2 (commuting if-then-else). Informally, it would allow us to rename on the ‘then’ branch but not the ‘else’ branch, which is inconsistent with Desideratum 2. Nonetheless, it could be a useful approach in a metalanguage for combining models that do not need a general if-then-else construction. For this reason, Quotient B is not a counterexample to Theorem 2.

1.4 Notes about context

Other situations combining probability and non-determinism.

Our focus here is on imprecise probability. This is one form of non-determinism, but there is a broader body of general interest in non-determinism and its combination with probability. Of course, non-determinism can arise in many different semantic situations, beyond the motivation from imprecise probability. One motivation is in program abstraction and refinement: there, one describes a problem by writing a non-deterministic program that solves it; one then solves the problem by refining that non-deterministic program. The mathematical analysis is similar, and for instance illustrations essentially the same as Figure 1 appear in work on refinement of probabilistic programs [61, Fig. 6.4.2, Fig. 6.5.1]. But the motivation is different, and the desiderata (§1.1) may be less relevant in program refinement.

An arguably different kind of non-determinism appears where there are many appropriate results that we want to collect. In this sense, for instance, database queries are non-deterministic if they return multiple results, and Prolog is non-deterministic. When combined with probability, this leads more naturally to random sets or random bags, which are in contrast to the sets of distributions shown in Figure 1. Random bags do arise in probabilistic databases and in point process theory. We looked at these applications from a monad perspective here [19, 18], and the monads have long been discussed (e.g. [15, 44, 16, 50, 84, 36]).

Work on distributive laws.

There is a large literature on finding elegant explanations for combining existing monads for probability and non-determinism, exploring distributive laws (e.g. [70, 30, 13, 84, 8, 15, 50]). Although the reader monad transformer is a distributive law, our emphasis here is on the commutativity desiderata (§1.1) rather than distributivity issues. Even when there is a distributive law between commutative monads, the resulting composite monad need not be commutative. Indeed both the random bags monad (e.g. [16]) and the powerdomain of indexed valuations monad [84] arise from distributive laws between commutative monads, but neither composite monads are commutative.

Algebraic perspective.

Some of this work on distributive laws takes the perspective of algebraic theories. Our desiderata can be viewed from the point of view of algebraic theories, via algebraic effects (e.g. [72]), which we now briefly explore. We define two binary operations:

(t+0.5u)=defif bernoulli then t else u(tu)=defif knight then t else uformulae-sequencesuperscriptdefsubscript0.5𝑡𝑢if bernoulli then t else usuperscriptdef𝑡𝑢if knight then t else u{{(t+_{0.5}u)\stackrel{{\scriptstyle\text{def}}}{{=}}\mbox{\leavevmode\lstinline% {{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor% \lst@@@set@language{\@listingGroup{ltx_lst_keyword}{if}}{\@listingGroup{% ltx_lst_space}{ }}{\@listingGroup{ltx_lst_keywords2}{bernoulli}}{% \@listingGroup{ltx_lst_space}{ }}{\@listingGroup{ltx_lst_keyword}{then}}{% \@listingGroup{ltx_lst_space}{ }}{\@listingGroup{ltx_lst_identifier}{t}}{% \@listingGroup{ltx_lst_space}{ }}{\@listingGroup{ltx_lst_keyword}{else}}{% \@listingGroup{ltx_lst_space}{ }}{\@listingGroup{ltx_lst_identifier}{u}}}}}}% \qquad(t\vee u)\stackrel{{\scriptstyle\text{def}}}{{=}}\mbox{\leavevmode% \lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame% \lst@@@set@rulecolor\lst@@@set@language{\@listingGroup{ltx_lst_keyword}{if}}{% \@listingGroup{ltx_lst_space}{ }}{\@listingGroup{ltx_lst_keywords2}{knight}}{% \@listingGroup{ltx_lst_space}{ }}{\@listingGroup{ltx_lst_keyword}{then}}{% \@listingGroup{ltx_lst_space}{ }}{\@listingGroup{ltx_lst_identifier}{t}}{% \@listingGroup{ltx_lst_space}{ }}{\@listingGroup{ltx_lst_keyword}{else}}{% \@listingGroup{ltx_lst_space}{ }}{\@listingGroup{ltx_lst_identifier}{u}}}}}}( italic_t + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_u ) start_RELOP SUPERSCRIPTOP start_ARG = end_ARG start_ARG def end_ARG end_RELOP if bernoulli then t sansserif_else italic_u ( italic_t ∨ italic_u ) start_RELOP SUPERSCRIPTOP start_ARG = end_ARG start_ARG def end_ARG end_RELOP if knight then t sansserif_else italic_u

Regarding Desideratum 1, Commutativity means that each operation is a homomorphism for the other:

(st)+0.5(uv)=(s+0.5u)(t+0.5v)(s+0.5t)+0.5(u+0.5v)=(s+0.5u)+0.5(t+0.5v)(st)(uv)=(su)(tv)formulae-sequencesubscript0.5𝑠𝑡𝑢𝑣subscript0.5𝑠𝑢subscript0.5𝑡𝑣formulae-sequencesubscript0.5subscript0.5𝑠𝑡subscript0.5𝑢𝑣subscript0.5subscript0.5𝑠𝑢subscript0.5𝑡𝑣𝑠𝑡𝑢𝑣𝑠𝑢𝑡𝑣(s\vee t)+_{0.5}(u\vee v)=(s+_{0.5}u)\vee(t+_{0.5}v)\quad(s+_{0.5}t)+_{0.5}(u+% _{0.5}v)=(s+_{0.5}u)+_{0.5}(t+_{0.5}v)\quad(s\vee t)\vee(u\vee v)=(s\vee u)% \vee(t\vee v)( italic_s ∨ italic_t ) + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT ( italic_u ∨ italic_v ) = ( italic_s + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_u ) ∨ ( italic_t + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_v ) ( italic_s + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_t ) + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT ( italic_u + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_v ) = ( italic_s + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_u ) + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT ( italic_t + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_v ) ( italic_s ∨ italic_t ) ∨ ( italic_u ∨ italic_v ) = ( italic_s ∨ italic_u ) ∨ ( italic_t ∨ italic_v )

and Affinity says that tt=t𝑡𝑡𝑡t\vee t=titalic_t ∨ italic_t = italic_t and t+0.5t=tsubscript0.5𝑡𝑡𝑡t+_{0.5}t=titalic_t + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_t = italic_t. Desideratum 2 is always assumed in algebraic effects. From these five axioms, the derivation of Figure 2 can be made algebraically:

r(g+0.5b)=(r+0.5r)(g+0.5b)=(rg)+0.5(rb).𝑟subscript0.5𝑔𝑏subscript0.5𝑟𝑟subscript0.5𝑔𝑏subscript0.5𝑟𝑔𝑟𝑏.{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}\vee({% \color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}+_{0.5}{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{% rgb}{0,0,1}b})=({\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{% 1,0,0}r}+_{0.5}{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{% 1,0,0}r})\vee({\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0.88,0}\pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill% {0.91}{0}{0.88}{0.12}g}+_{0.5}{\color[rgb]{0,0,1}\definecolor[named]{% pgfstrokecolor}{rgb}{0,0,1}b})=({\color[rgb]{1,0,0}\definecolor[named]{% pgfstrokecolor}{rgb}{1,0,0}r}\vee{\color[rgb]{0,0.88,0}\definecolor[named]{% pgfstrokecolor}{rgb}{0,0.88,0}\pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}% \pgfsys@color@cmyk@fill{0.91}{0}{0.88}{0.12}g})+_{0.5}({\color[rgb]{1,0,0}% \definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}\vee{\color[rgb]{0,0,1}% \definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b})\text{.}italic_r ∨ ( italic_g + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_b ) = ( italic_r + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_r ) ∨ ( italic_g + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_b ) = ( italic_r ∨ italic_g ) + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT ( italic_r ∨ italic_b ) .

If we regard \vee as a Minkowski sum (see §4.1) then the left hand side appears to be Fig. 1(b) and the right hand side is Fig. 1(c). especially since the latter can be further rearranged just using the commutativity and affinity laws to

(r(g+0.5r))((r+0.5b)(g+0.5b))𝑟subscript0.5𝑔𝑟subscript0.5𝑟𝑏subscript0.5𝑔𝑏({\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}\vee({% \color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}+_{0.5}{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{% rgb}{1,0,0}r}))\vee(({\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{% rgb}{1,0,0}r}+_{0.5}{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb% }{0,0,1}b})\vee({\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}% {0,0.88,0}\pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}% \pgfsys@color@cmyk@fill{0.91}{0}{0.88}{0.12}g}+_{0.5}{\color[rgb]{0,0,1}% \definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}))( italic_r ∨ ( italic_g + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_r ) ) ∨ ( ( italic_r + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_b ) ∨ ( italic_g + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_b ) )

which enumerates the four extreme points of Fig. 1(c). From this algebraic perpective, our proposal is to label the Knightian branching, with a family of binary operators a1subscriptsubscript𝑎1\vee_{a_{1}}∨ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, a2subscriptsubscript𝑎2\vee_{a_{2}}∨ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, …, and then we still have that

ra1(g+0.5b)=(r+0.5r)a1(g+0.5b)=(ra1g)+0.5(ra1b)subscriptsubscript𝑎1𝑟subscript0.5𝑔𝑏subscriptsubscript𝑎1subscript0.5𝑟𝑟subscript0.5𝑔𝑏subscript0.5subscriptsubscript𝑎1𝑟𝑔subscriptsubscript𝑎1𝑟𝑏{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}\vee_{a_{1% }}({\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}+_{0.5}{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{% rgb}{0,0,1}b})=({\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{% 1,0,0}r}+_{0.5}{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{% 1,0,0}r})\vee_{a_{1}}({\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor% }{rgb}{0,0.88,0}\pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}% \pgfsys@color@cmyk@fill{0.91}{0}{0.88}{0.12}g}+_{0.5}{\color[rgb]{0,0,1}% \definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b})=({\color[rgb]{1,0,0}% \definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}\vee_{a_{1}}{\color[rgb]{% 0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g})+_{0.5}({\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{% rgb}{1,0,0}r}\vee_{a_{1}}{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor% }{rgb}{0,0,1}b})italic_r ∨ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_g + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_b ) = ( italic_r + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_r ) ∨ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_g + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_b ) = ( italic_r ∨ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_g ) + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT ( italic_r ∨ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_b )

yet it is consistent to assume

(ra1g)+0.5(ra1b)(ra1g)+0.5(ra2b).subscript0.5subscriptsubscript𝑎1𝑟𝑔subscriptsubscript𝑎1𝑟𝑏subscript0.5subscriptsubscript𝑎1𝑟𝑔subscriptsubscript𝑎2𝑟𝑏.({\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}\vee_{a_{% 1}}{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g})+_{0.5}({\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{% rgb}{1,0,0}r}\vee_{a_{1}}{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor% }{rgb}{0,0,1}b})\neq({\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{% rgb}{1,0,0}r}\vee_{a_{1}}{\color[rgb]{0,0.88,0}\definecolor[named]{% pgfstrokecolor}{rgb}{0,0.88,0}\pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}% \pgfsys@color@cmyk@fill{0.91}{0}{0.88}{0.12}g})+_{0.5}({\color[rgb]{1,0,0}% \definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}\vee_{a_{2}}{\color[rgb]{% 0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b})\text{.}( italic_r ∨ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_g ) + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT ( italic_r ∨ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_b ) ≠ ( italic_r ∨ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_g ) + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT ( italic_r ∨ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_b ) .

with different names on the right hand side, corresponding to the difference between Figures 1(b) and 1(c).

Our intuition is that tausubscript𝑎𝑡𝑢t\vee_{a}uitalic_t ∨ start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT italic_u branches left or right depending on the Knightian draw a𝑎aitalic_a. Here a𝑎aitalic_a is describing the unique draw from that urn. We note that a labelled binary choice already appears in the probabilistic setting in [77], where it has a different meaning: there, a𝑎aitalic_a stands for an urn but not a specific draw, and ?asubscript?𝑎?_{a}? start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT denotes sampling from urn a𝑎aitalic_a according to Polya’s scheme (replace with two copies of what was drawn).

We note that it is already known that Desiderata 1 is incompatible with symmetry laws (tu=ut𝑡𝑢𝑢𝑡{t\vee u=u\vee t}italic_t ∨ italic_u = italic_u ∨ italic_t and t+0.5u=u+0.5tsubscript0.5𝑡𝑢subscript0.5𝑢𝑡{t+_{0.5}u=u+_{0.5}t}italic_t + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_u = italic_u + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_t), since we can use those to deduce

tu=(tu)+0.5(tu)=(tu)+0.5(ut)=(t+0.5u)(u+0.5t)=(t+0.5u)(t+0.5u)=t+0.5u𝑡𝑢subscript0.5𝑡𝑢𝑡𝑢subscript0.5𝑡𝑢𝑢𝑡subscript0.5𝑡𝑢subscript0.5𝑢𝑡subscript0.5𝑡𝑢subscript0.5𝑡𝑢subscript0.5𝑡𝑢t\vee u=(t\vee u)+_{0.5}(t\vee u)=(t\vee u)+_{0.5}(u\vee t)=(t+_{0.5}u)\vee(u+% _{0.5}t)=(t+_{0.5}u)\vee(t+_{0.5}u)=t+_{0.5}uitalic_t ∨ italic_u = ( italic_t ∨ italic_u ) + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT ( italic_t ∨ italic_u ) = ( italic_t ∨ italic_u ) + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT ( italic_u ∨ italic_t ) = ( italic_t + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_u ) ∨ ( italic_u + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_t ) = ( italic_t + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_u ) ∨ ( italic_t + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_u ) = italic_t + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_u (2)

In our graded semantics (𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP), we do have t+0.5u=u+0.5tsubscript0.5𝑡𝑢subscript0.5𝑢𝑡t+_{0.5}u=u+_{0.5}titalic_t + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_u = italic_u + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_t exactly. We do not have tu=ut𝑡𝑢𝑢𝑡t\vee u=u\vee titalic_t ∨ italic_u = italic_u ∨ italic_t, although there is a reindexing in the grading that corresponds to this symmetry. One view is that we have side-stepped this Eckmann-Hilton-like obstacle (2) by putting the symmetry tuut𝑡𝑢𝑢𝑡t\vee u\approx u\vee titalic_t ∨ italic_u ≈ italic_u ∨ italic_t into the grading structure.

2 Rudiments of graded Markov categories and graded monads

We now recall notions of Markov category (§2.1) and relative monads (§ 2.2), recasting them in the locally graded setting. We show how to pass between the concepts (Prop. 2.5, 2.6) and we relate them to notions from enriched category theory (§2.3). For brevity, in this section we focus on definitions and in the next section (§3) we focus on examples, rather than interleaving them.

Definition 2.1.

A monoidal category is a category 𝒞𝒞\mathcal{C}caligraphic_C equipped with a functor :𝒞×𝒞𝒞\otimes:\mathcal{C}\times\mathcal{C}\to\mathcal{C}⊗ : caligraphic_C × caligraphic_C → caligraphic_C and an object I𝐼Iitalic_I together with associativity and unitor isomorphisms (e.g. (XY)ZX(YZ)tensor-producttensor-product𝑋𝑌𝑍tensor-product𝑋tensor-product𝑌𝑍(X\otimes Y)\otimes Z\cong X\otimes(Y\otimes Z)( italic_X ⊗ italic_Y ) ⊗ italic_Z ≅ italic_X ⊗ ( italic_Y ⊗ italic_Z )) that satisfy coherence conditions (e.g. [59]). It is strict if the isomorphisms are equalities. A symmetric monoidal category is moreover equipped with isomorphisms σX,Y:XYYX:subscript𝜎𝑋𝑌tensor-product𝑋𝑌tensor-product𝑌𝑋\sigma_{X,Y}:X\otimes Y\cong Y\otimes Xitalic_σ start_POSTSUBSCRIPT italic_X , italic_Y end_POSTSUBSCRIPT : italic_X ⊗ italic_Y ≅ italic_Y ⊗ italic_X such that σY,X=σX,Y1subscript𝜎𝑌𝑋superscriptsubscript𝜎𝑋𝑌1\sigma_{Y,X}=\sigma_{X,Y}^{-1}italic_σ start_POSTSUBSCRIPT italic_Y , italic_X end_POSTSUBSCRIPT = italic_σ start_POSTSUBSCRIPT italic_X , italic_Y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT and satisfying coherence conditions.

A semicartesian category is a symmetric monoidal category in which the monoidal unit is a terminal object. That is, there is exactly one morphism XI𝑋𝐼X\to Iitalic_X → italic_I for all X𝑋Xitalic_X.

A semicartesian category has projections XYXIXtensor-product𝑋𝑌tensor-product𝑋𝐼𝑋X\otimes Y\to X\otimes I\cong Xitalic_X ⊗ italic_Y → italic_X ⊗ italic_I ≅ italic_X, but it is weaker than a full categorical product because there need not be a natural diagonal XXX𝑋tensor-product𝑋𝑋X\to X\otimes Xitalic_X → italic_X ⊗ italic_X.

2.1 Graded Markov categories

Definition 2.2.

[23] A Markov category is a semicartesian category such that every object is equipped with a commutative comonoid structure, that is, a map 𝖼𝗈𝗉𝗒X:XXX:subscript𝖼𝗈𝗉𝗒𝑋𝑋tensor-product𝑋𝑋\mathsf{copy}_{X}:X\to X\otimes Xsansserif_copy start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT : italic_X → italic_X ⊗ italic_X that is symmetric and associative and has the terminal map XI𝑋𝐼X\to Iitalic_X → italic_I as a unit.

A morphism f:XY:𝑓𝑋𝑌f:X\to Yitalic_f : italic_X → italic_Y in a Markov category is deterministic if it commutes with the copy map (𝖼𝗈𝗉𝗒Yf=(ff)𝖼𝗈𝗉𝗒Xsubscript𝖼𝗈𝗉𝗒𝑌𝑓tensor-product𝑓𝑓subscript𝖼𝗈𝗉𝗒𝑋\mathsf{copy}_{Y}\circ f=(f\otimes f)\circ\mathsf{copy}_{X}sansserif_copy start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT ∘ italic_f = ( italic_f ⊗ italic_f ) ∘ sansserif_copy start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT).

A distributive Markov category [1] is a Markov category that has coproducts such that the canonical maps XZ+YZ(X+Y)Ztensor-product𝑋𝑍tensor-product𝑌𝑍tensor-product𝑋𝑌𝑍X\otimes Z+Y\otimes Z\to(X+Y)\otimes Zitalic_X ⊗ italic_Z + italic_Y ⊗ italic_Z → ( italic_X + italic_Y ) ⊗ italic_Z are isomorphisms and the coproduct injections XX+YY𝑋𝑋𝑌𝑌X\to X+Y\leftarrow Yitalic_X → italic_X + italic_Y ← italic_Y are deterministic.

A typical example of a distributive Markov category is the category 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡\mathbf{FinStoch}bold_FinStoch of stochastic matrices (Def. 3.2).

An ordinary distributive category [14, 11] is a distributive Markov category in which every morphism is deterministic. A typical example is the category 𝐅𝐢𝐧𝐒𝐞𝐭𝐅𝐢𝐧𝐒𝐞𝐭\mathbf{FinSet}bold_FinSet of finite sets.

Programming syntax.

We can use programming language syntax for composition in a distributive Markov category (see also e.g. [79]). The objects of the category are regarded as types, with bool regarded as the object 1+1111+11 + 1. If Γ=(x1:A1)(xn:An)\Gamma=(x_{1}:A_{1})\otimes\dots\otimes(x_{n}:A_{n})roman_Γ = ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT : italic_A start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ⊗ ⋯ ⊗ ( italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT : italic_A start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) then a morphism t:ΓB:𝑡Γ𝐵t:\Gamma\to Bitalic_t : roman_Γ → italic_B is regarded as a term Γt:BprovesΓ𝑡:𝐵\Gamma\vdash t:Broman_Γ ⊢ italic_t : italic_B. We notate

(t,u) for Γ𝖼𝗈𝗉𝗒ΓΓtuAB𝖼𝗈𝗉𝗒Γtensor-productΓΓtensor-product𝑡𝑢tensor-product𝐴𝐵\displaystyle\Gamma\xrightarrow{\mathsf{copy}}\Gamma\otimes\Gamma\xrightarrow{% t\otimes u}A\otimes Broman_Γ start_ARROW oversansserif_copy → end_ARROW roman_Γ ⊗ roman_Γ start_ARROW start_OVERACCENT italic_t ⊗ italic_u end_OVERACCENT → end_ARROW italic_A ⊗ italic_B
x \leftarrow\; t ; u for Γ𝖼𝗈𝗉𝗒ΓΓΓtΓA𝑢B𝖼𝗈𝗉𝗒Γtensor-productΓΓtensor-productΓ𝑡tensor-productΓ𝐴𝑢𝐵\displaystyle\Gamma\xrightarrow{\mathsf{copy}}\Gamma\otimes\Gamma\xrightarrow{% \Gamma\otimes t}\Gamma\otimes A\xrightarrow{u}Broman_Γ start_ARROW oversansserif_copy → end_ARROW roman_Γ ⊗ roman_Γ start_ARROW start_OVERACCENT roman_Γ ⊗ italic_t end_OVERACCENT → end_ARROW roman_Γ ⊗ italic_A start_ARROW overitalic_u → end_ARROW italic_B
if t then u else v for Γ𝖼𝗈𝗉𝗒ΓΓtΓ(1+1)ΓΓ+Γ[t,u]B𝖼𝗈𝗉𝗒Γtensor-productΓΓtensor-product𝑡Γtensor-product11ΓΓΓ𝑡𝑢𝐵\displaystyle\Gamma\xrightarrow{\mathsf{copy}}\Gamma\otimes\Gamma\xrightarrow{% t\otimes\Gamma}(1+1)\otimes\Gamma\cong\Gamma+\Gamma\xrightarrow{[t,u]}Broman_Γ start_ARROW oversansserif_copy → end_ARROW roman_Γ ⊗ roman_Γ start_ARROW start_OVERACCENT italic_t ⊗ roman_Γ end_OVERACCENT → end_ARROW ( 1 + 1 ) ⊗ roman_Γ ≅ roman_Γ + roman_Γ start_ARROW start_OVERACCENT [ italic_t , italic_u ] end_OVERACCENT → end_ARROW italic_B

In this way, given interpretations of bernoulli and knight, we can interpret the programs from Examples 1.1 and 1.2.

Definition 2.3.

Let 𝔾𝔾\mathbb{G}blackboard_G be a semicartesian category. A graded distributive Markov category 𝐂𝐂\mathbf{C}bold_C is given by

  • a distributive Markov category 𝐂Isubscript𝐂𝐼\mathbf{C}_{I}bold_C start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT, but moreover,

  • for each pair of objects and each grade a𝔾𝑎𝔾a\in\mathbb{G}italic_a ∈ blackboard_G a set 𝐂a(X,Y)subscript𝐂𝑎𝑋𝑌\mathbf{C}_{a}(X,Y)bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_Y ) of morphisms, agreeing with 𝐂Isubscript𝐂𝐼\mathbf{C}_{I}bold_C start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT when a=I𝑎𝐼a=Iitalic_a = italic_I;

  • for each morphism f:ba:𝑓𝑏𝑎f:b\to aitalic_f : italic_b → italic_a, a function 𝐂a(X,Y)𝐂b(X,Y)subscript𝐂𝑎𝑋𝑌subscript𝐂𝑏𝑋𝑌\mathbf{C}_{a}(X,Y)\to\mathbf{C}_{b}(X,Y)bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_Y ) → bold_C start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_X , italic_Y );

  • a family of maps :𝐂a(X,Y)×𝐂b(Y,Z)𝐂ab(X,Z)\circ:\mathbf{C}_{a}(X,Y)\times\mathbf{C}_{b}(Y,Z)\to\mathbf{C}_{a\otimes b}(X% ,Z)∘ : bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_Y ) × bold_C start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_Y , italic_Z ) → bold_C start_POSTSUBSCRIPT italic_a ⊗ italic_b end_POSTSUBSCRIPT ( italic_X , italic_Z );

  • a family :𝐂a(X,X)×𝐂b(Y,Y)𝐂ab(XY,XY)\otimes:\mathbf{C}_{a}(X,X^{\prime})\times\mathbf{C}_{b}(Y,Y^{\prime})\to% \mathbf{C}_{a\otimes b}(X\otimes Y,X^{\prime}\otimes Y^{\prime})⊗ : bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_X start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) × bold_C start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_Y , italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) → bold_C start_POSTSUBSCRIPT italic_a ⊗ italic_b end_POSTSUBSCRIPT ( italic_X ⊗ italic_Y , italic_X start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⊗ italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT )

all such that composition is natural and associative up to the associativity of 𝔾𝔾\mathbb{G}blackboard_G (see e.g. [87, §1.2][55], [27, App. B]), monoidal product of morphisms is also natural and has associators and symmetric braidings up-to the structure of 𝔾𝔾\mathbb{G}blackboard_G, and such that the induced function 𝐂a(X+Y,Z)𝐂a(X,Z)×𝐂a(Y,Z)subscript𝐂𝑎𝑋𝑌𝑍subscript𝐂𝑎𝑋𝑍subscript𝐂𝑎𝑌𝑍\mathbf{C}_{a}(X+Y,Z)\to\mathbf{C}_{a}(X,Z)\times\mathbf{C}_{a}(Y,Z)bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X + italic_Y , italic_Z ) → bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_Z ) × bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_Y , italic_Z ) is a bijection (e.g. [87, p. 36]).

See Proposition 3.3 for our example of a graded Markov category. We note that since 𝔾𝔾\mathbb{G}blackboard_G is semicartesian, there are canonical projections abatensor-product𝑎𝑏𝑎a\otimes b\to aitalic_a ⊗ italic_b → italic_a, and so we can regard any morphism at grade a𝑎aitalic_a as a morphism at grade (ab)tensor-product𝑎𝑏(a\otimes b)( italic_a ⊗ italic_b ).

2.2 Monads and graded relative affine monads

It is well-established that notions of computation can be modelled by monads [67], including probability and non-determinism [40, 65, 64]. In this section, we introduce the flavours of monads relevant to this work and establish their correspondence to Markov categories. We present them in the Kleisli triple setting because this is more conducive to their use in programming languages.

Definition 2.4.

A strong monad over a cartesian closed category 𝐂𝐂\mathbf{C}bold_C is for each X𝐂𝑋𝐂X\in\mathbf{C}italic_X ∈ bold_C an object T(X)𝐂𝑇𝑋𝐂T(X)\in\mathbf{C}italic_T ( italic_X ) ∈ bold_C and a morphism

ηX:I𝐂[X,T(X)],:subscript𝜂𝑋subscript𝐼𝐂𝑋𝑇𝑋\eta_{X}:I_{\mathbf{C}}\to[X,T(X)],italic_η start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT : italic_I start_POSTSUBSCRIPT bold_C end_POSTSUBSCRIPT → [ italic_X , italic_T ( italic_X ) ] ,

and a family of morphisms

():[X,T(Y)][T(X),T(Y)]:superscript𝑋𝑇𝑌𝑇𝑋𝑇𝑌(-)^{*}:{[X,T(Y)]}\to{[T(X),T(Y)]}( - ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT : [ italic_X , italic_T ( italic_Y ) ] → [ italic_T ( italic_X ) , italic_T ( italic_Y ) ]

such that for generalised elements f𝑓fitalic_f and g𝑔gitalic_g of [X,T(Y)]𝑋𝑇𝑌[X,T(Y)][ italic_X , italic_T ( italic_Y ) ] and [Y,T(Z)]𝑌𝑇𝑍[Y,T(Z)][ italic_Y , italic_T ( italic_Z ) ], the following equations hold:

f𝑓\displaystyle fitalic_f =fηXabsentsuperscript𝑓subscript𝜂𝑋\displaystyle=f^{*}\circ\eta_{X}= italic_f start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∘ italic_η start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT (3)
idT(X)𝑖subscript𝑑𝑇𝑋\displaystyle id_{T(X)}italic_i italic_d start_POSTSUBSCRIPT italic_T ( italic_X ) end_POSTSUBSCRIPT =(ηX)absentsuperscriptsubscript𝜂𝑋\displaystyle=(\eta_{X})^{*}= ( italic_η start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT
gfsuperscript𝑔superscript𝑓\displaystyle g^{*}\circ f^{*}italic_g start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∘ italic_f start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT =(gf).absentsuperscriptsuperscript𝑔𝑓\displaystyle=(g^{*}\circ f)^{*}.= ( italic_g start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∘ italic_f ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT .

The left-strength t:X×T(Y)T(X×Y):𝑡𝑋𝑇𝑌𝑇𝑋𝑌t:X\times T(Y)\to T(X\times Y)italic_t : italic_X × italic_T ( italic_Y ) → italic_T ( italic_X × italic_Y ) is induced by the canonical action (ηY):[X,Y][T(X),T(Y)](\eta_{Y}\circ{-})^{*}:[X,Y]\to[T(X),T(Y)]( italic_η start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT ∘ - ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT : [ italic_X , italic_Y ] → [ italic_T ( italic_X ) , italic_T ( italic_Y ) ] [47].

The monad is commutative if the following diagram commutes, where t^^𝑡\hat{t}over^ start_ARG italic_t end_ARG is the induced right strength from the symmetry of the cartesian product.

T(X)×T(Y)𝑇𝑋𝑇𝑌T(X)\times T(Y)italic_T ( italic_X ) × italic_T ( italic_Y )T(X×T(Y))𝑇𝑋𝑇𝑌T(X\times T(Y))italic_T ( italic_X × italic_T ( italic_Y ) )T(X×Y)𝑇𝑋𝑌T(X\times Y)italic_T ( italic_X × italic_Y )T(T(X)×Y)𝑇𝑇𝑋𝑌T(T(X)\times Y)italic_T ( italic_T ( italic_X ) × italic_Y )t^^𝑡\scriptstyle\hat{t}over^ start_ARG italic_t end_ARGtsuperscript𝑡\scriptstyle t^{*}italic_t start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPTt^superscript^𝑡\scriptstyle\hat{t}^{*}over^ start_ARG italic_t end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPTt𝑡\scriptstyle titalic_t

It is affine if the unique map T(1)1𝑇11T(1)\to 1italic_T ( 1 ) → 1 is an isomorphism. A typical example is 𝐂=𝐒𝐞𝐭𝐂𝐒𝐞𝐭\mathbf{C}=\mathbf{Set}bold_C = bold_Set, and T=D𝑇𝐷T=Ditalic_T = italic_D is the finite probability distribution monad [37].

Let 𝒥𝒥\mathcal{J}caligraphic_J be a category with finite products and consider a finite product preserving functor J:𝒥𝐂:𝐽𝒥𝐂J:\mathcal{J}\to\mathbf{C}italic_J : caligraphic_J → bold_C. A relative strong monad T𝑇Titalic_T on J𝐽Jitalic_J is a functor T:𝒥𝐂:𝑇𝒥𝐂T:\mathcal{J}\to\mathbf{C}italic_T : caligraphic_J → bold_C, along with a J𝐽Jitalic_J-relative unit

ηX:I[J(X),T(X)]:subscript𝜂𝑋𝐼𝐽𝑋𝑇𝑋\eta_{X}:I\to[J(X),T(X)]italic_η start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT : italic_I → [ italic_J ( italic_X ) , italic_T ( italic_X ) ]

natural in X𝒥𝑋𝒥X\in\mathcal{J}italic_X ∈ caligraphic_J, and a family of J𝐽Jitalic_J-relative Kleisli extensions

():[J(X),T(Y)][T(X),T(Y)]:superscript𝐽𝑋𝑇𝑌𝑇𝑋𝑇𝑌({-})^{*}:[J(X),T(Y)]\to[T(X),T(Y)]( - ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT : [ italic_J ( italic_X ) , italic_T ( italic_Y ) ] → [ italic_T ( italic_X ) , italic_T ( italic_Y ) ]

natural in X,Y𝒥𝑋𝑌𝒥X,Y\in\mathcal{J}italic_X , italic_Y ∈ caligraphic_J, and such that (3) holds for f𝑓fitalic_f and g𝑔gitalic_g generalised elements of [J(X),T(Y)]𝐽𝑋𝑇𝑌[J(X),T(Y)][ italic_J ( italic_X ) , italic_T ( italic_Y ) ] and [J(Y),T(Z)]𝐽𝑌𝑇𝑍[J(Y),T(Z)][ italic_J ( italic_Y ) , italic_T ( italic_Z ) ]. A typical example is 𝐂=𝐒𝐞𝐭𝐂𝐒𝐞𝐭\mathbf{C}=\mathbf{Set}bold_C = bold_Set, and 𝒥=𝐅𝐢𝐧𝐒𝐞𝐭𝒥𝐅𝐢𝐧𝐒𝐞𝐭\mathcal{J}=\mathbf{FinSet}caligraphic_J = bold_FinSet, with J𝐽Jitalic_J the evident embedding, and T=DJ𝑇𝐷𝐽T=DJitalic_T = italic_D italic_J.

Let (𝔾,,I)𝔾tensor-product𝐼(\mathbb{G},\otimes,I)( blackboard_G , ⊗ , italic_I ) be a monoidal category. A graded strong monad is a functor T:𝔾[𝐂,𝐂]:𝑇𝔾𝐂𝐂T:\mathbb{G}\to[\mathbf{C},\mathbf{C}]italic_T : blackboard_G → [ bold_C , bold_C ], with unit

ηX:I[X,TI(X)]:subscript𝜂𝑋𝐼𝑋subscript𝑇𝐼𝑋\eta_{X}:I\to[X,T_{I}(X)]italic_η start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT : italic_I → [ italic_X , italic_T start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT ( italic_X ) ]

natural in X𝒥𝑋𝒥X\in\mathcal{J}italic_X ∈ caligraphic_J, and a family of Kleisli extensions

()a,b:[X,Tb(Y)][Ta(X),Tab(Y)]:subscriptsuperscript𝑎𝑏𝑋subscript𝑇𝑏𝑌subscript𝑇𝑎𝑋subscript𝑇tensor-product𝑎𝑏𝑌({-})^{*}_{a,b}:[X,T_{b}(Y)]\to[T_{a}(X),T_{a\otimes b}(Y)]( - ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_a , italic_b end_POSTSUBSCRIPT : [ italic_X , italic_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_Y ) ] → [ italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X ) , italic_T start_POSTSUBSCRIPT italic_a ⊗ italic_b end_POSTSUBSCRIPT ( italic_Y ) ]

natural in X,Y𝒥𝑋𝑌𝒥X,Y\in\mathcal{J}italic_X , italic_Y ∈ caligraphic_J, and such that for f𝑓fitalic_f and g𝑔gitalic_g generalised elements of [X,Ta(Y)]𝑋subscript𝑇𝑎𝑌[X,T_{a}(Y)][ italic_X , italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_Y ) ] and [Y,Tb(Z)]𝑌subscript𝑇𝑏𝑍[Y,T_{b}(Z)][ italic_Y , italic_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_Z ) ], the following equations hold:

f𝑓\displaystyle fitalic_f =Tλ(f)I,aηX,absentsubscript𝑇𝜆subscriptsuperscript𝑓𝐼𝑎subscript𝜂𝑋\displaystyle=T_{\lambda}\circ(f)^{*}_{I,a}\circ\eta_{X},= italic_T start_POSTSUBSCRIPT italic_λ end_POSTSUBSCRIPT ∘ ( italic_f ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_a end_POSTSUBSCRIPT ∘ italic_η start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT , (4)
idTb(X)𝑖subscript𝑑subscript𝑇𝑏𝑋\displaystyle id_{T_{b}(X)}italic_i italic_d start_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_X ) end_POSTSUBSCRIPT =Tρ(ηX)b,I,absentsubscript𝑇𝜌subscriptsuperscriptsubscript𝜂𝑋𝑏𝐼\displaystyle=T_{\rho}\circ(\eta_{X})^{*}_{b,I},= italic_T start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ∘ ( italic_η start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_b , italic_I end_POSTSUBSCRIPT ,
(g)ca,b(f)c,asubscriptsuperscript𝑔tensor-product𝑐𝑎𝑏subscriptsuperscript𝑓𝑐𝑎\displaystyle(g)^{*}_{c\otimes a,b}\circ(f)^{*}_{c,a}( italic_g ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c ⊗ italic_a , italic_b end_POSTSUBSCRIPT ∘ ( italic_f ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_a end_POSTSUBSCRIPT =Tα((g)a,bf)c,ab,absentsubscript𝑇𝛼subscriptsuperscriptsubscriptsuperscript𝑔𝑎𝑏𝑓𝑐tensor-product𝑎𝑏\displaystyle=T_{\alpha}\circ((g)^{*}_{a,b}\circ f)^{*}_{c,a\otimes b},= italic_T start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT ∘ ( ( italic_g ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_a , italic_b end_POSTSUBSCRIPT ∘ italic_f ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_a ⊗ italic_b end_POSTSUBSCRIPT ,

where λ:Iaa:𝜆tensor-product𝐼𝑎𝑎\lambda:I\otimes a\to aitalic_λ : italic_I ⊗ italic_a → italic_a, ρ:bIb:𝜌tensor-product𝑏𝐼𝑏\rho:b\otimes I\to bitalic_ρ : italic_b ⊗ italic_I → italic_b, and α:c(ab)(ca)b:𝛼tensor-product𝑐tensor-product𝑎𝑏tensor-producttensor-product𝑐𝑎𝑏\alpha:c\otimes(a\otimes b)\to(c\otimes a)\otimes bitalic_α : italic_c ⊗ ( italic_a ⊗ italic_b ) → ( italic_c ⊗ italic_a ) ⊗ italic_b are the left unitor, right unitor, and associator of 𝔾𝔾\mathbb{G}blackboard_G, respectively.

A graded relative strong monad T𝑇Titalic_T on J𝐽Jitalic_J is a functor T:𝔾[𝒥,𝐂]:𝑇𝔾𝒥𝐂T:\mathbb{G}\to[\mathcal{J},\mathbf{C}]italic_T : blackboard_G → [ caligraphic_J , bold_C ], along with a J𝐽Jitalic_J-relative unit

ηX:I𝐂[J(X),TI(X)]:subscript𝜂𝑋subscript𝐼𝐂𝐽𝑋subscript𝑇𝐼𝑋\eta_{X}:I_{\mathbf{C}}\to[J(X),T_{I}(X)]italic_η start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT : italic_I start_POSTSUBSCRIPT bold_C end_POSTSUBSCRIPT → [ italic_J ( italic_X ) , italic_T start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT ( italic_X ) ]

natural in X𝒥𝑋𝒥X\in\mathcal{J}italic_X ∈ caligraphic_J, and a family of J𝐽Jitalic_J-relative Kleisli extensions

()a,b:[J(X),Tb(Y)][Ta(X),Tab(Y)]:subscriptsuperscript𝑎𝑏𝐽𝑋subscript𝑇𝑏𝑌subscript𝑇𝑎𝑋subscript𝑇tensor-product𝑎𝑏𝑌({-})^{*}_{a,b}:[J(X),T_{b}(Y)]\to[T_{a}(X),T_{a\otimes b}(Y)]( - ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_a , italic_b end_POSTSUBSCRIPT : [ italic_J ( italic_X ) , italic_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_Y ) ] → [ italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X ) , italic_T start_POSTSUBSCRIPT italic_a ⊗ italic_b end_POSTSUBSCRIPT ( italic_Y ) ]

natural in X,Y𝒥𝑋𝑌𝒥X,Y\in\mathcal{J}italic_X , italic_Y ∈ caligraphic_J, and such that (4) holds for f𝑓fitalic_f and g𝑔gitalic_g generalised elements of [J(X),Ta(Y)]𝐽𝑋subscript𝑇𝑎𝑌[J(X),T_{a}(Y)][ italic_J ( italic_X ) , italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_Y ) ] and [J(Y),Tb(Z)]𝐽𝑌subscript𝑇𝑏𝑍[J(Y),T_{b}(Z)][ italic_J ( italic_Y ) , italic_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_Z ) ]. The graded left-strength tasubscript𝑡𝑎t_{a}italic_t start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT is induced by the action Tρ(ηY)a,I:[X,Y][Ta(X),Ta(Y)]T_{\rho}\circ(\eta_{Y}\circ{-})^{*}_{a,I}:[X,Y]\to[T_{a}(X),T_{a}(Y)]italic_T start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ∘ ( italic_η start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT ∘ - ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_a , italic_I end_POSTSUBSCRIPT : [ italic_X , italic_Y ] → [ italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X ) , italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_Y ) ]. The monad is commutative if 𝔾𝔾\mathbb{G}blackboard_G is symmetric monoidal and the following diagram commutes, where σ:abba:𝜎tensor-product𝑎𝑏tensor-product𝑏𝑎\sigma:a\otimes b\to b\otimes aitalic_σ : italic_a ⊗ italic_b → italic_b ⊗ italic_a is the symmetric coherence isomorphism of 𝔾𝔾\mathbb{G}blackboard_G.

Ta(X)×Tb(Y)subscript𝑇𝑎𝑋subscript𝑇𝑏𝑌T_{a}(X)\times T_{b}(Y)italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X ) × italic_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_Y )Ta(X×Tb(Y))subscript𝑇𝑎𝑋subscript𝑇𝑏𝑌T_{a}(X\times T_{b}(Y))italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X × italic_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_Y ) )Tab(X×Y)subscript𝑇tensor-product𝑎𝑏𝑋𝑌T_{a\otimes b}(X\times Y)italic_T start_POSTSUBSCRIPT italic_a ⊗ italic_b end_POSTSUBSCRIPT ( italic_X × italic_Y )Tba(X×Y)subscript𝑇tensor-product𝑏𝑎𝑋𝑌T_{b\otimes a}(X\times Y)italic_T start_POSTSUBSCRIPT italic_b ⊗ italic_a end_POSTSUBSCRIPT ( italic_X × italic_Y )Tb(Ta(X)×Y)subscript𝑇𝑏subscript𝑇𝑎𝑋𝑌T_{b}(T_{a}(X)\times Y)italic_T start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X ) × italic_Y )t^asubscript^𝑡𝑎\scriptstyle\hat{t}_{a}over^ start_ARG italic_t end_ARG start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT(tb)a,bsubscriptsuperscriptsubscript𝑡𝑏𝑎𝑏\scriptstyle(t_{b})^{*}_{a,b}( italic_t start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_a , italic_b end_POSTSUBSCRIPTTσsubscript𝑇𝜎\scriptstyle T_{\sigma}italic_T start_POSTSUBSCRIPT italic_σ end_POSTSUBSCRIPT(t^a)b,asubscriptsuperscriptsubscript^𝑡𝑎𝑏𝑎\scriptstyle(\hat{t}_{a})^{*}_{b,a}( over^ start_ARG italic_t end_ARG start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_b , italic_a end_POSTSUBSCRIPTtbsubscript𝑡𝑏\scriptstyle t_{b}italic_t start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT

It is affine if the unique map TI(1)1subscript𝑇𝐼11T_{I}(1)\to 1italic_T start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT ( 1 ) → 1 is an isomorphism.

Proposition 2.5.

If T𝑇Titalic_T is a graded commutative affine relative monad on a distributive category, then its Kleisli category (e.g. [51, 26]) is a graded Markov category:

  • The objects are the same as 𝒥𝒥\mathcal{J}caligraphic_J;

  • The morphisms in Kl(T)a(X,Y)Klsubscript𝑇𝑎𝑋𝑌\mathrm{Kl}(T)_{a}(X,Y)roman_Kl ( italic_T ) start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_Y ) are the morphisms J(X)Ta(X)𝐽𝑋subscript𝑇𝑎𝑋J(X)\to T_{a}(X)italic_J ( italic_X ) → italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X ) in 𝐂𝐂\mathbf{C}bold_C;

  • Composition is via the Kleisli extension.

Proposition 2.6.

Any graded Markov category induces a graded commutative affine relative monad, by

  • 𝒥=𝐂I,det𝒥subscript𝐂𝐼det\mathcal{J}=\mathbf{C}_{I,\mathrm{det}}caligraphic_J = bold_C start_POSTSUBSCRIPT italic_I , roman_det end_POSTSUBSCRIPT, the distributive category of I𝐼Iitalic_I-graded deterministic maps

  • The underlying category is FP(𝒥op,𝐒𝐞𝐭)FPsuperscript𝒥op𝐒𝐞𝐭\mathrm{FP}(\mathcal{J}^{\mathrm{op}},\mathbf{Set})roman_FP ( caligraphic_J start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT , bold_Set ), the finite product-preserving contravariant presheaves on 𝒥𝒥\mathcal{J}caligraphic_J;

  • J:𝒥FP(𝒥op,𝐒𝐞𝐭):𝐽𝒥FPsuperscript𝒥op𝐒𝐞𝐭J:\mathcal{J}\to\mathrm{FP}(\mathcal{J}^{\mathrm{op}},\mathbf{Set})italic_J : caligraphic_J → roman_FP ( caligraphic_J start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT , bold_Set ) is the Yoneda embedding;

  • T:𝔾[𝒥,FP(𝒥op,𝐒𝐞𝐭)]:𝑇𝔾𝒥FPsuperscript𝒥op𝐒𝐞𝐭T:\mathbb{G}\to[\mathcal{J},\mathrm{FP}(\mathcal{J}^{\mathrm{op}},\mathbf{Set})]italic_T : blackboard_G → [ caligraphic_J , roman_FP ( caligraphic_J start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT , bold_Set ) ] is given by

    Ta(Y)(X)=𝐂a(X,Y)subscript𝑇𝑎𝑌𝑋subscript𝐂𝑎𝑋𝑌T_{a}(Y)(X)=\mathbf{C}_{a}(X,Y)italic_T start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_Y ) ( italic_X ) = bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_Y )
Proof.

Note. In both cases, the proof amounts to expanding the definitions. The constructions are similar to [73, §7]. See also [1, Prop. 13] for the non-graded case.∎

(We conjecture that Propositions 2.52.6 are part of a biequivalence between graded distributive Markov categories and commutative affine graded relative monads. We do not pursue this here because we will not need the generality of the biequivalence in what follows.)

2.3 Connection with enriched categories

To show that the concepts in this section are canonical, we connect with the theory of enriched categories. Let 𝒱𝒱\mathcal{V}caligraphic_V be a symmetric monoidal closed category with limits and colimits, that is moreover semicartesian. Recall (e.g. [45]) that a 𝒱𝒱\mathcal{V}caligraphic_V-enriched category 𝐂𝐂\mathbf{C}bold_C is given by a collection of objects, and for each pair of objects X,Y𝑋𝑌X,Yitalic_X , italic_Y of 𝐂𝐂\mathbf{C}bold_C, a ‘hom-object’ 𝐂(X,Y)𝐂𝑋𝑌\mathbf{C}(X,Y)bold_C ( italic_X , italic_Y ) in 𝒱𝒱\mathcal{V}caligraphic_V. Composition is a morphism 𝐂(X,Y)𝐂(Y,Z)𝐂(X,Z)tensor-product𝐂𝑋𝑌𝐂𝑌𝑍𝐂𝑋𝑍\mathbf{C}(X,Y)\otimes\mathbf{C}(Y,Z)\to\mathbf{C}(X,Z)bold_C ( italic_X , italic_Y ) ⊗ bold_C ( italic_Y , italic_Z ) → bold_C ( italic_X , italic_Z ) in 𝒱𝒱\mathcal{V}caligraphic_V. We can also define 𝒱𝒱\mathcal{V}caligraphic_V-enriched monoidal categories, by requiring the functor :𝐂×𝐂𝐂\otimes:\mathbf{C}\times\mathbf{C}\to\mathbf{C}⊗ : bold_C × bold_C → bold_C to be a monoidal category. And 𝒱𝒱\mathcal{V}caligraphic_V-enriched coproducts require a natural isomorphism

𝐂(X1++Xn,Y)𝐂(X1,Y)××𝐂(Xn,Y)𝐂subscript𝑋1subscript𝑋𝑛𝑌𝐂subscript𝑋1𝑌𝐂subscript𝑋𝑛𝑌\mathbf{C}(X_{1}+\dots+X_{n},Y)\cong\mathbf{C}(X_{1},Y)\times\dots\times% \mathbf{C}(X_{n},Y)bold_C ( italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_X start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_Y ) ≅ bold_C ( italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_Y ) × ⋯ × bold_C ( italic_X start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_Y )

between objects of 𝒱𝒱\mathcal{V}caligraphic_V. Any enriched category has an underlying ordinary category 𝐂0subscript𝐂0\mathbf{C}_{0}bold_C start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, which has the same objects but with a hom-set given by 𝐂0(X,Y)=𝒱(I,𝐂(X,Y))subscript𝐂0𝑋𝑌𝒱𝐼𝐂𝑋𝑌\mathbf{C}_{0}(X,Y)=\mathcal{V}(I,\mathbf{C}(X,Y))bold_C start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_X , italic_Y ) = caligraphic_V ( italic_I , bold_C ( italic_X , italic_Y ) ). This ordinary category inherits monoidal, limit and colimit structure from 𝐂𝐂\mathbf{C}bold_C.

Definition 2.7 (e.g. [69]).

A 𝒱𝒱\mathcal{V}caligraphic_V-enriched Markov category is a 𝒱𝒱\mathcal{V}caligraphic_V-enriched symmetric monoidal category such that the monoidal unit is terminal: 𝐂(A,I)1𝐂𝐴𝐼1\mathbf{C}(A,I)\cong 1bold_C ( italic_A , italic_I ) ≅ 1, and such that the underlying symmetric monoidal category is equipped with the structure of a Markov category (i.e. a comonoid structure in the underlying ordinary category).

A 𝒱𝒱\mathcal{V}caligraphic_V-enriched Markov category is moreover distributive if it has 𝒱𝒱\mathcal{V}caligraphic_V-coproducts that distribute over the monoidal structure, and such that the coproduct injections are deterministic, in the sense of the underlying ordinary category.

For any semicartesian category 𝔾𝔾\mathbb{G}blackboard_G, recall the category of functors [𝔾op,𝐒𝐞𝐭]superscript𝔾op𝐒𝐞𝐭[\mathbb{G}^{\mathrm{op}},\mathbf{Set}][ blackboard_G start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT , bold_Set ]. This extends 𝔾𝔾\mathbb{G}blackboard_G to a good ‘cosmos’ for enrichment since

  • [𝔾op,𝐒𝐞𝐭]superscript𝔾op𝐒𝐞𝐭[\mathbb{G}^{\mathrm{op}},\mathbf{Set}][ blackboard_G start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT , bold_Set ] embeds 𝔾𝔾\mathbb{G}blackboard_G fully and faithfully (i.e. essentially as a full subcategory), via the Yoneda embedding y(a)=𝔾(,a)𝑦𝑎𝔾𝑎y(a)=\mathbb{G}(-,a)italic_y ( italic_a ) = blackboard_G ( - , italic_a ).

  • [𝔾op,𝐒𝐞𝐭]superscript𝔾op𝐒𝐞𝐭[\mathbb{G}^{\mathrm{op}},\mathbf{Set}][ blackboard_G start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT , bold_Set ] has all limits and colimits, computed pointwise.

  • [𝔾op,𝐒𝐞𝐭]superscript𝔾op𝐒𝐞𝐭[\mathbb{G}^{\mathrm{op}},\mathbf{Set}][ blackboard_G start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT , bold_Set ] has a semicartesian structure such that the Yoneda embedding is a symmetric monoidal functor. This is given by Day convolution [21], and has the following universal property: for functors F,G,H[𝔾op,𝐒𝐞𝐭]𝐹𝐺𝐻superscript𝔾op𝐒𝐞𝐭F,G,H\in[\mathbb{G}^{\mathrm{op}},\mathbf{Set}]italic_F , italic_G , italic_H ∈ [ blackboard_G start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT , bold_Set ], to give a natural transformation FGHtensor-product𝐹𝐺𝐻F\otimes G\to Hitalic_F ⊗ italic_G → italic_H is to give a natural family of functions F(a)×G(b)H(ab)𝐹𝑎𝐺𝑏𝐻tensor-product𝑎𝑏{F(a)\times G(b)\to H(a\otimes b)}italic_F ( italic_a ) × italic_G ( italic_b ) → italic_H ( italic_a ⊗ italic_b ).

  • [𝔾op,𝐒𝐞𝐭]superscript𝔾op𝐒𝐞𝐭[\mathbb{G}^{\mathrm{op}},\mathbf{Set}][ blackboard_G start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT , bold_Set ] is moreover monoidal closed.

Proposition 2.8.

To give an [𝔾op,𝐒𝐞𝐭]superscript𝔾op𝐒𝐞𝐭[\mathbb{G}^{\mathrm{op}},\mathbf{Set}][ blackboard_G start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT , bold_Set ]-enriched distributive Markov category is to give a 𝔾𝔾\mathbb{G}blackboard_G-graded distributive Markov category.

Notes.

This follows from the characterization of locally 𝔾𝔾\mathbb{G}blackboard_G-graded categories as [𝔾op,𝐒𝐞𝐭]superscript𝔾op𝐒𝐞𝐭[\mathbb{G}^{\mathrm{op}},\mathbf{Set}][ blackboard_G start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT , bold_Set ]-enriched categories (e.g. [87, 55, 27]), and then translating Definition 2.7 across this correspondence to arrive at Definition 2.3. ∎

The correspondence between graded monads and enriched monads is also well understood (e.g. [60]).

2.4 Discussion

Recent work by Perrone [69] has considered enriched Markov categories to obtain an abstract view of the distance between probabilities, which allows for an abstract development of entropy. We note that the enriching category 𝒱=𝖣𝗂𝗏𝒱𝖣𝗂𝗏\mathcal{V}=\mathsf{Div}caligraphic_V = sansserif_Div in [69] is indeed semicartesian. The full theory of enriched Markov categories perhaps deserves a more thorough analysis.

3 A graded Markov category for probability and non-determinism

We recall ordinary Markov categories for finite probability (§3.1). We then consider a generic construction for graded Markov categories, and instantiate it in our setting, obtaining the graded Markov category 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP (for ‘Imprecise Probability’, §3.2). We conclude this section with a worked example (§3.3). In the subsequent sections (§45) we relate this graded Markov category with convex sets of distributions.

3.1 Ordinary Markov categories for probability

Definition 3.1.

A probability vector pn𝑝superscript𝑛p\in\mathbb{R}^{n}italic_p ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT is a sequence of non-negative numbers that sum to 1111. We write D(n)𝐷𝑛D(n)italic_D ( italic_n ) for the set of probability vectors of length n𝑛nitalic_n.

The set D(n)𝐷𝑛D(n)italic_D ( italic_n ) is always a convex set: for any r[0,1]𝑟01r\in[0,1]italic_r ∈ [ 0 , 1 ] and p,qD(n)𝑝𝑞𝐷𝑛p,q\in D(n)italic_p , italic_q ∈ italic_D ( italic_n ), the convex combination rp+(1r)q𝑟𝑝1𝑟𝑞r\cdot p+(1-r)\cdot qitalic_r ⋅ italic_p + ( 1 - italic_r ) ⋅ italic_q is again a probability vector in D(n)𝐷𝑛D(n)italic_D ( italic_n ). We write p+rqsubscript𝑟𝑝𝑞p+_{r}qitalic_p + start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_q as shorthand for such a convex combination. Every probability vector in D(n)𝐷𝑛D(n)italic_D ( italic_n ) arises via convex combinations of the Dirac vectors δisubscript𝛿𝑖\delta_{i}italic_δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, for i=1n𝑖1𝑛i=1\dots nitalic_i = 1 … italic_n, where δ1=(1,0,0,0)subscript𝛿11000\delta_{1}=(1,0,0,0\dots)italic_δ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = ( 1 , 0 , 0 , 0 … ), δ2=(0,1,0,0)subscript𝛿20100\delta_{2}=(0,1,0,0\dots)italic_δ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = ( 0 , 1 , 0 , 0 … ) and so on.

A matrix of real numbers fn×m𝑓superscript𝑛𝑚f\in\mathbb{R}^{n\times m}italic_f ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_m end_POSTSUPERSCRIPT is called stochastic if each column is a probability vector. This is equivalent to requiring that as a linear map, it preserves the property of being a probability vector, i.e. if pD(n)𝑝𝐷𝑛p\in D(n)italic_p ∈ italic_D ( italic_n ) then (fp)D(m)𝑓𝑝𝐷𝑚(f\,p)\in D(m)( italic_f italic_p ) ∈ italic_D ( italic_m ). In fact, every function D(n)D(m)𝐷𝑛𝐷𝑚D(n)\to D(m)italic_D ( italic_n ) → italic_D ( italic_m ) that preserves convex structure arises from a stochastic matrix in this way. We call such a function a convex map.

Definition 3.2 (e.g. [23], Ex. 2.5).

The category 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡\mathbf{FinStoch}bold_FinStoch of finite stochastic matrices has as objects natural numbers, and morphisms mn𝑚𝑛m\to nitalic_m → italic_n stochastic matrices in n×msuperscript𝑛𝑚\mathbb{R}^{n\times m}blackboard_R start_POSTSUPERSCRIPT italic_n × italic_m end_POSTSUPERSCRIPT. Composition is matrix multiplication, and the identity morphism is the unit diagonal matrix.

This can be made into a symmetric monoidal category, with monoidal structure on objects given by multiplication of numbers, and on morphisms by the Kronecker product of matrices. It is semicartesian where the terminal object is 1111, and there is a unique stochastic matrix with one row. This is moreover a Markov category, with 𝖼𝗈𝗉𝗒n:nnn:subscript𝖼𝗈𝗉𝗒𝑛𝑛tensor-product𝑛𝑛\mathsf{copy}_{n}:n\to n\otimes nsansserif_copy start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT : italic_n → italic_n ⊗ italic_n given by the three-dimensional diagonal (in (n×n)×nsuperscript𝑛𝑛𝑛\mathbb{R}^{(n\times n)\times n}blackboard_R start_POSTSUPERSCRIPT ( italic_n × italic_n ) × italic_n end_POSTSUPERSCRIPT).

The Markov category 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡\mathbf{FinStoch}bold_FinStoch moreover has a distributive structure. The coproduct of objects is given by addition, and the coproduct of morphisms by block matrices (concatenating the columns).

The monad view on 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡\mathbf{FinStoch}bold_FinStoch is as follows. First, we consider the embedding J:𝐅𝐢𝐧𝐒𝐞𝐭𝐒𝐞𝐭:𝐽𝐅𝐢𝐧𝐒𝐞𝐭𝐒𝐞𝐭{J:\mathbf{FinSet}\to\mathbf{Set}}italic_J : bold_FinSet → bold_Set. We then regard D𝐷Ditalic_D (Def. 3.1) as a J𝐽Jitalic_J-relative monad D:𝐅𝐢𝐧𝐒𝐞𝐭𝐒𝐞𝐭:𝐷𝐅𝐢𝐧𝐒𝐞𝐭𝐒𝐞𝐭D\colon\mathbf{FinSet}\to\mathbf{Set}italic_D : bold_FinSet → bold_Set, which is affine and commutative. In fact there is an ordinary monad Dsuperscript𝐷D^{\prime}italic_D start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT on 𝐒𝐞𝐭𝐒𝐞𝐭\mathbf{Set}bold_Set, comprising finitely supported probability distributions (e.g. [37, Ch. 2]), and D=DJ𝐷superscript𝐷𝐽D=D^{\prime}Jitalic_D = italic_D start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT italic_J. The distributive Markov category 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡\mathbf{FinStoch}bold_FinStoch can then be regarded as the Kleisli category for this relative monad.

3.2 The graded Markov category 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP

We first introduce a general construction for graded Markov categories. This is a variation on the Para construction [22], also called monoidal indeterminates [31]. Via the connections between Markov categories and commutative affine relative monads (§2.2), it is equivalently a graded version of the reader monad transformer [58].

Proposition 3.3.

Let 𝔾𝔾\mathbb{G}blackboard_G be a semicartesian subcategory of a distributive Markov category 𝐂𝐂\mathbf{C}bold_C. There is a graded distributive Markov category with the same objects as 𝐂𝐂\mathbf{C}bold_C and with the hom-sets

𝐂a(X,Y)=𝐂(aX,Y)(a𝔾).subscript𝐂𝑎𝑋𝑌𝐂tensor-product𝑎𝑋𝑌𝑎𝔾.\mathbf{C}_{a}(X,Y)=\mathbf{C}(a\otimes X,Y)\qquad(a\in\mathbb{G})\text{.}bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_Y ) = bold_C ( italic_a ⊗ italic_X , italic_Y ) ( italic_a ∈ blackboard_G ) .

The reindexing is given by composition: if f𝔾(b,a)𝑓𝔾𝑏𝑎f\in\mathbb{G}(b,a)italic_f ∈ blackboard_G ( italic_b , italic_a ) and g𝐂a(X,Y)𝑔subscript𝐂𝑎𝑋𝑌g\in\mathbf{C}_{a}(X,Y)italic_g ∈ bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_Y )

f(g)=g(fX)𝐂b(X,Y).superscript𝑓𝑔𝑔tensor-product𝑓𝑋subscript𝐂𝑏𝑋𝑌.f^{*}(g)=g\circ(f\otimes X)\in\mathbf{C}_{b}(X,Y)\text{.}italic_f start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_g ) = italic_g ∘ ( italic_f ⊗ italic_X ) ∈ bold_C start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_X , italic_Y ) .

For the composition of f𝐂a(X,Y)𝑓subscript𝐂𝑎𝑋𝑌f\in\mathbf{C}_{a}(X,Y)italic_f ∈ bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_Y ) and g𝐂b(Y,Z)𝑔subscript𝐂𝑏𝑌𝑍g\in\mathbf{C}_{b}(Y,Z)italic_g ∈ bold_C start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_Y , italic_Z ),

(gf)=(abXbaXbfbY𝑔Z)𝐂ab(X,Z).𝑔𝑓tensor-product𝑎𝑏𝑋tensor-product𝑏𝑎𝑋tensor-product𝑏𝑓tensor-product𝑏𝑌𝑔𝑍subscript𝐂tensor-product𝑎𝑏𝑋𝑍.(g\circ f)=\Big{(}a\otimes b\otimes X\cong b\otimes a\otimes X\xrightarrow{b% \otimes f}b\otimes Y\xrightarrow{g}Z\Big{)}\in\mathbf{C}_{a\otimes b}(X,Z)% \text{.}( italic_g ∘ italic_f ) = ( italic_a ⊗ italic_b ⊗ italic_X ≅ italic_b ⊗ italic_a ⊗ italic_X start_ARROW start_OVERACCENT italic_b ⊗ italic_f end_OVERACCENT → end_ARROW italic_b ⊗ italic_Y start_ARROW overitalic_g → end_ARROW italic_Z ) ∈ bold_C start_POSTSUBSCRIPT italic_a ⊗ italic_b end_POSTSUBSCRIPT ( italic_X , italic_Z ) .

The monoidal product of f𝐂a(X,X)𝑓subscript𝐂𝑎𝑋superscript𝑋f\in\mathbf{C}_{a}(X,X^{\prime})italic_f ∈ bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_X start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) and g𝐂b(Y,Y)𝑔subscript𝐂𝑏𝑌superscript𝑌g\in\mathbf{C}_{b}(Y,Y^{\prime})italic_g ∈ bold_C start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_Y , italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) is given by

(fg)=(abXYaXbYfgXY)𝐂ab(XY,XY).tensor-product𝑓𝑔tensor-product𝑎𝑏𝑋𝑌tensor-product𝑎𝑋𝑏𝑌tensor-product𝑓𝑔tensor-productsuperscript𝑋superscript𝑌subscript𝐂tensor-product𝑎𝑏tensor-product𝑋𝑌tensor-productsuperscript𝑋superscript𝑌.(f\otimes g)=\Big{(}a\otimes b\otimes X\otimes Y\cong a\otimes X\otimes b% \otimes Y\xrightarrow{f\otimes g}X^{\prime}\otimes Y^{\prime}\Big{)}\in\mathbf% {C}_{a\otimes b}(X\otimes Y,X^{\prime}\otimes Y^{\prime})\text{.}( italic_f ⊗ italic_g ) = ( italic_a ⊗ italic_b ⊗ italic_X ⊗ italic_Y ≅ italic_a ⊗ italic_X ⊗ italic_b ⊗ italic_Y start_ARROW start_OVERACCENT italic_f ⊗ italic_g end_OVERACCENT → end_ARROW italic_X start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⊗ italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ∈ bold_C start_POSTSUBSCRIPT italic_a ⊗ italic_b end_POSTSUBSCRIPT ( italic_X ⊗ italic_Y , italic_X start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⊗ italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) .
Definition 3.4.

A stochastic matrix f𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡(m,n)𝑓𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝑚𝑛f\in\mathbf{FinStoch}(m,n)italic_f ∈ bold_FinStoch ( italic_m , italic_n ) is surjective if for every j[1,n]𝑗1𝑛j\in[1,n]italic_j ∈ [ 1 , italic_n ] there exists i[1,m]𝑖1𝑚i\in[1,m]italic_i ∈ [ 1 , italic_m ] such that fisubscript𝑓𝑖f_{i}italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the Dirac distribution at j𝑗jitalic_j. In other words, the induced function D(m)D(n)𝐷𝑚𝐷𝑛D(m)\to D(n)italic_D ( italic_m ) → italic_D ( italic_n ) is surjective. Let 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjsubscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj\mathbf{FinStoch}_{\mathrm{Surj}}bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT be the category of natural numbers and surjective stochastic matrices. This is a semicartesian monoidal subcategory of 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡\mathbf{FinStoch}bold_FinStoch.

Definition 3.5.

The graded distributive Markov category 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP is the 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjsubscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj\mathbf{FinStoch}_{\mathrm{Surj}}bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT-graded version of 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡\mathbf{FinStoch}bold_FinStoch, according to Proposition 3.3.

This graded distributive Markov category supports both finite probability and finite non-determinism.

  • For binary probabilistic choice with bias r𝑟ritalic_r, we consider the morphism in 𝐈𝐦𝐏1(1,2)subscript𝐈𝐦𝐏112\mathbf{ImP}_{1}(1,2)bold_ImP start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( 1 , 2 ) given by the column vector (r1r)matrix𝑟1𝑟\begin{pmatrix}r\\ 1-r\end{pmatrix}( start_ARG start_ROW start_CELL italic_r end_CELL end_ROW start_ROW start_CELL 1 - italic_r end_CELL end_ROW end_ARG ).

  • For a binary non-deterministic (i.e. Knightian) choice, we consider the morphism in 𝐈𝐦𝐏2(1,2)subscript𝐈𝐦𝐏212\mathbf{ImP}_{2}(1,2)bold_ImP start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( 1 , 2 ) given by the unit diagonal matrix.

We can extend the above notions of probabilistic and non-deterministic choice between elements of a finite set n𝑛nitalic_n by considering probability vectors (in 𝐈𝐦𝐏1(1,n)subscript𝐈𝐦𝐏11𝑛\mathbf{ImP}_{1}(1,n)bold_ImP start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( 1 , italic_n )) and unit diagonal matrices (in 𝐈𝐦𝐏n(1,n)subscript𝐈𝐦𝐏𝑛1𝑛\mathbf{ImP}_{n}(1,n)bold_ImP start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( 1 , italic_n )) respectively.

Remark:

We could have considered a subcategory of 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjsubscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj\mathbf{FinStoch}_{\mathrm{Surj}}bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT as the grading. One example is finite sets and (deterministic) surjective functions. Another example is the subcategory where the objects are of the form 2Asuperscript2𝐴2^{A}2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT and where we only consider the surjections 2B2Asuperscript2𝐵superscript2𝐴2^{B}\to 2^{A}2 start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT → 2 start_POSTSUPERSCRIPT italic_A end_POSTSUPERSCRIPT induced by injections AB𝐴𝐵A\to Bitalic_A → italic_B (connecting with nominal sets [71] and the notation for grading considered in Section 1.2; in this case the semicartesian monoidal structure amounts to disjoint union, AB𝐴𝐵A\uplus Bitalic_A ⊎ italic_B.). We leave for future work the question of to what extent the following results depend on this particular choice of grading.

3.3 Example calculation with 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP

Example 3.6.

Consider the scenarios from Examples 1.1 and 1.2 where we draw boolean values with Knightian uncertainty and from fair Bernoulli trials and combine them using different program logic. We denote outcomes as probability vectors of length three, representing the chance of r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r, g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g, and b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b, respectively. Example 1.1 is the morphism

(g+h)f=(1000.500.5)𝐈𝐦𝐏2(1,3),𝑔𝑓matrix1000.500.5subscript𝐈𝐦𝐏213(g+h)\circ f=\begin{pmatrix}1&0\\ 0&0.5\\ 0&0.5\end{pmatrix}\in\mathbf{ImP}_{2}(1,3),( italic_g + italic_h ) ∘ italic_f = ( start_ARG start_ROW start_CELL 1 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0.5 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0.5 end_CELL end_ROW end_ARG ) ∈ bold_ImP start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( 1 , 3 ) ,

where f𝑓fitalic_f denotes the conditional on the fair Bernoulli trial, and g𝑔gitalic_g and hhitalic_h are the conditionals on the Knightian choices in each branch, respectively.

f=(0.50.5)𝐈𝐦𝐏1(1,2)g=(100100)𝐈𝐦𝐏2(1,3)h=(100001)𝐈𝐦𝐏2(1,3)formulae-sequence𝑓matrix0.50.5subscript𝐈𝐦𝐏112𝑔matrix100100subscript𝐈𝐦𝐏213matrix100001subscript𝐈𝐦𝐏213f=\begin{pmatrix}0.5\\ 0.5\end{pmatrix}\in\mathbf{ImP}_{1}(1,2)\qquad g=\begin{pmatrix}1&0\\ 0&1\\ 0&0\end{pmatrix}\in\mathbf{ImP}_{2}(1,3)\qquad h=\begin{pmatrix}1&0\\ 0&0\\ 0&1\end{pmatrix}\in\mathbf{ImP}_{2}(1,3)italic_f = ( start_ARG start_ROW start_CELL 0.5 end_CELL end_ROW start_ROW start_CELL 0.5 end_CELL end_ROW end_ARG ) ∈ bold_ImP start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( 1 , 2 ) italic_g = ( start_ARG start_ROW start_CELL 1 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW end_ARG ) ∈ bold_ImP start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( 1 , 3 ) italic_h = ( start_ARG start_ROW start_CELL 1 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW end_ARG ) ∈ bold_ImP start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( 1 , 3 )

On the other hand, Example 1.2 is the morphism

(𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj,π1(g)+𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj,π2(h))f=(10.50.50000.50.500.500.5)𝐈𝐦𝐏4(1,3),subscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjsubscript𝜋1𝑔subscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjsubscript𝜋2𝑓matrix10.50.50000.50.500.500.5subscript𝐈𝐦𝐏413(\mathbf{FinStoch}_{\mathrm{Surj},\pi_{1}}(g)+\mathbf{FinStoch}_{\mathrm{Surj}% ,\pi_{2}}(h))\circ f=\begin{pmatrix}1&0.5&0.5&0\\ 0&0&0.5&0.5\\ 0&0.5&0&0.5\end{pmatrix}\in\mathbf{ImP}_{4}(1,3),( bold_FinStoch start_POSTSUBSCRIPT roman_Surj , italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_g ) + bold_FinStoch start_POSTSUBSCRIPT roman_Surj , italic_π start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_h ) ) ∘ italic_f = ( start_ARG start_ROW start_CELL 1 end_CELL start_CELL 0.5 end_CELL start_CELL 0.5 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0.5 end_CELL start_CELL 0.5 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0.5 end_CELL start_CELL 0 end_CELL start_CELL 0.5 end_CELL end_ROW end_ARG ) ∈ bold_ImP start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT ( 1 , 3 ) ,

where f𝑓fitalic_f, g𝑔gitalic_g, and hhitalic_h denote the same conditional statements, but now we lift the grading of g𝑔gitalic_g and hhitalic_h to 4444 via the projections π1,π2𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj(2×2,2)subscript𝜋1subscript𝜋2subscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj222\pi_{1},\pi_{2}\in\mathbf{FinStoch}_{\mathrm{Surj}}(2\times 2,2)italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_π start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∈ bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT ( 2 × 2 , 2 ) to account for the decoupling of their Knightian uncertainties.

4 Convex sets of distributions and an op-lax functor

In this section we recall the properties of convex powersets of distributions (see also [64, 65, 44, 7, 34] and elsewhere). We then connect our category 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP3) with convex powersets, via the Kan extension method of [24]4.2) and then show that this yields an op-lax functor (§4.3), which means that our category is more conservative with uncertainty bounds. We begin by recalling some basic properties of convex sets of distributions.

Definition 4.1.

A subset S𝑆Sitalic_S of D(n)𝐷𝑛D(n)italic_D ( italic_n ) is convex if it is closed under convex combinations: if p,qS𝑝𝑞𝑆p,q\in Sitalic_p , italic_q ∈ italic_S then for any r[0,1]𝑟01r\in[0,1]italic_r ∈ [ 0 , 1 ] we have p+rqSsubscript𝑟𝑝𝑞𝑆p+_{r}q\in Sitalic_p + start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_q ∈ italic_S.

A convex subset S𝑆Sitalic_S of D(n)𝐷𝑛D(n)italic_D ( italic_n ) is finitely generated if there is a finite sequence p1pmSsubscript𝑝1subscript𝑝𝑚𝑆p_{1}\dots p_{m}\in Sitalic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT … italic_p start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∈ italic_S such that every element of S𝑆Sitalic_S can be achieved by convex combinations of the pisubscript𝑝𝑖p_{i}italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT’s. In other words, S={q(p1pm)|qD(m)}𝑆conditional-set𝑞subscript𝑝1subscript𝑝𝑚𝑞𝐷𝑚S=\{q\cdot(p_{1}\dots p_{m})~{}|~{}q\in D(m)\}italic_S = { italic_q ⋅ ( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT … italic_p start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) | italic_q ∈ italic_D ( italic_m ) }, with the pisubscript𝑝𝑖p_{i}italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT’s regarded as column vectors and q𝑞qitalic_q regarded as a row vector.

Lemma 4.2.

For any convex map f:D(m)D(n):𝑓𝐷𝑚𝐷𝑛f:D(m)\to D(n)italic_f : italic_D ( italic_m ) → italic_D ( italic_n ) between the sets of probability vectors, the image of f𝑓fitalic_f is a convex subset of D(n)𝐷𝑛D(n)italic_D ( italic_n ).

Moreover, such convex subsets of D(n)𝐷𝑛D(n)italic_D ( italic_n ) are finitely generated, and every finitely generated convex set arises in this way.

Proof.

Suppose q,q𝗂𝗆𝖺𝗀𝖾(f)𝑞superscript𝑞𝗂𝗆𝖺𝗀𝖾𝑓q,q^{\prime}\in\mathsf{image}(f)italic_q , italic_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ sansserif_image ( italic_f ), and let r[0,1]𝑟01r\in[0,1]italic_r ∈ [ 0 , 1 ]. So we must have p,pD(m)𝑝𝑝𝐷𝑚p,p\in D(m)italic_p , italic_p ∈ italic_D ( italic_m ) such that f(p)=q𝑓𝑝𝑞f(p)=qitalic_f ( italic_p ) = italic_q and f(p)=q𝑓superscript𝑝superscript𝑞f(p^{\prime})=q^{\prime}italic_f ( italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) = italic_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. Then

q+rq=f(p)+rf(p)=f(p+rp),subscript𝑟𝑞superscript𝑞subscript𝑟𝑓𝑝𝑓superscript𝑝𝑓subscript𝑟𝑝superscript𝑝,q+_{r}q^{\prime}=f(p)+_{r}f(p^{\prime})=f(p+_{r}p^{\prime})\text{,}italic_q + start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = italic_f ( italic_p ) + start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_f ( italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) = italic_f ( italic_p + start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ,

the last step because f𝑓fitalic_f is a convex map, and so we see that q+rq𝗂𝗆𝖺𝗀𝖾(f)subscript𝑟𝑞superscript𝑞𝗂𝗆𝖺𝗀𝖾𝑓q+_{r}q^{\prime}\in\mathsf{image}(f)italic_q + start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_q start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ sansserif_image ( italic_f ).

The set 𝗂𝗆𝖺𝗀𝖾(f)𝗂𝗆𝖺𝗀𝖾𝑓\mathsf{image}(f)sansserif_image ( italic_f ) is generated by f(δi)𝑓subscript𝛿𝑖f(\delta_{i})italic_f ( italic_δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) for i=1m𝑖1𝑚i=1\dots mitalic_i = 1 … italic_m. Conversely if a set S𝑆Sitalic_S is generated by p1pmsubscript𝑝1subscript𝑝𝑚p_{1}\dots p_{m}italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT … italic_p start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, regarded as column vectors, then the matrix (p1pm)𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡(m,n)subscript𝑝1subscript𝑝𝑚𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝑚𝑛(p_{1}\dots p_{m})\in\mathbf{FinStoch}(m,n)( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT … italic_p start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) ∈ bold_FinStoch ( italic_m , italic_n ) determines a map f:D(m)D(n):𝑓𝐷𝑚𝐷𝑛f:D(m)\to D(n)italic_f : italic_D ( italic_m ) → italic_D ( italic_n ) such that 𝗂𝗆𝖺𝗀𝖾(f)=S𝗂𝗆𝖺𝗀𝖾𝑓𝑆\mathsf{image}(f)=Ssansserif_image ( italic_f ) = italic_S. ∎

4.1 Convex powersets of distributions

We write CP(n)CP𝑛\mathrm{CP}(n)roman_CP ( italic_n ) for the set of convex subsets of D(n)𝐷𝑛D(n)italic_D ( italic_n ), and CPfin(n)subscriptCPfin𝑛\mathrm{CP}_{\mathrm{fin}}(n)roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n ) for the finitely generated convex subsets of D(n)𝐷𝑛D(n)italic_D ( italic_n ). Both support convex combinations: if r[0,1]𝑟01r\in[0,1]italic_r ∈ [ 0 , 1 ] and S,TCP(n)𝑆𝑇CP𝑛S,T\in\mathrm{CP}(n)italic_S , italic_T ∈ roman_CP ( italic_n ) then

S+rT=def{p+rq|pS,qT}CP(n).superscriptdefsubscript𝑟𝑆𝑇conditional-setsubscript𝑟𝑝𝑞formulae-sequence𝑝𝑆𝑞𝑇CP𝑛.S+_{r}T\stackrel{{\scriptstyle\text{def}}}{{=}}\{p+_{r}q~{}|~{}p\in S,q\in T\}% \in\mathrm{CP}(n)\text{.}italic_S + start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_T start_RELOP SUPERSCRIPTOP start_ARG = end_ARG start_ARG def end_ARG end_RELOP { italic_p + start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_q | italic_p ∈ italic_S , italic_q ∈ italic_T } ∈ roman_CP ( italic_n ) .

There is moreover an ordering given by subset, and the join is a convex closure of the union:

ST=def{p+rq|r[0,1],pS,qT}.superscriptdef𝑆𝑇conditional-setsubscript𝑟𝑝𝑞formulae-sequence𝑟01formulae-sequence𝑝𝑆𝑞𝑇.S\vee T\stackrel{{\scriptstyle\text{def}}}{{=}}\{p+_{r}q~{}|~{}r\in[0,1],p\in S% ,q\in T\}\text{.}italic_S ∨ italic_T start_RELOP SUPERSCRIPTOP start_ARG = end_ARG start_ARG def end_ARG end_RELOP { italic_p + start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_q | italic_r ∈ [ 0 , 1 ] , italic_p ∈ italic_S , italic_q ∈ italic_T } .
Proposition 4.3.

There is a family of functions ϕm,n:𝐈𝐦𝐏m(1,n)CPfin(n):subscriptitalic-ϕ𝑚𝑛subscript𝐈𝐦𝐏𝑚1𝑛subscriptCPfin𝑛\phi_{m,n}:\mathbf{ImP}_{m}(1,n)\to\mathrm{CP}_{\mathrm{fin}}(n)italic_ϕ start_POSTSUBSCRIPT italic_m , italic_n end_POSTSUBSCRIPT : bold_ImP start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( 1 , italic_n ) → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n ), that takes f𝐈𝐦𝐏m(1,n)𝑓subscript𝐈𝐦𝐏𝑚1𝑛f\in\mathbf{ImP}_{m}(1,n)italic_f ∈ bold_ImP start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( 1 , italic_n ) to its image 𝗂𝗆𝖺𝗀𝖾(f)CPfin(n)𝗂𝗆𝖺𝗀𝖾𝑓subscriptCPfin𝑛\mathsf{image}(f)\in\mathrm{CP}_{\mathrm{fin}}(n)sansserif_image ( italic_f ) ∈ roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n ), and the family is natural in m𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj𝑚subscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjm\in\mathbf{FinStoch}_{\mathrm{Surj}}italic_m ∈ bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT and n𝐒𝐞𝐭𝑛𝐒𝐞𝐭n\in\mathbf{Set}italic_n ∈ bold_Set.

Proof.

First, the fact that the image of f𝑓fitalic_f is convex is Lemma 4.2. For naturality in m𝑚mitalic_m, suppose g𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj(m,m)𝑔subscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjsuperscript𝑚𝑚g\in\mathbf{FinStoch}_{\mathrm{Surj}}(m^{\prime},m)italic_g ∈ bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT ( italic_m start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_m ). Then naturality in m𝑚mitalic_m amounts to the fact that

𝗂𝗆𝖺𝗀𝖾(fg)=𝗂𝗆𝖺𝗀𝖾(f)𝗂𝗆𝖺𝗀𝖾𝑓𝑔𝗂𝗆𝖺𝗀𝖾𝑓\mathsf{image}(f\circ g)=\mathsf{image}(f)sansserif_image ( italic_f ∘ italic_g ) = sansserif_image ( italic_f )

which is true since g𝑔gitalic_g is surjective. For naturality in n𝑛nitalic_n, suppose h𝐒𝐞𝐭(n,n)𝐒𝐞𝐭𝑛superscript𝑛h\in\mathbf{Set}(n,n^{\prime})italic_h ∈ bold_Set ( italic_n , italic_n start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ). Then naturality amounts to the fact that

𝗂𝗆𝖺𝗀𝖾(D(h)f)=CP(h)(𝗂𝗆𝖺𝗀𝖾(f))𝗂𝗆𝖺𝗀𝖾𝐷𝑓CP𝗂𝗆𝖺𝗀𝖾𝑓\mathsf{image}(D(h)\circ f)=\mathrm{CP}(h)(\mathsf{image}(f))sansserif_image ( italic_D ( italic_h ) ∘ italic_f ) = roman_CP ( italic_h ) ( sansserif_image ( italic_f ) )

which is true because taking an image of f𝑓fitalic_f after postcomposition with D(h)𝐷D(h)italic_D ( italic_h ) is the same as a pointwise application of D(h)𝐷D(h)italic_D ( italic_h ) to the image of f𝑓fitalic_f. ∎

4.2 Connection to Kan extensions

Fritz and Perrone [24, 25] propose a method to extract a monad from a graded monad, by taking the left Kan extension. They provide criteria for when this process works and induces a monad morphism. This process cannot work entirely for our situation, for the following reason. First we note that we can interpret bernoulli and knight as the following elements of CP(2)CP2\mathrm{CP}(2)roman_CP ( 2 ) :

  • bernoulli is {(10)}+0.5{(01)}={(0.50.5)}subscript0.510010.50.5\{(\begin{smallmatrix}1\\ 0\end{smallmatrix})\}+_{0.5}\{(\begin{smallmatrix}0\\ 1\end{smallmatrix})\}\ =\ \{(\begin{smallmatrix}0.5\\ 0.5\end{smallmatrix})\}{ ( start_ROW start_CELL 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW ) } + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT { ( start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 1 end_CELL end_ROW ) } = { ( start_ROW start_CELL 0.5 end_CELL end_ROW start_ROW start_CELL 0.5 end_CELL end_ROW ) };

  • knight is {(10)}{(01)}={(10),(01)}10011001\{(\begin{smallmatrix}1\\ 0\end{smallmatrix})\}\vee\{(\begin{smallmatrix}0\\ 1\end{smallmatrix})\}\ =\ \{(\begin{smallmatrix}1\\ 0\end{smallmatrix}),(\begin{smallmatrix}0\\ 1\end{smallmatrix})\}{ ( start_ROW start_CELL 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW ) } ∨ { ( start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 1 end_CELL end_ROW ) } = { ( start_ROW start_CELL 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW ) , ( start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 1 end_CELL end_ROW ) }.

This construction CPCP\mathrm{CP}roman_CP extends to a monad on 𝐒𝐞𝐭𝐒𝐞𝐭\mathbf{Set}bold_Set [34]. Therefore, we can follow through the derivation of Figure 2 to see that the graded monad cannot be commutative (apparently contradicting [34, Lemma 5.2]) since the convex sets in Fig. 1(b)–(c) are different. (For another argument, note that CPCP\mathrm{CP}roman_CP contains two binary idempotent symmetric operations, \vee and +0.5subscript0.5+_{0.5}+ start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT, and see e.g. [85].)

By contrast, 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP3.2) does satisfy our desiderata (§1.1). So there cannot be a monad morphism between them. Nonetheless, the Kan extension of our graded Markov category 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP, regarded as a graded monad via Proposition 2.6, does give the finitely-generated convex powerset monad CPCP\mathrm{CP}roman_CP as a functor, just not as a monad.

Proposition 4.4.

The family ϕm,n:𝐈𝐦𝐏m(1,n)CPfin(n):subscriptitalic-ϕ𝑚𝑛subscript𝐈𝐦𝐏𝑚1𝑛subscriptCPfin𝑛\phi_{m,n}:\mathbf{ImP}_{m}(1,n)\to\mathrm{CP}_{\mathrm{fin}}(n)italic_ϕ start_POSTSUBSCRIPT italic_m , italic_n end_POSTSUBSCRIPT : bold_ImP start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( 1 , italic_n ) → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n ) exhibits CPfin:𝐅𝐢𝐧𝐒𝐞𝐭𝐒𝐞𝐭:subscriptCPfin𝐅𝐢𝐧𝐒𝐞𝐭𝐒𝐞𝐭\mathrm{CP}_{\mathrm{fin}}:\mathbf{FinSet}\to\mathbf{Set}roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT : bold_FinSet → bold_Set as the Kan extension of

𝐈𝐦𝐏()(1,=):𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjop[𝐅𝐢𝐧𝐒𝐞𝐭,𝐒𝐞𝐭]:subscript𝐈𝐦𝐏1superscriptsubscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjop𝐅𝐢𝐧𝐒𝐞𝐭𝐒𝐞𝐭\mathbf{ImP}_{(-)}(1,=):\mathbf{FinStoch}_{\mathrm{Surj}}^{\mathrm{op}}\to[% \mathbf{FinSet},\mathbf{Set}]bold_ImP start_POSTSUBSCRIPT ( - ) end_POSTSUBSCRIPT ( 1 , = ) : bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT → [ bold_FinSet , bold_Set ]

along the unique functor 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjop1superscriptsubscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjop1\mathbf{FinStoch}_{\mathrm{Surj}}^{\mathrm{op}}\to 1bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT → 1.

𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjopsuperscriptsubscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjop{\mathbf{FinStoch}_{\mathrm{Surj}}^{\mathrm{op}}}bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT11{1}1[𝐅𝐢𝐧𝐒𝐞𝐭,𝐒𝐞𝐭]𝐅𝐢𝐧𝐒𝐞𝐭𝐒𝐞𝐭{{[\mathbf{FinSet},\mathbf{Set}]}}[ bold_FinSet , bold_Set ]CPfinsubscriptCPfin\scriptstyle{\mathrm{CP}_{\mathrm{fin}}}roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT!\scriptstyle{!}!𝐈𝐦𝐏()(1,=)subscript𝐈𝐦𝐏1\scriptstyle{\mathbf{ImP}_{(-)}(1,=)}bold_ImP start_POSTSUBSCRIPT ( - ) end_POSTSUBSCRIPT ( 1 , = )
Proof.

Kan extensions in [𝐅𝐢𝐧𝐒𝐞𝐭,𝐒𝐞𝐭]𝐅𝐢𝐧𝐒𝐞𝐭𝐒𝐞𝐭[\mathbf{FinSet},\mathbf{Set}][ bold_FinSet , bold_Set ] can be computed pointwise, and for any n𝐅𝐢𝐧𝐒𝐞𝐭𝑛𝐅𝐢𝐧𝐒𝐞𝐭n\in\mathbf{FinSet}italic_n ∈ bold_FinSet the Kan extension of 𝐈𝐦𝐏()(1,n):𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjop𝐒𝐞𝐭:subscript𝐈𝐦𝐏1𝑛superscriptsubscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjop𝐒𝐞𝐭\mathbf{ImP}_{(-)}(1,n):\mathbf{FinStoch}_{\mathrm{Surj}}^{\mathrm{op}}\to% \mathbf{Set}bold_ImP start_POSTSUBSCRIPT ( - ) end_POSTSUBSCRIPT ( 1 , italic_n ) : bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT → bold_Set along 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjop1superscriptsubscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjop1\mathbf{FinStoch}_{\mathrm{Surj}}^{\mathrm{op}}\to 1bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT → 1 is simply the colimit of the functor. Thus it suffices to show that the canonical function

Φ:colimm𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjop𝐈𝐦𝐏m(1,n)CPfin(n):Φsubscriptcolim𝑚superscriptsubscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjopsubscript𝐈𝐦𝐏𝑚1𝑛subscriptCPfin𝑛\Phi:\operatorname*{colim}_{m\in\mathbf{FinStoch}_{\mathrm{Surj}}^{\mathrm{op}% }}\mathbf{ImP}_{m}(1,n)\to\mathrm{CP}_{\mathrm{fin}}(n)roman_Φ : roman_colim start_POSTSUBSCRIPT italic_m ∈ bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_op end_POSTSUPERSCRIPT end_POSTSUBSCRIPT bold_ImP start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( 1 , italic_n ) → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n )

(induced by ϕitalic-ϕ\phiitalic_ϕ) is a bijection. This function ΦΦ\Phiroman_Φ is given by Φ[m,f𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡(m,n)]=𝗂𝗆𝖺𝗀𝖾(f)Φdelimited-[]𝑚𝑓𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝑚𝑛𝗂𝗆𝖺𝗀𝖾𝑓\Phi[m,f\in\mathbf{FinStoch}(m,n)]=\mathsf{image}(f)roman_Φ [ italic_m , italic_f ∈ bold_FinStoch ( italic_m , italic_n ) ] = sansserif_image ( italic_f ). To see that it is surjective we recall that every finitely generated convex set is the image of some convex function D(m)D(n)𝐷𝑚𝐷𝑛D(m)\to D(n)italic_D ( italic_m ) → italic_D ( italic_n ) (Lemma 4.2). To see that it is injective we suppose that 𝗂𝗆𝖺𝗀𝖾(f)=𝗂𝗆𝖺𝗀𝖾(f)𝗂𝗆𝖺𝗀𝖾𝑓𝗂𝗆𝖺𝗀𝖾superscript𝑓\mathsf{image}(f)=\mathsf{image}(f^{\prime})sansserif_image ( italic_f ) = sansserif_image ( italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ), for f𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡(m,n)𝑓𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝑚𝑛f\in\mathbf{FinStoch}(m,n)italic_f ∈ bold_FinStoch ( italic_m , italic_n ) and f𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡(m,n)superscript𝑓𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡superscript𝑚𝑛f^{\prime}\in\mathbf{FinStoch}(m^{\prime},n)italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ bold_FinStoch ( italic_m start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_n ). We need to show that [m,f]=[m,f]𝑚𝑓superscript𝑚superscript𝑓[m,f]=[m^{\prime},f^{\prime}][ italic_m , italic_f ] = [ italic_m start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ] in the colimit. It suffices to find m′′superscript𝑚′′m^{\prime\prime}italic_m start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT with h𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡(m,n)𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝑚𝑛h\in\mathbf{FinStoch}(m,n)italic_h ∈ bold_FinStoch ( italic_m , italic_n ) and surjections g𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj(m,m′′)𝑔subscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj𝑚superscript𝑚′′g\in\mathbf{FinStoch}_{\mathrm{Surj}}(m,m^{\prime\prime})italic_g ∈ bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT ( italic_m , italic_m start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ) and g𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj(m,m′′)superscript𝑔subscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjsuperscript𝑚superscript𝑚′′g^{\prime}\in\mathbf{FinStoch}_{\mathrm{Surj}}(m^{\prime},m^{\prime\prime})italic_g start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT ( italic_m start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_m start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ):

m𝑚{m}italic_mmsuperscript𝑚{{m^{\prime}}}italic_m start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPTm′′superscript𝑚′′{{m^{\prime\prime}}}italic_m start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPTn𝑛{n}italic_ng𝑔\scriptstyle{g}italic_ggsuperscript𝑔\scriptstyle{g^{\prime}}italic_g start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPTf𝑓\scriptstyle{f}italic_ffsuperscript𝑓\scriptstyle{f^{\prime}}italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPTh\scriptstyle{h}italic_h

The finitely generated convex set 𝗂𝗆𝖺𝗀𝖾(f)=𝗂𝗆𝖺𝗀𝖾(f)𝗂𝗆𝖺𝗀𝖾𝑓𝗂𝗆𝖺𝗀𝖾superscript𝑓\mathsf{image}(f)=\mathsf{image}(f^{\prime})sansserif_image ( italic_f ) = sansserif_image ( italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) must have a unique convex hull, and we let m′′superscript𝑚′′m^{\prime\prime}italic_m start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT be the number of extremal points of the convex hull, which are uniquely determined. We construct g𝑔gitalic_g by noting that f(i)𝑓𝑖f(i)italic_f ( italic_i ) must be a convex combination from the m′′superscript𝑚′′m^{\prime\prime}italic_m start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT extremal points, and so we let g(i)𝑔𝑖g(i)italic_g ( italic_i ) be the probability vector corresponding to that combination. We construct gsuperscript𝑔g^{\prime}italic_g start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT from fsuperscript𝑓f^{\prime}italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT similarly. To see that g𝑔gitalic_g is surjective we note that since f𝑓fitalic_f is surjective onto its image we must have points in m𝑚mitalic_m that map onto the extremal points, and hence onto all the points of m′′superscript𝑚′′m^{\prime\prime}italic_m start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT via g𝑔gitalic_g. Similarly, gsuperscript𝑔g^{\prime}italic_g start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is surjective. ∎

4.3 An op-lax functor

Definition 4.5.

The construction CPfinsubscriptCPfin\mathrm{CP}_{\mathrm{fin}}roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT extends to a relative monad. The unit morphism ηn:nCPfin(n):subscript𝜂𝑛𝑛subscriptCPfin𝑛\eta_{n}:n\to\mathrm{CP}_{\mathrm{fin}}(n)italic_η start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT : italic_n → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n ) picks out the singleton set containing the Dirac vector, ηn(i)={δi}subscript𝜂𝑛𝑖subscript𝛿𝑖\eta_{n}(i)=\{\delta_{i}\}italic_η start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_i ) = { italic_δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT }. The Kleisli extension takes a function f:mCPfin(n):𝑓𝑚subscriptCPfin𝑛f:m\to\mathrm{CP}_{\mathrm{fin}}(n)italic_f : italic_m → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n ) to a function f:CPfin(m)CPfin(n):superscript𝑓subscriptCPfin𝑚subscriptCPfin𝑛f^{*}:\mathrm{CP}_{\mathrm{fin}}(m)\to\mathrm{CP}_{\mathrm{fin}}(n)italic_f start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT : roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_m ) → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n ) given by

f(X)=xext(X)imxif(i);superscript𝑓𝑋subscript𝑥𝑒𝑥𝑡𝑋subscript𝑖𝑚subscript𝑥𝑖𝑓𝑖f^{*}(X)=\bigvee_{x\in ext(X)}\sum_{i\in m}x_{i}\cdot f(i);italic_f start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_X ) = ⋁ start_POSTSUBSCRIPT italic_x ∈ italic_e italic_x italic_t ( italic_X ) end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_i ∈ italic_m end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⋅ italic_f ( italic_i ) ;

where ext𝑒𝑥𝑡extitalic_e italic_x italic_t takes the extreme points of the finitely generated convex subset.

From this structure, we build a Kleisli category as usual.

  • The objects of Kl(CPfin)KlsubscriptCPfin\mathrm{Kl}(\mathrm{CP}_{\mathrm{fin}})roman_Kl ( roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ) are natural numbers.

  • The morphisms mn𝑚𝑛m\to nitalic_m → italic_n are functions mCPfin(n)𝑚subscriptCPfin𝑛m\to\mathrm{CP}_{\mathrm{fin}}(n)italic_m → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n ).

  • The identity morphism is the unit η𝜂\etaitalic_η. Composition of g𝑔gitalic_g and f𝑓fitalic_f is given by gfsuperscript𝑔𝑓g^{*}\circ fitalic_g start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∘ italic_f.

In fact, this category is order-enriched. That is to say, the hom-sets Kl(CPfin)KlsubscriptCPfin\mathrm{Kl}(\mathrm{CP}_{\mathrm{fin}})roman_Kl ( roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ) have a natural partial order structure given by fg𝑓𝑔f\leq gitalic_f ≤ italic_g if for all i𝑖iitalic_i, f(i)g(i)𝑓𝑖𝑔𝑖f(i)\subseteq g(i)italic_f ( italic_i ) ⊆ italic_g ( italic_i ). Composition is thus monotone.

We now extend the quotient of Proposition 4.3 to an identity-on-objects op-lax functor 𝐈𝐦𝐏Kl(CPfin)𝐈𝐦𝐏KlsubscriptCPfin\mathbf{ImP}\to\mathrm{Kl}(\mathrm{CP}_{\mathrm{fin}})bold_ImP → roman_Kl ( roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ).

Theorem 1.

Consider the assignment of a morphism f𝐈𝐦𝐏a(m,n)𝑓subscript𝐈𝐦𝐏𝑎𝑚𝑛f\in\mathbf{ImP}_{a}(m,n)italic_f ∈ bold_ImP start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_m , italic_n ) to R(f):mCPfin(n):𝑅𝑓𝑚subscriptCPfin𝑛R(f):m\to\mathrm{CP}_{\mathrm{fin}}(n)italic_R ( italic_f ) : italic_m → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n ) given by R(f)(i)=𝗂𝗆𝖺𝗀𝖾(f(,i))𝑅𝑓𝑖𝗂𝗆𝖺𝗀𝖾𝑓𝑖R(f)(i)=\mathsf{image}(f(-,i))italic_R ( italic_f ) ( italic_i ) = sansserif_image ( italic_f ( - , italic_i ) ). This defines an op-lax functor

𝐈𝐦𝐏Kl(CPfin)𝐈𝐦𝐏KlsubscriptCPfin\mathbf{ImP}\to\mathrm{Kl}(\mathrm{CP}_{\mathrm{fin}})bold_ImP → roman_Kl ( roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT )
Proof notes..

It is straightforward that R(𝗂𝖽)=𝗂𝖽𝑅𝗂𝖽𝗂𝖽R(\mathsf{id})=\mathsf{id}italic_R ( sansserif_id ) = sansserif_id. It remains to show that R(gf)R(g)R(f)𝑅𝑔𝑓𝑅𝑔𝑅𝑓R(g\circ f)\subseteq R(g)\circ R(f)italic_R ( italic_g ∘ italic_f ) ⊆ italic_R ( italic_g ) ∘ italic_R ( italic_f ). Since we will show that R𝑅Ritalic_R preserves finite coproducts, it is sufficient to first suppose that the domain of f𝑓fitalic_f is 1111. So consider f𝐈𝐦𝐏a(1,m)𝑓subscript𝐈𝐦𝐏𝑎1𝑚f\in\mathbf{ImP}_{a}(1,m)italic_f ∈ bold_ImP start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( 1 , italic_m ) and g𝐈𝐦𝐏b(m,n)𝑔subscript𝐈𝐦𝐏𝑏𝑚𝑛g\in\mathbf{ImP}_{b}(m,n)italic_g ∈ bold_ImP start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_m , italic_n ). So (gf)𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡(a×b,n)𝑔𝑓𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡𝑎𝑏𝑛(g\circ f)\in\mathbf{FinStoch}(a\times b,n)( italic_g ∘ italic_f ) ∈ bold_FinStoch ( italic_a × italic_b , italic_n ). We must show that for all (i,j)(a×b)𝑖𝑗𝑎𝑏(i,j)\in(a\times b)( italic_i , italic_j ) ∈ ( italic_a × italic_b ), the probability vector (gf)(i,j)𝑔𝑓𝑖𝑗(g\circ f)(i,j)( italic_g ∘ italic_f ) ( italic_i , italic_j ) is in

(R(g)R(f))()=R(g)(𝗂𝗆𝖺𝗀𝖾(f))CP(n).𝑅𝑔𝑅𝑓𝑅superscript𝑔𝗂𝗆𝖺𝗀𝖾𝑓CP𝑛.(R(g)\circ R(f))()=R(g)^{*}(\mathsf{image}(f))\in\mathrm{CP}(n)\text{.}( italic_R ( italic_g ) ∘ italic_R ( italic_f ) ) ( ) = italic_R ( italic_g ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( sansserif_image ( italic_f ) ) ∈ roman_CP ( italic_n ) .

To show this, we note that the grade of (gf)𝑔𝑓(g\circ f)( italic_g ∘ italic_f ) is (a×b)𝑎𝑏(a\times b)( italic_a × italic_b ), but we can also consider an alternative kind of composite (gf)𝑔𝑓(g\ast f)( italic_g ∗ italic_f ) with a bigger grade (a×bm)𝑎superscript𝑏𝑚(a\times b^{m})( italic_a × italic_b start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT ). This is given by

(gf)=(a×bm𝑓m×bm(𝑒𝑣𝑎𝑙,𝑝𝑟𝑜𝑗1)b×m𝑔n);𝑔𝑓𝑓𝑎superscript𝑏𝑚𝑚superscript𝑏𝑚𝑒𝑣𝑎𝑙subscript𝑝𝑟𝑜𝑗1𝑏𝑚𝑔𝑛;(g\ast f)=\Big{(}a\times b^{m}\xrightarrow{f}m\times b^{m}\xrightarrow{(% \mathit{eval},\mathit{proj}_{1})}b\times m\xrightarrow{g}n\Big{)}\text{;}( italic_g ∗ italic_f ) = ( italic_a × italic_b start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_ARROW overitalic_f → end_ARROW italic_m × italic_b start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_ARROW start_OVERACCENT ( italic_eval , italic_proj start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) end_OVERACCENT → end_ARROW italic_b × italic_m start_ARROW overitalic_g → end_ARROW italic_n ) ;

where the middle arrow is the evident function between sets regarded as a stochastic matrix. Contrast with

(gf)=(a×b𝑓m×b𝑠𝑤𝑝b×m𝑔n).𝑔𝑓𝑓𝑎𝑏𝑚𝑏𝑠𝑤𝑝𝑏𝑚𝑔𝑛.(g\circ f)=\Big{(}a\times b\xrightarrow{f}m\times b\xrightarrow{\mathit{swp}}b% \times m\xrightarrow{g}n\Big{)}\text{.}( italic_g ∘ italic_f ) = ( italic_a × italic_b start_ARROW overitalic_f → end_ARROW italic_m × italic_b start_ARROW overitalic_swp → end_ARROW italic_b × italic_m start_ARROW overitalic_g → end_ARROW italic_n ) .

The function (a×b)(a×bm)𝑎𝑏𝑎superscript𝑏𝑚(a\times b)\to(a\times b^{m})( italic_a × italic_b ) → ( italic_a × italic_b start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT ) that copies b𝑏bitalic_b is an injection and exhibits

𝗂𝗆𝖺𝗀𝖾(gf)𝗂𝗆𝖺𝗀𝖾(gf)𝗂𝗆𝖺𝗀𝖾𝑔𝑓𝗂𝗆𝖺𝗀𝖾𝑔𝑓\mathsf{image}(g\circ f)\subseteq\mathsf{image}(g\ast f)sansserif_image ( italic_g ∘ italic_f ) ⊆ sansserif_image ( italic_g ∗ italic_f )

Moreover, we have that

𝗂𝗆𝖺𝗀𝖾(gf)=R(g)R(f).𝗂𝗆𝖺𝗀𝖾𝑔𝑓𝑅𝑔𝑅𝑓.\mathsf{image}(g\ast f)=R(g)\circ R(f)\text{.}sansserif_image ( italic_g ∗ italic_f ) = italic_R ( italic_g ) ∘ italic_R ( italic_f ) .

The intuitive point is that in (gf)𝑔𝑓(g\ast f)( italic_g ∗ italic_f ), for each possible intermediate m𝑚mitalic_m we are allowed to use different choices of b𝑏bitalic_b, but in (gf)𝑔𝑓(g\circ f)( italic_g ∘ italic_f ), each possible intermediate m𝑚mitalic_m will use the same choices of b𝑏bitalic_b.

To see that R𝑅Ritalic_R preserves coproducts we note that on objects it is immediate, and expanding the definitions shows that the coproduct injections and copairings are exactly preserved by R𝑅Ritalic_R. ∎

(Here, we are regarding 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP with discrete order enrichment but non-trivial local grading, and Kl(CPfin)KlsubscriptCPfin\mathrm{Kl}(\mathrm{CP}_{\mathrm{fin}})roman_Kl ( roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ) with non-trivial order enrichment but trivial local grading. It is possible that there interesting ways to unify the two different enrichments.)

4.4 Discussion and example

Example 4.6.

We once again revisit the scenarios from Examples 1.1 and 1.2 where boolean values are drawn with Knightian uncertainty or from fair Bernoulli trials and combined using different program logic. Consider the morphism denoting a fair Bernoulli trial (f𝑓fitalic_f from Example 3.6),

f=(0.50.5)𝐈𝐦𝐏1(1,2),𝑓matrix0.50.5subscript𝐈𝐦𝐏112f=\begin{pmatrix}0.5\\ 0.5\end{pmatrix}\in\mathbf{ImP}_{1}(1,2),italic_f = ( start_ARG start_ROW start_CELL 0.5 end_CELL end_ROW start_ROW start_CELL 0.5 end_CELL end_ROW end_ARG ) ∈ bold_ImP start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( 1 , 2 ) ,

and a morphism that employs Knightian uncertainty on each of its inputs (g+h𝑔g+hitalic_g + italic_h from Example 3.6),

g=(100100|100001)𝐈𝐦𝐏2(2,3).g=\left(\begin{matrix}1&0\\ 0&1\\ 0&0\end{matrix}\ \middle|\ \begin{matrix}1&0\\ 0&0\\ 0&1\end{matrix}\right)\in\mathbf{ImP}_{2}(2,3).italic_g = ( start_ARG start_ROW start_CELL 1 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW end_ARG | start_ARG start_ROW start_CELL 1 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW end_ARG ) ∈ bold_ImP start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( 2 , 3 ) .

Then R(f):1CPfin(2):𝑅𝑓1subscriptCPfin2R(f):1\to\mathrm{CP}_{\mathrm{fin}}(2)italic_R ( italic_f ) : 1 → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( 2 ) maps the singleton set to

{(0.50.5)},matrix0.50.5\left\{\begin{pmatrix}0.5\\ 0.5\end{pmatrix}\right\},{ ( start_ARG start_ROW start_CELL 0.5 end_CELL end_ROW start_ROW start_CELL 0.5 end_CELL end_ROW end_ARG ) } ,

R(g):2CPfin(3):𝑅𝑔2subscriptCPfin3R(g):2\to\mathrm{CP}_{\mathrm{fin}}(3)italic_R ( italic_g ) : 2 → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( 3 ) maps the two-element set to

{(100),(010)} and {(100),(001)},matrix100matrix010 and matrix100matrix001\left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix},\begin{pmatrix}0\\ 1\\ 0\end{pmatrix}\right\}\text{ and }\left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix},\begin{pmatrix}0\\ 0\\ 1\end{pmatrix}\right\},{ ( start_ARG start_ROW start_CELL 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW end_ARG ) , ( start_ARG start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW end_ARG ) } and { ( start_ARG start_ROW start_CELL 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW end_ARG ) , ( start_ARG start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 1 end_CELL end_ROW end_ARG ) } ,

and, following Example 3.6, R(gf):1CPfin(3):𝑅𝑔𝑓1subscriptCPfin3R(g\circ f):1\to\mathrm{CP}_{\mathrm{fin}}(3)italic_R ( italic_g ∘ italic_f ) : 1 → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( 3 ) maps the singleton set to

{(100),(00.50.5)}.matrix100matrix00.50.5\left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix},\begin{pmatrix}0\\ 0.5\\ 0.5\end{pmatrix}\right\}.{ ( start_ARG start_ROW start_CELL 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW end_ARG ) , ( start_ARG start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0.5 end_CELL end_ROW start_ROW start_CELL 0.5 end_CELL end_ROW end_ARG ) } .

This is the convex subset in Figure 1(b) if we consider the probability vectors as giving the corresponding chances of outcomes r𝑟{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}r}italic_r, g𝑔{\color[rgb]{0,0.88,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0.88,0}% \pgfsys@color@cmyk@stroke{0.91}{0}{0.88}{0.12}\pgfsys@color@cmyk@fill{0.91}{0}% {0.88}{0.12}g}italic_g, and b𝑏{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}b}italic_b.

On the other hand, by composing g𝑔gitalic_g with f𝑓fitalic_f after map** them into Kl(CPfin)KlsubscriptCPfin\mathrm{Kl}(\mathrm{CP}_{\mathrm{fin}})roman_Kl ( roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ), we lose the ability to distinguish which outcomes were related via the same Knightian choices. So the morphism R(g)R(f):1CPfin(3):𝑅𝑔𝑅𝑓1subscriptCPfin3R(g)\circ R(f):1\to\mathrm{CP}_{\mathrm{fin}}(3)italic_R ( italic_g ) ∘ italic_R ( italic_f ) : 1 → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( 3 ) maps the singleton element to

{(100),(0.500.5),(0.50.50),(00.50.5)},matrix100matrix0.500.5matrix0.50.50matrix00.50.5\left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix},\begin{pmatrix}0.5\\ 0\\ 0.5\end{pmatrix},\begin{pmatrix}0.5\\ 0.5\\ 0\end{pmatrix},\begin{pmatrix}0\\ 0.5\\ 0.5\end{pmatrix}\right\},{ ( start_ARG start_ROW start_CELL 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW end_ARG ) , ( start_ARG start_ROW start_CELL 0.5 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0.5 end_CELL end_ROW end_ARG ) , ( start_ARG start_ROW start_CELL 0.5 end_CELL end_ROW start_ROW start_CELL 0.5 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW end_ARG ) , ( start_ARG start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0.5 end_CELL end_ROW start_ROW start_CELL 0.5 end_CELL end_ROW end_ARG ) } ,

which is the convex subset given in Figure 1(c). Thus, R(gf)R(g)R(f)𝑅𝑔𝑓𝑅𝑔𝑅𝑓R(g\circ f)\subsetneq R(g)\circ R(f)italic_R ( italic_g ∘ italic_f ) ⊊ italic_R ( italic_g ) ∘ italic_R ( italic_f ).

Therefore, by accounting for corresponding choices of Knightian uncertainty within morphism compositions, our category 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP obtains tighter bounds on the imprecise probabilities.

5 Maximality as a commutative theory

In Proposition 4.3 we gave a family of maps ϕitalic-ϕ\phiitalic_ϕ that convert our compositional imprecise probability into convex sets of probability distributions. These maps are not injective, and in this sense the model of 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP is intensional. This raises a question of whether we could have made a less intensional model than 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP while still maintaining Desiderata 1 and 2 and maintaining the connection to convex sets of distributions. In Theorem 2 we answer this question negatively, in the following sense: We cannot quotient the hom-sets of 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP without either losing the connection with convex sets (and hence statistics) or losing the monoidal or distributive structure (and hence the compositionality desiderata of §1.1). In this sense, 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP is maximal.

Definition 5.1.

Let 𝔾𝔾\mathbb{G}blackboard_G be a semicartesian category. Let 𝐂𝐂\mathbf{C}bold_C and 𝐃𝐃\mathbf{D}bold_D be 𝔾𝔾\mathbb{G}blackboard_G-graded distributive Markov categories. A graded distributive Markov functor F:𝐂𝐃:𝐹𝐂𝐃F:\mathbf{C}\to\mathbf{D}italic_F : bold_C → bold_D is given by a map** from the objects of 𝐂𝐂\mathbf{C}bold_C to the objects of 𝐃𝐃\mathbf{D}bold_D and a family of map**s from 𝐂a(X,Y)𝐃a(F(X),F(Y))subscript𝐂𝑎𝑋𝑌subscript𝐃𝑎𝐹𝑋𝐹𝑌\mathbf{C}_{a}(X,Y)\to\mathbf{D}_{a}(F(X),F(Y))bold_C start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_X , italic_Y ) → bold_D start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_F ( italic_X ) , italic_F ( italic_Y ) ), strictly preserving the composition, monoidal and coproduct structure, and the copy maps.

Note

In view of §2.1, we note that a graded distributive Markov functor is the same thing as the existing notion of strict distributive monoidal functor between distributive monoidal enriched categories (e.g. [45]), together with the requirement that the copy maps are preserved, which is in common with the ordinary Markov category literature [23]. We could also formulate this in terms of monad morphisms, following Section 2.2.

Theorem 2.

Let 𝐂𝐂\mathbf{C}bold_C be 𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surjsubscript𝐅𝐢𝐧𝐒𝐭𝐨𝐜𝐡Surj\mathbf{FinStoch}_{\mathrm{Surj}}bold_FinStoch start_POSTSUBSCRIPT roman_Surj end_POSTSUBSCRIPT-graded distributive Markov category with a graded distributive Markov functor F:𝐈𝐦𝐏𝐂:𝐹𝐈𝐦𝐏𝐂F:\mathbf{ImP}\to\mathbf{C}italic_F : bold_ImP → bold_C and a natural family of functions

ψm,n:𝐂m(1,n)CPfin(n):subscript𝜓𝑚𝑛subscript𝐂𝑚1𝑛subscriptCPfin𝑛\psi_{m,n}:\mathbf{C}_{m}(1,n)\to\mathrm{CP}_{\mathrm{fin}}(n)italic_ψ start_POSTSUBSCRIPT italic_m , italic_n end_POSTSUBSCRIPT : bold_C start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( 1 , italic_n ) → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n )

such that

ϕm,n:𝐈𝐦𝐏m(1,n)CPfin(n):subscriptitalic-ϕ𝑚𝑛subscript𝐈𝐦𝐏𝑚1𝑛subscriptCPfin𝑛\phi_{m,n}:\mathbf{ImP}_{m}(1,n)\to\mathrm{CP}_{\mathrm{fin}}(n)italic_ϕ start_POSTSUBSCRIPT italic_m , italic_n end_POSTSUBSCRIPT : bold_ImP start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( 1 , italic_n ) → roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n )

(Proposition 4.3) factors through ψ𝜓\psiitalic_ψ. Then F𝐹Fitalic_F is faithful: if F(f)=F(g)𝐹𝑓𝐹𝑔F(f)=F(g)italic_F ( italic_f ) = italic_F ( italic_g ) in 𝐂𝐂\mathbf{C}bold_C then also f=g𝑓𝑔f=gitalic_f = italic_g in 𝐈𝐦𝐏𝐈𝐦𝐏\mathbf{ImP}bold_ImP.

Proof.

Since F𝐹Fitalic_F preserves finite coproducts, it is sufficient to suppose the domain of f𝑓fitalic_f and g𝑔gitalic_g is 1111. That is, let f,g𝐈𝐦𝐏m(1,n)𝑓𝑔subscript𝐈𝐦𝐏𝑚1𝑛f,g\in\mathbf{ImP}_{m}(1,n)italic_f , italic_g ∈ bold_ImP start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( 1 , italic_n ) and suppose ϕm,nsubscriptitalic-ϕ𝑚𝑛\phi_{m,n}italic_ϕ start_POSTSUBSCRIPT italic_m , italic_n end_POSTSUBSCRIPT factors as

𝐈𝐦𝐏m(1,n)Fm,n𝐂m(1,n)ψm,nCPfin(n).subscript𝐹𝑚𝑛subscript𝐈𝐦𝐏𝑚1𝑛subscript𝐂𝑚1𝑛subscript𝜓𝑚𝑛subscriptCPfin𝑛\mathbf{ImP}_{m}(1,n)\xrightarrow{F_{m,n}}\mathbf{C}_{m}(1,n)\xrightarrow{\psi% _{m,n}}\mathrm{CP}_{\mathrm{fin}}(n).bold_ImP start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( 1 , italic_n ) start_ARROW start_OVERACCENT italic_F start_POSTSUBSCRIPT italic_m , italic_n end_POSTSUBSCRIPT end_OVERACCENT → end_ARROW bold_C start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( 1 , italic_n ) start_ARROW start_OVERACCENT italic_ψ start_POSTSUBSCRIPT italic_m , italic_n end_POSTSUBSCRIPT end_OVERACCENT → end_ARROW roman_CP start_POSTSUBSCRIPT roman_fin end_POSTSUBSCRIPT ( italic_n ) .

Let d𝐈𝐦𝐏m(1,m)𝑑subscript𝐈𝐦𝐏𝑚1𝑚d\in\mathbf{ImP}_{m}(1,m)italic_d ∈ bold_ImP start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( 1 , italic_m ) be the evident tuple of Diracs. Define ι𝐈𝐦𝐏1(m,n+m)𝜄subscript𝐈𝐦𝐏1𝑚𝑛𝑚\iota\in\mathbf{ImP}_{1}(m,n+m)italic_ι ∈ bold_ImP start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_m , italic_n + italic_m ) and ȷ𝐈𝐦𝐏1(n,m+n)italic-ȷsubscript𝐈𝐦𝐏1𝑛𝑚𝑛\jmath\in\mathbf{ImP}_{1}(n,m+n)italic_ȷ ∈ bold_ImP start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_n , italic_m + italic_n ) as the lifting of the injections mm+nn𝑚𝑚𝑛𝑛m\to m+n\leftarrow nitalic_m → italic_m + italic_n ← italic_n via postcomposition with the unit of D𝐷Ditalic_D. Since F𝐹Fitalic_F is a graded distributive Markov functor and Fm,n(f)=Fm,n(g)subscript𝐹𝑚𝑛𝑓subscript𝐹𝑚𝑛𝑔F_{m,n}(f)=F_{m,n}(g)italic_F start_POSTSUBSCRIPT italic_m , italic_n end_POSTSUBSCRIPT ( italic_f ) = italic_F start_POSTSUBSCRIPT italic_m , italic_n end_POSTSUBSCRIPT ( italic_g ),

Fm,m+n(ȷf+0.5ιd)=Fm,m+n(ȷg+0.5ιd)subscript𝐹𝑚𝑚𝑛subscript0.5italic-ȷ𝑓𝜄𝑑subscript𝐹𝑚𝑚𝑛subscript0.5italic-ȷ𝑔𝜄𝑑F_{m,m+n}(\jmath\circ f+_{0.5}\iota\circ d)=F_{m,m+n}(\jmath\circ g+_{0.5}% \iota\circ d)italic_F start_POSTSUBSCRIPT italic_m , italic_m + italic_n end_POSTSUBSCRIPT ( italic_ȷ ∘ italic_f + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_ι ∘ italic_d ) = italic_F start_POSTSUBSCRIPT italic_m , italic_m + italic_n end_POSTSUBSCRIPT ( italic_ȷ ∘ italic_g + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_ι ∘ italic_d )

where for h,k:XD(Y):𝑘𝑋𝐷𝑌h,k:X\to D(Y)italic_h , italic_k : italic_X → italic_D ( italic_Y ), we define (h+rk)(x)=h(x)+rk(x)subscript𝑟𝑘𝑥subscript𝑟𝑥𝑘𝑥(h+_{r}k)(x)=h(x)+_{r}k(x)( italic_h + start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_k ) ( italic_x ) = italic_h ( italic_x ) + start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_k ( italic_x ). Applying ψ𝜓\psiitalic_ψ gives

ϕm,m+n(ȷf+0.5ιd)=ϕm,m+n(ȷg+0.5ιd)subscriptitalic-ϕ𝑚𝑚𝑛subscript0.5italic-ȷ𝑓𝜄𝑑subscriptitalic-ϕ𝑚𝑚𝑛subscript0.5italic-ȷ𝑔𝜄𝑑\phi_{m,m+n}(\jmath\circ f+_{0.5}\iota\circ d)=\phi_{m,m+n}(\jmath\circ g+_{0.% 5}\iota\circ d)italic_ϕ start_POSTSUBSCRIPT italic_m , italic_m + italic_n end_POSTSUBSCRIPT ( italic_ȷ ∘ italic_f + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_ι ∘ italic_d ) = italic_ϕ start_POSTSUBSCRIPT italic_m , italic_m + italic_n end_POSTSUBSCRIPT ( italic_ȷ ∘ italic_g + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_ι ∘ italic_d ) (5)

Now, for all im𝑖𝑚i\in mitalic_i ∈ italic_m, (ȷf+0.5ιd)(i)subscript0.5italic-ȷ𝑓𝜄𝑑𝑖(\jmath\circ f+_{0.5}\iota\circ d)(i)( italic_ȷ ∘ italic_f + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_ι ∘ italic_d ) ( italic_i ) are independent because they each use a different dimension. They are all extremal vertices on the convex hull ϕn,m+n(ȷf+0.5ιd)subscriptitalic-ϕ𝑛𝑚𝑛subscript0.5italic-ȷ𝑓𝜄𝑑\phi_{n,m+n}(\jmath\circ f+_{0.5}\iota\circ d)italic_ϕ start_POSTSUBSCRIPT italic_n , italic_m + italic_n end_POSTSUBSCRIPT ( italic_ȷ ∘ italic_f + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_ι ∘ italic_d ). Moreover, they must be the same vertices as (ȷg+0.5ιd)(i)subscript0.5italic-ȷ𝑔𝜄𝑑𝑖(\jmath\circ g+_{0.5}\iota\circ d)(i)( italic_ȷ ∘ italic_g + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_ι ∘ italic_d ) ( italic_i ) for respective im𝑖𝑚i\in mitalic_i ∈ italic_m because the convex hulls are the same (5). Therefore,

ȷf+0.5ιd=ȷg+0.5ιd.subscript0.5italic-ȷ𝑓𝜄𝑑subscript0.5italic-ȷ𝑔𝜄𝑑\jmath\circ f+_{0.5}\iota\circ d=\jmath\circ g+_{0.5}\iota\circ d.italic_ȷ ∘ italic_f + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_ι ∘ italic_d = italic_ȷ ∘ italic_g + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_ι ∘ italic_d .

We can recover f𝑓fitalic_f and g𝑔gitalic_g as for any im,jnformulae-sequence𝑖𝑚𝑗𝑛i\in m,j\in nitalic_i ∈ italic_m , italic_j ∈ italic_n,

f(i)(j)=2×(ȷf+0.5ιd)(i)(j),𝑓𝑖𝑗2subscript0.5italic-ȷ𝑓𝜄𝑑𝑖𝑗\displaystyle f(i)(j)=2\times(\jmath\circ f+_{0.5}\iota\circ d)(i)(j),italic_f ( italic_i ) ( italic_j ) = 2 × ( italic_ȷ ∘ italic_f + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_ι ∘ italic_d ) ( italic_i ) ( italic_j ) ,
g(i)(j)=2×(ȷg+0.5ιd)(i)(j).𝑔𝑖𝑗2subscript0.5italic-ȷ𝑔𝜄𝑑𝑖𝑗\displaystyle g(i)(j)=2\times(\jmath\circ g+_{0.5}\iota\circ d)(i)(j).italic_g ( italic_i ) ( italic_j ) = 2 × ( italic_ȷ ∘ italic_g + start_POSTSUBSCRIPT 0.5 end_POSTSUBSCRIPT italic_ι ∘ italic_d ) ( italic_i ) ( italic_j ) .

So f=g𝑓𝑔f=gitalic_f = italic_g. ∎

6 Summary and outlook

We have shown that by taking a graded perspective and naming Knightian choices we can obtain a compositional account of Bernoulli and Knightian uncertainty together. The account gives a refined bound on the uncertainty (Theorem 1) and is maximal among the compositional accounts (Theorem 2).

There are a number of future directions. A first question is how to accommodate iteration. The convex sets considered in this article are all finitely generated, but if we allow iterative programs that have an unbounded number of Knightian choices, this leads to a more general class of convex sets.

The concerns about iteration hold even if we restrict to finite outcome spaces, and thus far we have focused on this for simplicity. Much work on programming semantics for imprecise probability has focused beyond finite outcome spaces, and it will be interesting to revisit this from our perspective: this includes domain theoretic structures (e.g. [44, 84, 43, 29]) and metric structures (e.g. [65, 64]).

It would be interesting to compare to another recent compositional framework combining unknowns with probability by Stein and Samuelson, currently focusing on Gaussians [78].

Our approach is based on random elements, and so is the quasi-Borel-space probability monad (e.g. [32, 81]), so this might be a good approach to accommodating function spaces.

On the more practical side, an open question is how to perform statistical inference in a probabilistic programming language with imprecise probability.

Going beyond statistics, it is possible that there are other scenarios where this approach is useful: making a theory compositional by using a graded theory (for a first purely speculative example, the issues with amb outlined in [54]).

Acknowledgements

It has been helpful to talk to many people about this work. The 2023/2024 Aria workshops emphasised aspects of imprecise probability as discussion topics (and so it was helpful to discuss with davidad, Dylan Braithwaite, Elena Di Lavore, Alex Lew, Owen Lynch, Sean Moss, Zenna Tavares, and others). It has been helpful to discuss with colleagues in Oxford (particularly Paolo Perrone, who is running an adjoint school project on uncertainty in Markov categories, and some time ago Kwok Ho Cheung regarding his thesis work [13]), and at the 2024 Bellairs workshop. We also received helpful feedback from Hugo Paquet, Dario Stein, and MFPS 2024 reviewers, where this material was submitted as an ‘early announcement’.

Research supported by Clarendon Scholarship, ERC Consolidator Grant BLAST, and AFOSR Project FA9550-21-1-0038.

References

  • [1] N. Ackerman, C. Freer, Y. Kaddar, J. Karwowski, S. Moss, D. Roy, S. Staton, and H. Yang. Probabilistic programming interfaces for random graphs: Markov categories, graphons, and nominal sets. In Proc. POPL 2024, 2024.
  • [2] A. Adjé, O. Bouissou, J. Goubault-Larrecq, E. Goubault, and S. Putot. Static analysis of programs with imprecise probabilistic inputs. In Proc. VSTTE 2013, pages 22–47, 2013.
  • [3] A. Appel and V. Kosoy. Basic inframeasure theory. LessWrong, August 2020. https://www.lesswrong.com/posts/YAa4qcMyoucRS2Ykr/basic-inframeasure-theory.
  • [4] A. Appel and V. Kosoy. Inframeasures and domain theory. LessWrong, March 2021. https://www.lesswrong.com/posts/vrbidMiczaoHBhZGp/inframeasures-and-domain-theory.
  • [5] T. Barker. A monad for randomized algorithms. In Proc. MFPS 2016, 2016.
  • [6] J. Bernoulli. Ars conjectandi, opus posthumum. Thurneysen, 1713.
  • [7] F. Bonchi, A. Sokolova, and V. Vignudelli. The theory of traces for systems with nondeterminism and probability. In Proc. LICS 2019, 2019.
  • [8] F. Bonchi, A. Sokolova, and V. Vignudelli. Presenting convex sets of probability distributions by convex semilattices and unique bases. In Proc. CALCO 2021, CALCO 2021.
  • [9] G. Boudol and I. Castellani. Flow models of distributed computations: three equivalent semantics for CCS. Inform. Comput., 114:247–314, 1994.
  • [10] D. Braithwaite, J. Hedges, and T. St Clere Smithe. The compositional structure of Bayesian inference. In Proc. MFCS 2023, 2023.
  • [11] A. Carboni, S. Lack, and R. F. C. Walters. Introduction to extensive and distributive categories. J. Pure Appl. Algebra, 84(2):145–158, 1993.
  • [12] G. L. Cattani, M. P. Fiore, and G. Winskel. A theory of recursive domains with applications to concurrency. In Proc. LICS 1998, 1998.
  • [13] K.-H. Cheung. Distributive Interaction of Algebraic Effects. PhD thesis, University of Oxford Department of Computer Science, 2017.
  • [14] J. R. B. Cockett. Introduction to distributive categories. Math. Struct. Comput. Sci., 3(3):277–307, 1993.
  • [15] F. Dahlqvist, L. Parlant, and A. Silva. Layer by layer: composing monads. In Proc. ICTAC 2018, 2018.
  • [16] S. Dash. A Monadic Theory of Point Processes. PhD thesis, University of Oxford, 2024.
  • [17] S. Dash, Y. Kaddar, H. Paquet, and S. Staton. Affine monads and lazy structures for Bayesian programming. In Proc. POPL 2023, 2023.
  • [18] S. Dash and S. Staton. A monad for probabilistic point processes. In Proc. ACT 2020, 2020.
  • [19] S. Dash and S. Staton. Monads for measurable queries in probabilistic databases. In Proc. MFPS 2021, 2021.
  • [20] D. ‘davidad’ Dalrymple. Safeguarded AI: constructing safety by design. ARIA Programme Thesis, January 2024. https://www.aria.org.uk/wp-content/uploads/2024/01/ARIA-Safeguarded-AI-Programme-Thesis-V1.pdf.
  • [21] B. Day. On closed categories of functors. In Proc. Midwest Category Seminar IV, volume 137 of Lect. Notes Math. Springer, 1969.
  • [22] B. Fong, D. Spivak, and R. Tuyéras. Backprop as functor: A compositional perspective on supervised learning. In Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’19. IEEE Press, 2021.
  • [23] T. Fritz. A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Adv. Math., 370, 2020.
  • [24] T. Fritz and P. Perrone. A criterion for Kan extensions of lax monoidal functors. arxiv:1809.10481, 2018.
  • [25] T. Fritz and P. Perrone. A probability monad as the colimit of spaces of finite samples. Theory and Applications of Categories, 34, 2019.
  • [26] M. Gaboardi, S. Katsumata, D. Orchard, and T. Sato. Graded hoare logic and its categorical semantics. In Proc. ESOP 2021, 2021.
  • [27] B. Gavranovic. Fundamental Components of Deep Learning: A category-theoretic approach. PhD thesis, Strathclyde University, 2024. arXiv:2403.13001.
  • [28] J. Goubault-Larrecq. Continuous previsions. In Proc. CSL 2007, 2007.
  • [29] J. Goubault-Larrecq. Prevision domains and convex powercones. In Proc. FOSSACS 2008, 2008.
  • [30] A. Goy and D. Petrisan. Combining probabilistic and non-deterministic choice via weak distributive laws. In LICS 2020, 2020.
  • [31] C. Hermida and R. Tennent. Monoidal indeterminates and categories of possible worlds. Theoretical Computer Science, 430, 2012.
  • [32] C. Heunen, O. Kammar, S. Staton, and H. Yang. A convenient category for higher-order probability theory. In Proc. LICS 2017, 2017.
  • [33] P. J. Huber. Robust statistics. Wiley, 1981.
  • [34] B. Jacobs. Coalgebraic trace semantics for combined possibilitistic and probabilistic systems. In Proc. CMCS 2008, 2008.
  • [35] B. Jacobs. From probability monads to commutative effectuses. J. Log. Algebr. Methods Program., 94:200–237, 2018.
  • [36] B. Jacobs. From multisets over distributions to distributions over multisets. In Proc. LICS 2021, 2021.
  • [37] B. Jacobs. Structured probabilistic reasoning. Available from the author’s homepage, July 2023. Draft book.
  • [38] B. Jacobs, A. Kissinger, and F. Zanasi. Causal inference by string diagram surgery. In Proc. FOSSACS 2019, 2019.
  • [39] X. Jia, B. Lindenhovius, M. W. Mislove, and V. Zamdzhiev. Commutative monads for probabilistic programming languages. In Proc. LICS 2021, pages 1–14, 2021.
  • [40] C. Jones and G. D. Plotkin. A probabilistic powerdomain of evaluations. In Fourth Annual Symposium on Logic in Computer Science, pages 186–195, June 1989.
  • [41] O. Kammar and G. D. Plotkin. Algebraic foundations for effect-dependent optimisations. In Proc. POPL 2012, pages 349–360, 2012.
  • [42] S. Katsumata. Parametric effect monads and semantics of effect systems. In Proc. POPL 2014, 2014.
  • [43] K. Keimel. Topological cones: Foundations for a domain theoretical semantics combining probability and nondeterminism. In MFPS 2005, 2005.
  • [44] K. Keimel and G. D. Plotkin. Mixed powerdomains for probability and nondeterminism. Log. Methods Comput. Sci., 13, 2017.
  • [45] G. M. Kelly. Basic Concepts of Enriched Category Theory. CUP, 1982.
  • [46] F. H. Knight. Risk, uncertainty and profit. Houghton Mifflin, 1921.
  • [47] A. Kock. Strong functors and monoidal monads. Arch. Math, 23(1):113–120, Dec. 1972.
  • [48] A. Kock. Commutative monads as a theory of distributions. Theory Appl. Categ., 26(4):97–131, 2012.
  • [49] V. Kosoy and A. Appel. Infra-Bayesian physicalism: A formal theory of naturalized induction. AI Alignment Forum, November 2021. Available at https://www.alignmentforum.org/posts/gHgs2e2J5azvGFatb/infra-bayesian-physicalism-a-formal-theory-of-naturalized.
  • [50] D. Kozen and A. Silva. Multisets and distributions. arxiv:2301.10812, 2023.
  • [51] S. Kura. Graded algebraic theories. In Proc. FOSSACS 2020, 2020.
  • [52] J. Laird, G. Manzonetto, G. McCusker, and M. Pagani. Weighted relational models of typed lambda-calculi. In Proc. LICS 2013, 2013.
  • [53] E. D. Lavore and M. Román. Evidential decision theory via partial Markov categories. In Proc. LICS 2023, 2023.
  • [54] P. B. Levy. Amb breaks well-pointedness, ground amb doesn’t. In Proc. MFPS 2007, 2007.
  • [55] P. B. Levy. Locally graded categories. Slides available from the author’s webpage, 2019.
  • [56] P. B. Levy, J. Power, and H. Thielecke. Modelling environments in call-by-value programming languages. Inform. Comput., 185, 2003.
  • [57] A. K. Lew, M. F. Cusumano-Towner, B. Sherman, M. Carbin, and V. K. Maninghka. Trace types and denotational semantics for sound programmable inference in probabilistic languages. In Proc. POPL 2020, 2020.
  • [58] S. Liang, P. Hudak, and M. Jones. Monad transformers and modular interpreters. In Proc. POPL 1995, 1995.
  • [59] S. Mac Lane. Categories for the Working Mathematician. Springer, 1998.
  • [60] D. McDermott and T. Uustalu. Flexibly graded monads and graded algebras. In Proc. MPC 2022, 2022.
  • [61] A. McIver and C. Morgan. Abstraction, Refinement and Proof for Probabilistic Systems. Springer, 2005.
  • [62] A. K. McIver and C. Morgan. Partial correctness for probabilistic demonic programs. Theoret. Comput. Sci., 266:513–541, 2001.
  • [63] M. Mio. Upper-expectation bisimilarity and Łukasiewicz mu-calculus. In Proc. FoSSaCS 2014, 2014.
  • [64] M. Mio, R. Sarkis, and V. Vignudelli. Combining nondeterminism, probability, and termination: Equational and metric reasoning. In Proc. LICS 2021, 2021.
  • [65] M. Mio and V. Vignudelli. Monads and quantitative equational theories for nondeterminism and probability. In Proc. CONCUR 2020, 2020.
  • [66] M. W. Mislove, J. Ouaknine, and J. Worrell. Axioms for probability and nondeterminism. In Proc. EXPRESS 2003, 2003.
  • [67] E. Moggi. Notions of computation and monads. Information and Computation, 1991.
  • [68] D. Orchard, P. Wadler, and H. E. III. Unifying graded and parameterised monads. In Proc. MSFP 2020, 2020.
  • [69] P. Perrone. Markov categories and entropy. IEEE Transactions on Information Theory, 2023.
  • [70] D. Petrisan and R. Sarkis. Semialgebras and weak distributive laws. In Proc. MFPS 2021, 2021.
  • [71] A. M. Pitts. Nominal Sets: names and symmetry in computer science. CUP, 2013.
  • [72] G. D. Plotkin and J. Power. Notions of computation determine monads. In Proc. FOSSACS 2002, 2002.
  • [73] J. Power. Generic models for computational effects. Theor. Comput. Sci., 364(2):254–269, 2006.
  • [74] D. Shiebler. Categorical Stochastic Processes and Likelihood. Compositionality, 3, Apr. 2021.
  • [75] A. Simpson. Probability sheaves and the Giry monad. In Proc. CALCO 2017, 2017.
  • [76] S. Staton. Commutative semantics for probabilistic programming. In Proc. ESOP 2017, pages 855–879, 2017.
  • [77] S. Staton, D. Stein, H. Yang, N. L. Ackerman, C. E. Freer, and D. M. Roy. The Beta-Bernoulli process and algebraic effects. In Proc. ICALP 2018, 2018.
  • [78] D. Stein and R. Samuelson. A categorical treatment of open linear systems, March 2024. arxiv:2403.03934.
  • [79] D. Stein and S. Staton. Compositional semantics for probabilistic programs with exact conditioning. In Proc. LICS 2021, 2021.
  • [80] Z. Tavares, J. Koppel, X. Zhang, R. Das, and A. Solar-Lezama. A language for counterfactual generative models. In Proc. ICML 2021, pages 10173–10182, 2021.
  • [81] M. Vákár, O. Kammar, and S. Staton. A domain theory for statistical probabilistic programming. Proc. ACM Program. Lang., 3(POPL):36:1–36:29, 2019.
  • [82] J.-W. van de Meent, B. Paige, H. Yang, and F. Wood. An introduction to probabilistic programming. arXiv e-print 1809.10756, 2018.
  • [83] G. van Heerdt, J. Hsu, J. Ouaknine, and A. Silva. Convex language semantics for nondeterministic probabilistic automata. In Proc. ICTAC 2018, 2018.
  • [84] D. Varacca and G. Winskel. Distributing probability over non-determinism. Math. Structures Comput. Sci., pages 87–113, 2006.
  • [85] Various authors. Eckmann-Hilton argument. nLab, April 2024. https://ncatlab.org/nlab/show/Eckmann-Hilton+argument#variation.
  • [86] P. Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, 1991.
  • [87] R. Wood. Indical methods for relative categories. PhD thesis, Dalhousie University, 1976.