ULLER: A Unified Language
for Learning and Reasoning

[Uncaptioned image] Emile van Krieken
University of Edinburgh
&[Uncaptioned image] Samy Badreddine
Sony AI
Fondazione Bruno Kessler
UniTrento
&[Uncaptioned image] Robin Manhaeve
KU Leuven
&[Uncaptioned image] Eleonora Giunchiglia
TU Wien
Abstract

The field of neuro-symbolic artificial intelligence (NeSy), which combines learning and reasoning, has recently experienced significant growth. There now are a wide variety of NeSy frameworks, each with its own specific language for expressing background knowledge and how to relate it to neural networks. This heterogeneity hinders accessibility for newcomers and makes comparing different NeSy frameworks challenging. We propose a language for NeSy, which we call ULLER, a Unified Language for LEarning and Reasoning. ULLER encompasses a wide variety of settings, while ensuring that knowledge described in it can be used in existing NeSy systems. ULLER has a first-order logic syntax specialised for NeSy for which we provide example semantics including classical FOL, fuzzy logic, and probabilistic logic. We believe ULLER is a first step towards making NeSy research more accessible and comparable, paving the way for libraries that streamline training and evaluation across a multitude of semantics, knowledge bases, and NeSy systems.

**footnotetext: Equal contribution.
Correspondence to [email protected] and [email protected].
Accepted at the 18th International Conference on Neural-Symbolic Learning and Reasoning.

1 Introduction

Deep learning has driven innovation in many fields for the past decade. Among the many reasons behind its central role is the ease with which it can be applied to a multitude of problems. Recently, neuro-symbolic (NeSy) methods (see, e.g., [30, 4, 50, 21, 35, 57, 24]), which belong to the NeSy subfield informed machine learning[20, 53] have overcome some well-known problems affecting deep learning models by exploiting background knowledge available for the problem at hand. For example, knowledge can help to train models with fewer data points and/or incomplete supervisions, to create models that comply by design to a set of requirements, and to be more robust in out-of-distribution prediction tasks.

However, background knowledge makes it more challenging to obtain “frictionless reproducibility” [17] which characterises machine learning (ML). Indeed, shared datasets and clear evaluation metrics allow ML practitioners to quickly get started with evaluating new methods and comparing it to existing work. To achieve this goal for NeSy research, we also need frictionless sharing of knowledge. Current NeSy frameworks all have different approaches to encode the background knowledge: some use logical languages, like first-order [4, 35] and propositional logic [2, 56, 21], logic programming [30] or answer set programming [57] - with a wide array of different syntaxes - while other methods use plain Python programs [14]. See Section 5 for an overview. To compare the performance of different NeSy systems, a researcher needs to specify the same knowledge in many languages. This is a significant barrier both for researchers new to the field or even for experts, as it is a time-consuming and error-prone task.

ULLER, a Unified Language for LEarning and Reasoning

We take a first step towards frictionless sharing of knowledge in the NeSy field by proposing a Unified Language for Learning and Reasoning (ULLER, pronounced “OOH-ler” like the god of the Norse mythology). ULLER allows us to express the knowledge used in informed machine learning. Our long-term goal is to create a Python library implementing ULLER to be shared among the significant NeSy systems. First, the user expresses the knowledge in ULLER. Then, they load the data, after which they call different NeSy systems with a single line of code to train neural networks, or to use the knowledge at prediction time. This allows the NeSy community (i)𝑖(\mathit{i})( italic_i ) to define benchmarks that include both data and knowledge, (𝑖𝑖)𝑖𝑖(\mathit{ii})( italic_ii ) to easily compare the available NeSy systems on such benchmarks, and (𝑖𝑖𝑖)𝑖𝑖𝑖(\mathit{iii})( italic_iii ) to lower the entry barrier to NeSy research for the broader machine learning community.

To achieve the above requires us to decouple the syntax of the knowledge representation from the semantics given by the NeSy system of interest. The syntax of ULLER, defined in Section 2, is based on first-order logic (FOL). However, we introduce statements for ULLER. Statements simplify the process of writing down function application and composition - and hence dealing with data sampling and processing pipelines. We opt for a FOL syntax because it generalises propositional logic, while being a common language for declaring general constraints. Secondly, FOL with statements is more familiar to ML researchers, who are mostly used to writing procedural statements like in Python, while having a well-defined semantics for logicians. Finally, FOL is highly expressive: We believe that it can express all knowledge currently used in NeSy methods.

The semantics of ULLER (Section 3), depends on (i)𝑖(\mathit{i})( italic_i ) an interpretation, often referred to as a “symbol grounding” [23], which maps symbols to meanings, and (𝑖𝑖)𝑖𝑖(\mathit{ii})( italic_ii ) a “NeSy system”, which takes knowledge and its interpretation, and computes loss functions and outputs accordingly. We formalise the differences between NeSy systems by what they compute given a program in ULLER and an interpretation. For classical boolean semantics, ULLER is equivalent to standard FOL. However, we also provide example semantics for several common NeSy frameworks of fuzzy logic (such as Logic Tensor Networks [4]) and probabilistic logic (such as Semantic Loss [56] and DeepProbLog [30]). This highlights the flexibility of our language, as it can be used to express knowledge in many formalisms.

2 Syntax of ULLER

Let 𝒱𝒱\mathcal{V}caligraphic_V be a set of variable symbols, 𝒞𝒞\mathcal{C}caligraphic_C be a set of constant symbols, 𝒟𝒟\mathcal{D}caligraphic_D be a set of domain symbols, 𝒫𝒫\mathcal{P}caligraphic_P be a set of predicate symbols, 𝒯𝒯\mathcal{T}caligraphic_T be a set of property symbols, and \mathcal{F}caligraphic_F be a set of function symbols. We then define the syntax of ULLER ULLERsubscriptULLER\mathcal{L}_{\text{{ULLER}}}caligraphic_L start_POSTSUBSCRIPT ULLER end_POSTSUBSCRIPT as a context-free grammar:

F𝐹\displaystyle Fitalic_F ::=xD(F)|xD(F)\displaystyle::=\forall x\in D\ (F)\ |\ \exists x\in D\ (F): := ∀ italic_x ∈ italic_D ( italic_F ) | ∃ italic_x ∈ italic_D ( italic_F ) (1)
F𝐹\displaystyle Fitalic_F ::=FF|FF|FF|¬F|P(T,,T)|(F)\displaystyle::=F\land F\ |\ F\lor F\ |\ F\Rightarrow F\ |\ \neg F\ |\ \mathrm% {P}(T,...,T)\ |\ (F): := italic_F ∧ italic_F | italic_F ∨ italic_F | italic_F ⇒ italic_F | ¬ italic_F | roman_P ( italic_T , … , italic_T ) | ( italic_F )
F𝐹\displaystyle Fitalic_F ::=x:=f(T,,T)(F)\displaystyle::=x:=f(T,...,T)\ (F): := italic_x := italic_f ( italic_T , … , italic_T ) ( italic_F )
T𝑇\displaystyle Titalic_T ::=x|c|T.prop|T+T|TT|\displaystyle::=x\ |\ c\ |\ T.\text{prop}\ |\ T+T\ |\ T-T\ |\ \dots: := italic_x | italic_c | italic_T . prop | italic_T + italic_T | italic_T - italic_T | …

where D𝒟𝐷𝒟D\in\mathcal{D}italic_D ∈ caligraphic_D, x𝒱𝑥𝒱x\in\mathcal{V}italic_x ∈ caligraphic_V, c𝒞𝑐𝒞c\in\mathcal{C}italic_c ∈ caligraphic_C, f,+,f,+,-\in\mathcal{F}italic_f , + , - ∈ caligraphic_F, P𝒫P𝒫\mathrm{P}\in\mathcal{P}roman_P ∈ caligraphic_P and prop𝒯prop𝒯\text{prop}\in\mathcal{T}prop ∈ caligraphic_T. The nonterminal symbol F𝐹Fitalic_F is a formula and T𝑇Titalic_T is a term. We call x:=f(T,,T)(F)assign𝑥𝑓𝑇𝑇𝐹x:=f(T,\dots,T)(F)italic_x := italic_f ( italic_T , … , italic_T ) ( italic_F ) a statement, or just statement, which we discuss in Section 2.1. Notice that, except for basic arithmetic operations (+++, --, …), functions only appear in statements.

The syntax of ULLER does not include a special syntactic construct for neural networks. Instead, we treat them as functions, where the intended meaning is given by the semantics specified by the NeSy system. We therefore hide how the NeSy system uses the neural networks to the user, so the focus is on specifying constraints rather than implementation details.

Syntactic Sugar. We use x1D1,x2D2(F)formulae-sequencefor-allsubscript𝑥1subscript𝐷1subscript𝑥2subscript𝐷2𝐹\forall x_{1}\in D_{1},x_{2}\in D_{2}(F)∀ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∈ italic_D start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∈ italic_D start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_F ) as syntactic sugar for x1D2(x2D2(F))for-allsubscript𝑥1subscript𝐷2for-allsubscript𝑥2subscript𝐷2𝐹\forall x_{1}\in D_{2}\ (\forall x_{2}\in D_{2}\ (F))∀ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∈ italic_D start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( ∀ italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∈ italic_D start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_F ) ) for the quantifiers. We use x1:=f1(T,,T),x2:=f2(T,,T)(F)formulae-sequenceassignsubscript𝑥1subscript𝑓1𝑇𝑇assignsubscript𝑥2subscript𝑓2𝑇𝑇𝐹x_{1}:=f_{1}(T,\dots,T),x_{2}:=f_{2}(T,\dots,T)\ (F)italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_T , … , italic_T ) , italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT := italic_f start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_T , … , italic_T ) ( italic_F ) as syntactic sugar for x1:=f1(T,,T)(x2:=f2(T,,T)(F))assignsubscript𝑥1subscript𝑓1𝑇𝑇assignsubscript𝑥2subscript𝑓2𝑇𝑇𝐹x_{1}:=f_{1}(T,\dots,T)\ (x_{2}:=f_{2}(T,\dots,T)\ (F))italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_T , … , italic_T ) ( italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT := italic_f start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_T , … , italic_T ) ( italic_F ) ). Finally, we also allow for binary predicates in infix notation, such as TT𝑇𝑇T\leq Titalic_T ≤ italic_T.

Ty**. ULLER is a dynamically typed language. We do not guarantee syntactically nor via type checker that functions and predicates only take arguments from the domain defined in their interpretations. This mimics the design of the type system of Python.

2.1 Statements

A key design choice of ULLER is the use of special statements x:=f(T,,T)(F)assign𝑥𝑓𝑇𝑇𝐹x:=f(T,...,T)\ (F)italic_x := italic_f ( italic_T , … , italic_T ) ( italic_F ) to declare (possibly random) variables obtained by applying (possibly non-deterministic) functions. The function symbols f𝑓fitalic_f appear only in statements, and not in the definition of terms T𝑇Titalic_T, like in standard FOL. Statements simplify the composition of functions. They give a syntax that is both familiar to ML researchers who are used to writing Python, and gives a clear separation between the machine learning pipeline that processes data and the constraints on the data given by the logic. We will motivate statements with the two following examples.

Example 2.1 (Procedural composition of functions).

Consider the MNISTAdd example from Appendix A. To emphasise the ease of composing functions in ULLER, consider a scenario where the classifier f𝑓fitalic_f expects greyscale images while the data points in the dataset T𝑇Titalic_T are RGB images. We can easily apply transformations and formulate the new condition using ULLER statements:

xT(x1:=greyscale(x.im1),x1′′:=normalise(x1),x2:=greyscale(x.im2),x2′′:=normalise(x2),n1:=f(x1′′),n2:=f(x2′′)(n1+n2=x.sum))\displaystyle\begin{split}&\forall x\in T(\\ &\quad x_{1}^{\prime}:=\mathrm{greyscale}(x.\mathrm{im1}),x_{1}^{\prime\prime}% :=\mathrm{normalise}(x_{1}^{\prime}),\\ &\quad x_{2}^{\prime}:=\mathrm{greyscale}(x.\mathrm{im2}),x_{2}^{\prime\prime}% :=\mathrm{normalise}(x_{2}^{\prime}),\\ &\quad n_{1}:=f(x_{1}^{\prime\prime}),n_{2}:=f(x_{2}^{\prime\prime})\\ &\qquad(n_{1}+n_{2}=x.\mathrm{sum})\\ &)\end{split}start_ROW start_CELL end_CELL start_CELL ∀ italic_x ∈ italic_T ( end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT := roman_greyscale ( italic_x . im1 ) , italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT := roman_normalise ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT := roman_greyscale ( italic_x . im2 ) , italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT := roman_normalise ( italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL italic_n start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := italic_f ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ) , italic_n start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT := italic_f ( italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL ( italic_n start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_n start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = italic_x . roman_sum ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL ) end_CELL end_ROW (2)

\triangleleft

Example 2.2 (Sco** independence).

Another key feature of ULLER statements is that they explicitly delimit the scopes of variables, giving control over the memoisation and independence assumptions. Consider a non-deterministic function dice()dice\mathrm{dice}()roman_dice ( ) which associates a probability to each outcome of a six-sided dice throw. Consider the following program written in ULLER:

x:=dice()(x=6even(x)).assign𝑥dice𝑥6even𝑥x:=\mathrm{dice}()\ (x=6\land\mathrm{even}(x)).italic_x := roman_dice ( ) ( italic_x = 6 ∧ roman_even ( italic_x ) ) . (3)

The formula asks whether a die-throw outcome is both a six and even. For a fair dice, the probability of the formula is 1616\frac{1}{6}divide start_ARG 1 end_ARG start_ARG 6 end_ARG under probabilistic semantics.

Now consider the alternative ULLER program:

(x:=dice()(x=6))(x:=dice()(even(x))).assign𝑥dice𝑥6assign𝑥diceeven𝑥(x:=\mathrm{dice}()\ (x=6))\land(x:=\mathrm{dice}()\ (\mathrm{even}(x))).( italic_x := roman_dice ( ) ( italic_x = 6 ) ) ∧ ( italic_x := roman_dice ( ) ( roman_even ( italic_x ) ) ) . (4)

In this program, we throw two independent dice, and check if the first lands on six and the second is even. For fair dice, the probability of this formula is 1612=1121612112\frac{1}{6}\cdot\frac{1}{2}=\frac{1}{12}divide start_ARG 1 end_ARG start_ARG 6 end_ARG ⋅ divide start_ARG 1 end_ARG start_ARG 2 end_ARG = divide start_ARG 1 end_ARG start_ARG 12 end_ARG.

Consider a similar program in regular FOL (which is not allowed in ULLER):

(dice()=6)even(dice())dice6evendice(\mathrm{dice}()=6)\land\mathrm{even}(\mathrm{dice}())( roman_dice ( ) = 6 ) ∧ roman_even ( roman_dice ( ) ) (5)

Here, it is ambiguous whether the outcomes of the dice are shared like in the ULLER program of (3) or not, like in (4). Many probabilistic NeSy frameworks choose the first option and memoise the outcome of the function. We argue that this behaviour should not be a default assumption from the NeSy system. Instead, dependence and memoisation scopes should be explicitly defined by the program. ULLER statements give researchers control over these scopes. \triangleleft

3 Semantics of ULLER

In this Section, we define the semantics of ULLER. In Section 3.1 we discuss how ULLER interprets the symbols in the language, such as the function and domain symbols. Then, in Section 3.2, we discuss how some example NeSy systems interpret the formulas in ULLER.

3.1 Interpretation of the Symbols

To assign meaning to ULLER programs, we need to interpret the non-logical symbols in ULLERsubscriptULLER\mathcal{L}_{\text{{ULLER}}}caligraphic_L start_POSTSUBSCRIPT ULLER end_POSTSUBSCRIPT, that is, 𝒟𝒟\mathcal{D}caligraphic_D, 𝒫𝒫\mathcal{P}caligraphic_P, \mathcal{F}caligraphic_F, and 𝒞𝒞\mathcal{C}caligraphic_C, using an interpretation function I𝐼Iitalic_I.

Definition 3.1.

An interpretation I𝐼Iitalic_I is a function assigning a meaning to the symbols in ULLERsubscriptULLER\mathcal{L}_{\text{{ULLER}}}caligraphic_L start_POSTSUBSCRIPT ULLER end_POSTSUBSCRIPT under the following rules, where Ω1,,Ωn,Ωn+1subscriptΩ1subscriptΩ𝑛subscriptΩ𝑛1\Omega_{1},...,\Omega_{n},\Omega_{n+1}roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , roman_Ω start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT are sets.

  1. 1.

    The interpretation of a domain D𝒟𝐷𝒟D\in\mathcal{D}italic_D ∈ caligraphic_D is a set ΩΩ\Omegaroman_Ω.

  2. 2.

    The interpretation of a predicate PP\mathrm{P}roman_P of arity n𝑛nitalic_n is a function of n𝑛nitalic_n domains to {0,1}01\{0,1\}{ 0 , 1 }. That is, I(P):Ω1××Ωn{0,1}:𝐼PsubscriptΩ1subscriptΩ𝑛01I(\mathrm{P}):\Omega_{1}\times...\times\Omega_{n}\rightarrow\{0,1\}italic_I ( roman_P ) : roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT × … × roman_Ω start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT → { 0 , 1 }.

  3. 3.

    The interpretation of the predicate truePtrueP\mathrm{true}\in\mathrm{P}roman_true ∈ roman_P is the identity function on {0,1}01\{0,1\}{ 0 , 1 }, that is, I(true):{0,1}{0,1}:𝐼true0101I(\mathrm{true}):\{0,1\}\rightarrow\{0,1\}italic_I ( roman_true ) : { 0 , 1 } → { 0 , 1 } such that I(true)(x)=x𝐼true𝑥𝑥I(\mathrm{true})(x)=xitalic_I ( roman_true ) ( italic_x ) = italic_x.

  4. 4.

    The interpretation of a constant c𝑐citalic_c is an element of a domain I(c)Ωi𝐼𝑐subscriptΩ𝑖I(c)\in\Omega_{i}italic_I ( italic_c ) ∈ roman_Ω start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT.

  5. 5.

    The interpretation of a function f𝑓fitalic_f of arity n𝑛nitalic_n is a conditional probability distribution111To be precise, our definition is equivalent to a probability kernel or Markov kernel, which is a formalisation of the concept of a conditional probability distribution in measure theory. I(f):Ω1××Ωn(Ωn+1[0,1]):𝐼𝑓subscriptΩ1subscriptΩ𝑛subscriptΩ𝑛101I(f):\Omega_{1}\times...\times\Omega_{n}\rightarrow(\Omega_{n+1}\rightarrow[0,% 1])italic_I ( italic_f ) : roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT × … × roman_Ω start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT → ( roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT → [ 0 , 1 ] ). That is, for any set of inputs x1Ω1,,xnΩnformulae-sequencesubscript𝑥1subscriptΩ1subscript𝑥𝑛subscriptΩ𝑛x_{1}\in\Omega_{1},...,x_{n}\in\Omega_{n}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, I(f)(x1,,xn)𝐼𝑓subscript𝑥1subscript𝑥𝑛I(f)(x_{1},...,x_{n})italic_I ( italic_f ) ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) is a probability distribution on the domain Ωn+1subscriptΩ𝑛1\Omega_{n+1}roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT. If for all x1Ω1,,xnΩnformulae-sequencesubscript𝑥1subscriptΩ1subscript𝑥𝑛subscriptΩ𝑛x_{1}\in\Omega_{1},...,x_{n}\in\Omega_{n}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT the probability distribution I(f)(x1,,xn)𝐼𝑓subscript𝑥1subscript𝑥𝑛I(f)(x_{1},...,x_{n})italic_I ( italic_f ) ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) is a deterministic distribution, we say that I(f)𝐼𝑓I(f)italic_I ( italic_f ) is a deterministic function.

Next, we give a probabilistic interpretation to both domains and functions. In particular, we treat functions, such as neural networks, as a conditional distribution given assignments x1Ω1,,xnΩnformulae-sequencesubscript𝑥1subscriptΩ1subscript𝑥𝑛subscriptΩ𝑛x_{1}\in\Omega_{1},...,x_{n}\in\Omega_{n}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT to input variables. This allows us to represent the uncertainty of the neural networks, which NeSy systems using, for example, probabilistic and fuzzy semantics can use to compute probabilities and fuzzy truth values. We will also frequently want to use regular (deterministic) functions f:Ω1××ΩnΩn+1:𝑓subscriptΩ1subscriptΩ𝑛subscriptΩ𝑛1f:\Omega_{1}\times...\times\Omega_{n}\rightarrow\Omega_{n+1}italic_f : roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT × … × roman_Ω start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT → roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT. A regular function is a special case of a conditional distribution that we refer to as a deterministic function. We define deterministic functions with a conditional distribution using the Dirac delta distribution at f(x1,,xn)𝑓subscript𝑥1subscript𝑥𝑛f(x_{1},...,x_{n})italic_f ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) for continuous distributions, and a distribution that assigns 1 to the output value f(x1,,xn)𝑓subscript𝑥1subscript𝑥𝑛f(x_{1},...,x_{n})italic_f ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) for finite domains, and 0 to the other values.

3.2 Semantics of neuro-symbolic systems

Refer to caption
Figure 1: The meaning of an example ULLER formula under classical, probabilistic and fuzzy semantics. We interpret the function symbols as conditional distributions. Refer to caption:X({Refer to caption,Refer to caption,Refer to caption}[0,1]):Refer to caption𝑋Refer to captionRefer to captionRefer to caption01\text{{\includegraphics[width=7.5347pt]{figs/vertical-traffic-light.png}}}:X% \rightarrow(\{\text{{\includegraphics[width=7.5347pt]{figs/red-circle2.png}}},% \text{{\includegraphics[width=7.5347pt]{figs/orange-circle2.png}}},\text{{% \includegraphics[width=7.5347pt]{figs/green-circle2.png}}}\}\rightarrow[0,1]): italic_X → ( { , , } → [ 0 , 1 ] ) detects the concepts for red, orange, and green in the crossing representations.  Refer to caption Refer to caption :(X×{Refer to caption,Refer to caption,Refer to caption})({0,1}[0,1]): Refer to caption Refer to caption 𝑋Refer to captionRefer to captionRefer to caption0101\text{ {\includegraphics[width=7.5347pt]{figs/automobile.png}} {\includegraphics[width=7.5347pt]{figs/dashing-away.png}} }:(X\times\{\text{{\includegraphics[width=7.5347pt]{figs/red-circle2.png}}},% \text{{\includegraphics[width=7.5347pt]{figs/orange-circle2.png}}},\text{{% \includegraphics[width=7.5347pt]{figs/green-circle2.png}}}\})\rightarrow(\{0,1% \}\rightarrow[0,1]): ( italic_X × { , , } ) → ( { 0 , 1 } → [ 0 , 1 ] ) takes the decision of continuing given an extracted color light concept and the rest of the crossing scene. With abuse of notation, we ignore I()𝐼I()italic_I ( ) and \llbracket\rrbracket⟦ ⟧.

We next define the meaning of a formula in ULLERsubscriptULLER\mathcal{L}_{\text{{ULLER}}}caligraphic_L start_POSTSUBSCRIPT ULLER end_POSTSUBSCRIPT, which requires both an interpretation I𝐼Iitalic_I and a NeSy system {\llbracket\rrbracket}⟦ ⟧. Here, {\llbracket\rrbracket}⟦ ⟧ is a function that interprets the semantics of the program statements in ULLERsubscriptULLER\mathcal{L}_{\text{{ULLER}}}caligraphic_L start_POSTSUBSCRIPT ULLER end_POSTSUBSCRIPT. We also need a variable assignment η:𝒱𝒪:𝜂𝒱𝒪\eta:\mathcal{V}\rightarrow\mathcal{O}italic_η : caligraphic_V → caligraphic_O that maps variables v𝒱𝑣𝒱v\in\mathcal{V}italic_v ∈ caligraphic_V to an element of a domain 𝒪=iΩi𝒪subscript𝑖subscriptΩ𝑖\mathcal{O}=\cup_{i}\Omega_{i}caligraphic_O = ∪ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, where Ωi=I(Di)subscriptΩ𝑖𝐼subscript𝐷𝑖\Omega_{i}=I(D_{i})roman_Ω start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_I ( italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) is a set associated to a domain Di𝒟subscript𝐷𝑖𝒟D_{i}\in\mathcal{D}italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ caligraphic_D.

Definition 3.2.

A NeSy structure is a tuple (I,η,,I,η)(I,\eta,\mathcal{B},{\llbracket\rrbracket}_{I,\eta})( italic_I , italic_η , caligraphic_B , ⟦ ⟧ start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT ) where I𝐼Iitalic_I is an interpretation, η:𝒱𝒪:𝜂𝒱𝒪\eta:\mathcal{V}\rightarrow\mathcal{O}italic_η : caligraphic_V → caligraphic_O is a variable assignment, \mathcal{B}caligraphic_B is a set of outputs and I,η:ULLER𝒪{\llbracket\rrbracket}_{I,\eta}:\mathcal{L}_{\text{{ULLER}}}\rightarrow% \mathcal{B}\cup\mathcal{O}⟦ ⟧ start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT : caligraphic_L start_POSTSUBSCRIPT ULLER end_POSTSUBSCRIPT → caligraphic_B ∪ caligraphic_O is a neuro-symbolic system which is a function that assigns an output in \mathcal{B}caligraphic_B to each formula in ULLERsubscriptULLER\mathcal{L}_{\text{{ULLER}}}caligraphic_L start_POSTSUBSCRIPT ULLER end_POSTSUBSCRIPT and a domain element in 𝒪𝒪\mathcal{O}caligraphic_O for terms T𝑇Titalic_T. If the interpretation and variable assignment are clear from the context, we write {\llbracket\rrbracket}⟦ ⟧ for I,η{\llbracket\rrbracket}_{I,\eta}⟦ ⟧ start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT.

We discuss several NeSy systems and their semantics for the NeSy language in the following sections, and provide a visual overview in Figure 1. Each NeSy system is defined over some set of outputs \mathcal{B}caligraphic_B. For example, classical logic is defined over the output {0,1}01\{0,1\}{ 0 , 1 }, while fuzzy logics are defined over the interval [0,1]01[0,1][ 0 , 1 ]. A neuro-symbolic system I,η{\llbracket\rrbracket}_{I,\eta}⟦ ⟧ start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT defines the semantics of a language expression. When a language expression is a term T𝑇Titalic_T, I,η{\llbracket\rrbracket}_{I,\eta}⟦ ⟧ start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT returns an element of the universe 𝒪𝒪\mathcal{O}caligraphic_O. When the language expression is a formula F𝐹Fitalic_F, it returns an element in \mathcal{B}caligraphic_B.

Notation. We use η[xa]𝜂delimited-[]maps-to𝑥𝑎\eta[x\mapsto a]italic_η [ italic_x ↦ italic_a ] to update a variable assignment η𝜂\etaitalic_η with the assignment of a𝑎aitalic_a to x𝑥xitalic_x:

η[xa](x)𝜂delimited-[]maps-to𝑥𝑎𝑥\displaystyle\eta[x\mapsto a](x)italic_η [ italic_x ↦ italic_a ] ( italic_x ) =aabsent𝑎\displaystyle=a= italic_a (6)
η[xa](x)𝜂delimited-[]maps-to𝑥𝑎superscript𝑥\displaystyle\eta[x\mapsto a](x^{\prime})italic_η [ italic_x ↦ italic_a ] ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) =η(x)forxxformulae-sequenceabsent𝜂superscript𝑥forsuperscript𝑥𝑥\displaystyle=\eta(x^{\prime})\quad\text{for}\ x^{\prime}\neq x= italic_η ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) for italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≠ italic_x

We also define pf(a|T1,,Tn)=I(f)(T1,,Tn)(a){p}_{f}(a|T_{1},...,T_{n})=I(f)(\llbracket T_{1}\rrbracket,...,\llbracket T_{n% }\rrbracket)(a)italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = italic_I ( italic_f ) ( ⟦ italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⟧ , … , ⟦ italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ⟧ ) ( italic_a ), which computes the probability of the element aΩn+1𝑎subscriptΩ𝑛1a\in\Omega_{n+1}italic_a ∈ roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT under the distribution I(f)𝐼𝑓I(f)italic_I ( italic_f ), conditioned on the interpretation of the terms T1subscript𝑇1T_{1}italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT to Tnsubscript𝑇𝑛T_{n}italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT. That is, under T1,,Tn\llbracket T_{1}\rrbracket,...,\llbracket T_{n}\rrbracket⟦ italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⟧ , … , ⟦ italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ⟧. In the coming sections, we will frequently use this shorthand to talk about the semantics of the different NeSy systems.

3.3 Classical semantics

We first define the semantics of the NeSy language if we “choose” an option deterministically from a conditional distribution. Then, under the classical (boolean) semantics of the logical symbols, ULLER is a regular first-order logic: It becomes exactly as expressive as FOL. In this paper, we will make the deterministic choice from a distribution by taking the mode, that is, the most likely output of the conditional distribution. However, other choices are also possible.

Definition 3.3.

The classical structure (I,η,{0,1},I,ηC)(I,\eta,\{0,1\},\llbracket\rrbracket^{\text{C}}_{I,\eta})( italic_I , italic_η , { 0 , 1 } , ⟦ ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT ) is defined on boolean outputs {0,1}01\{0,1\}{ 0 , 1 } as:

xD(F)I,ηC\displaystyle\llbracket\forall x\in D\ (F)\rrbracket^{\text{C}}_{I,\eta}⟦ ∀ italic_x ∈ italic_D ( italic_F ) ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT =minaI(D)FI,η[xa]C\displaystyle=\min_{a\in I(D)}\llbracket F\rrbracket^{\text{C}}_{I,\eta[x% \mapsto a]}= roman_min start_POSTSUBSCRIPT italic_a ∈ italic_I ( italic_D ) end_POSTSUBSCRIPT ⟦ italic_F ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η [ italic_x ↦ italic_a ] end_POSTSUBSCRIPT (7)
xD(F)C\displaystyle\llbracket\exists x\in D\ (F)\rrbracket^{\text{C}}⟦ ∃ italic_x ∈ italic_D ( italic_F ) ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT =¬xD(¬F)C\displaystyle=\llbracket\neg\forall x\in D\ (\neg F)\rrbracket^{\text{C}}= ⟦ ¬ ∀ italic_x ∈ italic_D ( ¬ italic_F ) ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT (8)
F1F2C=min(F1C,F2C)\displaystyle\llbracket F_{1}\wedge F_{2}\rrbracket^{\text{C}}=\min(\llbracket F% _{1}\rrbracket^{\text{C}},\llbracket F_{2}\rrbracket^{\text{C}})⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∧ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT = roman_min ( ⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT , ⟦ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT ) ,F1F2C=¬(¬F1¬F2)C\displaystyle,\ \llbracket F_{1}\lor F_{2}\rrbracket^{\text{C}}=\llbracket\neg% (\neg F_{1}\land\neg F_{2})\rrbracket^{\text{C}}, ⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∨ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT = ⟦ ¬ ( ¬ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∧ ¬ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT (9)
¬FC=1FC\displaystyle\llbracket\neg F\rrbracket^{\text{C}}=1-\llbracket F\rrbracket^{% \text{C}}⟦ ¬ italic_F ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT = 1 - ⟦ italic_F ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT ,F1F2C=¬F1F2C\displaystyle,\ \llbracket F_{1}\Rightarrow F_{2}\rrbracket^{\text{C}}=% \llbracket\neg F_{1}\lor F_{2}\rrbracket^{\text{C}}, ⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⇒ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT = ⟦ ¬ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∨ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT (10)
P(T1,,Tn)I,ηC\displaystyle\llbracket P(T_{1},\dots,T_{n})\rrbracket^{\text{C}}_{I,\eta}⟦ italic_P ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT =I(P)(T1C,,TnC)\displaystyle=I(P)(\llbracket T_{1}\rrbracket^{\text{C}},\dots,\llbracket T_{n% }\rrbracket^{\text{C}})= italic_I ( italic_P ) ( ⟦ italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT , … , ⟦ italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT ) (11)
xI,ηC=η(x)\displaystyle\llbracket x\rrbracket^{\text{C}}_{I,\eta}=\eta(x)⟦ italic_x ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT = italic_η ( italic_x ) ,cC=I(c)\displaystyle,\ \llbracket c\rrbracket^{\text{C}}=I(c), ⟦ italic_c ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT = italic_I ( italic_c ) (12)
T1+T2C\displaystyle\llbracket T_{1}+T_{2}\rrbracket^{\text{C}}⟦ italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_T start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT =T1C+T2C\displaystyle=\llbracket T_{1}\rrbracket^{\text{C}}+\llbracket T_{2}\rrbracket% ^{\text{C}}= ⟦ italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT + ⟦ italic_T start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT (13)
T.propC\displaystyle\llbracket T.\mathrm{prop}\rrbracket^{\text{C}}⟦ italic_T . roman_prop ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT =get(TC,prop)\displaystyle=\mathrm{get}(\llbracket T\rrbracket^{\text{C}},\mathrm{prop})= roman_get ( ⟦ italic_T ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT , roman_prop ) (14)
x:=f(T1,,Tn)(F)I,ηC\displaystyle\llbracket x:=f(T_{1},...,T_{n})(F)\rrbracket^{\text{C}}_{I,\eta}⟦ italic_x := italic_f ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ( italic_F ) ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT =FI,η[xargmaxaΩn+1pf(a|T1,,Tn)]C\displaystyle=\llbracket F\rrbracket^{\text{C}}_{I,\eta[x\mapsto\operatorname*% {argmax}_{a\in\Omega_{n+1}}{p}_{f}(a|T_{1},...,T_{n})]}= ⟦ italic_F ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η [ italic_x ↦ roman_argmax start_POSTSUBSCRIPT italic_a ∈ roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ] end_POSTSUBSCRIPT (15)

In Equation 14, get(TC,prop)\mathrm{get}(\llbracket T\rrbracket^{\text{C}},\mathrm{prop})roman_get ( ⟦ italic_T ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT , roman_prop ) is a deterministic function that retrieves the value of an object property.

Equation 15 demands some explanation. The argmax\arg\maxroman_arg roman_max takes the probability distribution given by the interpretation of the function f𝑓fitalic_f and chooses a value from the codomain Ωn+1subscriptΩ𝑛1\Omega_{n+1}roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT. In the classical structure, this choice is made deterministically by picking the mode of the distribution: the most likely element a𝑎aitalic_a. Then we assign this element a𝑎aitalic_a to the variable x𝑥xitalic_x through the variable assignment η[xa]𝜂delimited-[]maps-to𝑥𝑎\eta[x\mapsto a]italic_η [ italic_x ↦ italic_a ], and evaluate the rest of the formula F𝐹Fitalic_F under this new assignment.

Importantly, the classic semantics allows us to prove whether a neuro-symbolic system “is faithful” to classical logic when all functions are deterministic. We formally introduce this notion by noting we can transform any program into a deterministic program by choosing the mode of the distribution like in Equation 15.

Definition 3.4.

For some interpretation I𝐼Iitalic_I, the mode interpretation I^^𝐼\hat{I}over^ start_ARG italic_I end_ARG is another interpretation such that for all function symbols f𝑓f\in\mathcal{F}italic_f ∈ caligraphic_F, I^(f)^𝐼𝑓\hat{I}(f)over^ start_ARG italic_I end_ARG ( italic_f ) returns the mode of pfsubscript𝑝𝑓{p}_{f}italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT. That is, p^f(a|T1,,Tn)=δ(aargmaxapf(a|T1,,Tn))subscript^𝑝𝑓conditional𝑎subscript𝑇1subscript𝑇𝑛𝛿𝑎subscriptsuperscript𝑎subscript𝑝𝑓conditionalsuperscript𝑎subscript𝑇1subscript𝑇𝑛\hat{p}_{f}(a|T_{1},...,T_{n})=\delta(a-\arg\max_{a^{\prime}}{p}_{f}(a^{\prime% }|T_{1},...,T_{n}))over^ start_ARG italic_p end_ARG start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = italic_δ ( italic_a - roman_arg roman_max start_POSTSUBSCRIPT italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ), where δ𝛿\deltaitalic_δ is the Dirac delta distribution. Then a neuro-symbolic system {\llbracket\rrbracket}⟦ ⟧ is classical in the limit if for all language statements LULLER𝐿subscriptULLERL\in\mathcal{L}_{\text{{ULLER}}}italic_L ∈ caligraphic_L start_POSTSUBSCRIPT ULLER end_POSTSUBSCRIPT, LI^,η=LI,ηC\llbracket L\rrbracket_{\hat{I},\eta}=\llbracket L\rrbracket^{\text{C}}_{I,\eta}⟦ italic_L ⟧ start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η end_POSTSUBSCRIPT = ⟦ italic_L ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT.

3.4 Probabilistic Semantics

Probabilistic semantics, also known as weighted model counting or possible world semantics in the literature, computes the probability that a formula is true. This is done by iterating over all possible assignments to the variables. We give a straightforward implementation of a probabilistic semantics for ULLER in Definition 3.5, but note that other probabilistic semantics exist which would require different NeSy systems.

In the upcoming definitions, we will not redefine semantics whenever it is equal to the classical semantics, up to domain differences. For instance, we will not repeat constants and variable semantics.

Definition 3.5.

The probabilistic structure (I,η,[0,1],P)(I,\eta,[0,1],\llbracket\rrbracket^{\text{P}})( italic_I , italic_η , [ 0 , 1 ] , ⟦ ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT ) is defined on probabilities [0,1]01[0,1][ 0 , 1 ] as:

xD(F)P\displaystyle\llbracket\forall x\in D\ (F)\rrbracket^{\text{P}}⟦ ∀ italic_x ∈ italic_D ( italic_F ) ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT =aI(D)FI,η[xa]P\displaystyle=\prod_{a\in I(D)}\llbracket F\rrbracket^{\text{P}}_{I,\eta[x% \rightarrow a]}= ∏ start_POSTSUBSCRIPT italic_a ∈ italic_I ( italic_D ) end_POSTSUBSCRIPT ⟦ italic_F ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η [ italic_x → italic_a ] end_POSTSUBSCRIPT (16)
F1F2P\displaystyle\llbracket F_{1}\wedge F_{2}\rrbracket^{\text{P}}⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∧ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT =F1PF2P\displaystyle=\llbracket F_{1}\rrbracket^{\text{P}}\cdot\llbracket F_{2}% \rrbracket^{\text{P}}= ⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT ⋅ ⟦ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT (17)
x:=f(T1,,Tn)(F)P\displaystyle\llbracket x:=f(T_{1},...,T_{n})\ (F)\rrbracket^{\text{P}}⟦ italic_x := italic_f ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ( italic_F ) ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT =𝔼apf(|T1,,Tn)[FI,η[xa]P]\displaystyle=\mathbb{E}_{a\sim{p}_{f}(\cdot|T_{1},...,T_{n})}\left[\llbracket F% \rrbracket^{\text{P}}_{I,\eta[x\mapsto a]}\right]= blackboard_E start_POSTSUBSCRIPT italic_a ∼ italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( ⋅ | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT [ ⟦ italic_F ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η [ italic_x ↦ italic_a ] end_POSTSUBSCRIPT ] (18)

In probabilistic semantics, a function f(x)𝑓𝑥f(x)italic_f ( italic_x ) is interpreted as a conditional distribution conditioned on x𝑥xitalic_x. In this case, we require computing the expectation of the formulas being true under the interpreted functions. This happens in Equation 18. We also define universal aggregation as a product of independent probabilities in Equation 16, reflecting the common i.i.d. assumption in Machine Learning. This assumption may not hold [43], in which case users can define more sophisticated NeSy systems to model a different universal aggregation behaviour.

The computation of the expectation depends on whether the output domain Ωn+1subscriptΩ𝑛1\Omega_{n+1}roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT is discrete or continuous. For discrete domains, Equation 18 equals

x:=f(T1,,Tn)(F)P=aΩn+1pf(a|T1,,Tn)FI,η[xa]P,\llbracket x:=f(T_{1},...,T_{n})\ (F)\rrbracket^{\text{P}}=\displaystyle\sum_{% a\in\Omega_{n+1}}{p}_{f}(a|T_{1},...,T_{n})\cdot\llbracket F\rrbracket^{\text{% P}}_{I,\eta[x\mapsto a]},⟦ italic_x := italic_f ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ( italic_F ) ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_a ∈ roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ⋅ ⟦ italic_F ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η [ italic_x ↦ italic_a ] end_POSTSUBSCRIPT , (19)

while for continuous domains it equals

x:=f(T1,,Tn)(F)P=aΩn+1pf(a|T1,,Tn)FI,η[xa]Pda.\llbracket x:=f(T_{1},...,T_{n})\ (F)\rrbracket^{\text{P}}=\displaystyle\int_{% a\in\Omega_{n+1}}{p}_{f}(a|T_{1},...,T_{n})\cdot\llbracket F\rrbracket^{\text{% P}}_{I,\eta[x\mapsto a]}\text{d}a.⟦ italic_x := italic_f ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ( italic_F ) ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT = ∫ start_POSTSUBSCRIPT italic_a ∈ roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ⋅ ⟦ italic_F ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η [ italic_x ↦ italic_a ] end_POSTSUBSCRIPT d italic_a . (20)

We should note that probabilistic semantics in most practical cases will be intractable because of the exponential recursion introduced in Equation 19, not to mention the usually intractable integral in Equation 20 [5]. We can speed this up with techniques that compile formulas into representations where computing the probability of the formula is tractable [8, 11]. The probabilistic semantics is classical in the limit (Appendix C.1), and is connected to the standard weighted model counting semantics used in, for example, Semantic Loss [56], SPL [2] and DeepProbLog [30]. See Appendix D for details.

We can generalise the probabilistic semantics to algebraic model counting [25, 15] by considering semirings \mathcal{B}caligraphic_B together with a product and a sum operation. This, for example, allows us to compute the most likely assignment to the variables in a formula, or to compute the gradient of the probabilistic semantics using dual numbers.

3.5 Fuzzy Semantics

Our definition for a fuzzy semantics is very similar to that of the probabilistic semantics. The two differences are using t-norms and t-conorms to connect fuzzy truth values, and the interpretation of sampling from boolean distributions.

Definition 3.6.

The fuzzy structure (IF,η,[0,1],F)(I_{F},\eta,[0,1],\llbracket\rrbracket^{\text{F}})( italic_I start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT , italic_η , [ 0 , 1 ] , ⟦ ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT ), where IFsubscript𝐼𝐹I_{F}italic_I start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT is an interpretation I𝐼Iitalic_I except that the predicate symbol truetrue\mathrm{true}roman_true is interpreted as the identity function on [0,1]01[0,1][ 0 , 1 ], is defined on fuzzy truth values [0,1]01[0,1][ 0 , 1 ] as:

xD(F)IF,ηF\displaystyle\llbracket\forall x\in D\ (F)\rrbracket^{\text{F}}_{I_{F},\eta}⟦ ∀ italic_x ∈ italic_D ( italic_F ) ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT , italic_η end_POSTSUBSCRIPT =aI(D)FIF,η[xa]F\displaystyle=\bigotimes_{a\in I(D)}\llbracket F\rrbracket^{\text{F}}_{I_{F},% \eta[x\rightarrow a]}= ⨂ start_POSTSUBSCRIPT italic_a ∈ italic_I ( italic_D ) end_POSTSUBSCRIPT ⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT , italic_η [ italic_x → italic_a ] end_POSTSUBSCRIPT (21)
xD(F)IF,ηF\displaystyle\llbracket\exists x\in D\ (F)\rrbracket^{\text{F}}_{I_{F},\eta}⟦ ∃ italic_x ∈ italic_D ( italic_F ) ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT , italic_η end_POSTSUBSCRIPT =aI(D)FIF,η[xa]F\displaystyle=\bigoplus_{a\in I(D)}\llbracket F\rrbracket^{\text{F}}_{I_{F},% \eta[x\rightarrow a]}= ⨁ start_POSTSUBSCRIPT italic_a ∈ italic_I ( italic_D ) end_POSTSUBSCRIPT ⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT , italic_η [ italic_x → italic_a ] end_POSTSUBSCRIPT (22)
F1F2F\displaystyle\llbracket F_{1}\wedge F_{2}\rrbracket^{\text{F}}⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∧ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT =F1FF2F,F1F2F=F1FF2F\displaystyle=\llbracket F_{1}\rrbracket^{\text{F}}\operatorname{\otimes}\ % \llbracket F_{2}\rrbracket^{\text{F}},\quad\llbracket F_{1}\lor F_{2}% \rrbracket^{\text{F}}=\llbracket F_{1}\rrbracket^{\text{F}}\operatorname{% \oplus}\ \llbracket F_{2}\rrbracket^{\text{F}}= ⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT ⊗ ⟦ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT , ⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∨ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT = ⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT ⊕ ⟦ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT (23)
true(x)I,ηF\displaystyle\llbracket\mathrm{true}(x)\rrbracket^{\text{F}}_{I,\eta}⟦ roman_true ( italic_x ) ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT =η(x),if η(x)[0,1]formulae-sequenceabsent𝜂𝑥if 𝜂𝑥01\displaystyle=\eta(x),\quad\text{if }\eta(x)\in[0,1]= italic_η ( italic_x ) , if italic_η ( italic_x ) ∈ [ 0 , 1 ] (24)
xf(T1,,Tn)(F)F\displaystyle\llbracket x\coloneqq f(T_{1},...,T_{n})(F)\rrbracket^{\text{F}}⟦ italic_x ≔ italic_f ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ( italic_F ) ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT ={FIF,η[xpf(1|T1,,Tn)]Fif Ωn+1={0,1}aΩn+1pf(a|T1,,Tn)FIF,η[xa]Fif Ωn+1 is finite\displaystyle=\begin{cases}\llbracket F\rrbracket^{\text{F}}_{I_{F},\eta[x% \mapsto{p}_{f}(1|T_{1},...,T_{n})]}&\text{if }\Omega_{n+1}=\{0,1\}\\ \displaystyle\bigoplus_{a\in\Omega_{n+1}}{p}_{f}(a|T_{1},...,T_{n})\otimes% \llbracket F\rrbracket^{\text{F}}_{I_{F},\eta[x\mapsto a]}&\text{if $\Omega_{n% +1}$ is finite}\end{cases}= { start_ROW start_CELL ⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT , italic_η [ italic_x ↦ italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( 1 | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ] end_POSTSUBSCRIPT end_CELL start_CELL if roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT = { 0 , 1 } end_CELL end_ROW start_ROW start_CELL ⨁ start_POSTSUBSCRIPT italic_a ∈ roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ⊗ ⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT , italic_η [ italic_x ↦ italic_a ] end_POSTSUBSCRIPT end_CELL start_CELL if roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT is finite end_CELL end_ROW (25)

where :[0,1]×[0,1][0,1]:tensor-productmaps-to010101\operatorname{\otimes}:[0,1]\times[0,1]\mapsto[0,1]⊗ : [ 0 , 1 ] × [ 0 , 1 ] ↦ [ 0 , 1 ] is a fuzzy t-norm and :[0,1]×[0,1][0,1]:direct-summaps-to010101\operatorname{\oplus}:[0,1]\times[0,1]\mapsto[0,1]⊕ : [ 0 , 1 ] × [ 0 , 1 ] ↦ [ 0 , 1 ] is a fuzzy t-conorm [4, 49].

In the first case of Equation 25, fuzzy semantics manipulates distributions over boolean codomains Ωn+1={0,1}subscriptΩ𝑛101\Omega_{n+1}=\{0,1\}roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT = { 0 , 1 } as a single truth value pf(1|T1,,Tn)subscript𝑝𝑓conditional1subscript𝑇1subscript𝑇𝑛{p}_{f}(1|T_{1},...,T_{n})italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( 1 | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ). The second case is defined for discrete, non boolean codomains. Fuzzy semantics reasons disjointly over all possible outcomes aΩn+1𝑎subscriptΩ𝑛1a\in\Omega_{n+1}italic_a ∈ roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT by interpreting the probability pf(a|T1,,Tn)[0,1]subscript𝑝𝑓conditional𝑎subscript𝑇1subscript𝑇𝑛01{p}_{f}(a|T_{1},\dots,T_{n})\in[0,1]italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ∈ [ 0 , 1 ] as truth degrees. This truth degree is then conjoined with the interpretation of the rest of the formula F𝐹Fitalic_F. Intuitively, they ask if there “exists a𝑎aitalic_a such that f(T1,,Tn)𝑓subscript𝑇1subscript𝑇𝑛f(T_{1},\dots,T_{n})italic_f ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) maps to a𝑎aitalic_a and that a𝑎aitalic_a verifies the rest of the formula F𝐹Fitalic_F”. We do not give a semantics for continuous or infinite domains in the fuzzy semantics, as we do not know of a standard definition in the neuro-symbolic literature. The fuzzy semantics is classical in the limit (see Appendix C.2), and is closely related to differentiable fuzzy logics such as Logic Tensor Networks [49, 4] (see Appendix E).

In addition to Fuzzy Logics with t-norms and t-conorms for conjunction and disjunction, other NeSy frameworks such as DL2 [18] and STL [52] can also be implemented with this semantics. While fuzzy logic acts on truth values in [0,1]01[0,1][ 0 , 1 ], DL2 acts on truth values in [,0]0[-\infty,0][ - ∞ , 0 ] and STL in [,][-\infty,\infty][ - ∞ , ∞ ]. They choose appropriate differentiable operators to implement the conjunction and disjunction. We refer the reader to [42] for details.

3.6 Sampling semantics

The sampling semantics S{\llbracket\rrbracket}^{S}⟦ ⟧ start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT is a simple modification to the classical semantics. It samples a value from each conditional distribution and uses that value to evaluate the formula. Therefore, the only difference in S{\llbracket\rrbracket}^{\text{S}}⟦ ⟧ start_POSTSUPERSCRIPT S end_POSTSUPERSCRIPT with classical semantics in Definition 3.3 is in Equation 15:

x:=f(T1,,Tn)(F)S=FI,η[x𝗌𝖺𝗆𝗉𝗅𝖾(pf(|T1,,Tn))]S\llbracket x:=f(T_{1},...,T_{n})\ (F)\rrbracket^{\text{S}}=\llbracket F% \rrbracket^{\text{S}}_{I,\eta[x\mapsto\mathsf{sample}({p}_{f}(\cdot|T_{1},...,% T_{n}))]}⟦ italic_x := italic_f ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ( italic_F ) ⟧ start_POSTSUPERSCRIPT S end_POSTSUPERSCRIPT = ⟦ italic_F ⟧ start_POSTSUPERSCRIPT S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η [ italic_x ↦ sansserif_sample ( italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( ⋅ | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ) ] end_POSTSUBSCRIPT (26)

Here, 𝗌𝖺𝗆𝗉𝗅𝖾𝗌𝖺𝗆𝗉𝗅𝖾\mathsf{sample}sansserif_sample is a (random) function that takes a probability distribution and samples a value from the codomain Ωn+1subscriptΩ𝑛1\Omega_{n+1}roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT under the distribution pf(|T1,,Tn){p}_{f}(\cdot|T_{1},...,T_{n})italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( ⋅ | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ). We can repeat the computation of the sampling semantics S{\llbracket\rrbracket}^{\text{S}}⟦ ⟧ start_POSTSUPERSCRIPT S end_POSTSUPERSCRIPT to reduce variance, like in standard Monte Carlo methods. This semantics can be combined with gradient estimation methods to learn the parameters of neural networks [39, 51]. A recent implementation of gradient estimation in the context of NeSy is the CatLog derivative trick [14], but any type of estimator based on the score function (commonly known as REINFORCE) can be used [26]. See Appendix B for a short discussion.

4 Learning and Reasoning

This section describes how to use ULLER for neuro-symbolic learning and reasoning. For a learning setting, we extend the definition of an interpretation (Definition 3.1) to a parameterised interpretation. A parameterised implementation allows us to implement neural networks with learnable parameters. For instance, a function model()model\text{model}()model ( ) can be interpreted as a neural network I𝜽(model)=𝑁𝑁𝜽subscript𝐼𝜽modelsubscript𝑁𝑁𝜽I_{\boldsymbol{\theta}}(\text{model})=\mathit{NN}_{\boldsymbol{\theta}}italic_I start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ( model ) = italic_NN start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT.

Definition 4.1.

A parameterised interpretation is an interpretation I𝜽subscript𝐼𝜽I_{\boldsymbol{\theta}}italic_I start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT that is uniquely defined by a set of parameters 𝜽d𝜽superscript𝑑{\boldsymbol{\theta}}\in\mathbb{R}^{d}bold_italic_θ ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT.

Let FULLER𝐹subscriptULLERF\in\mathcal{L}_{\text{{ULLER}}}italic_F ∈ caligraphic_L start_POSTSUBSCRIPT ULLER end_POSTSUBSCRIPT denote a ULLER formula that has a quantifier ranging over a dataset symbol T𝑇Titalic_T (for instance Example 2.1). Learning a parameterised interpretation typically involves searching for an optimal set of parameters 𝜽dsuperscript𝜽superscript𝑑{\boldsymbol{\theta}}^{*}\in\mathbb{R}^{d}bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT maximising the neuro-symbolic system on F𝐹Fitalic_F over a dataset ΩTsubscriptΩ𝑇\Omega_{T}roman_Ω start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT. In most machine learning settings, we are interested in minimising a loss function over a random minibatch x1,,xnΩTsimilar-tosubscript𝑥1subscript𝑥𝑛subscriptΩ𝑇x_{1},...,x_{n}\sim\Omega_{T}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∼ roman_Ω start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT. We can define such a loss function L(𝜽)𝐿𝜽L({\boldsymbol{\theta}})italic_L ( bold_italic_θ ) and corresponding minimisation problem for finding parameters 𝜽superscript𝜽{\boldsymbol{\theta}}^{*}bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT with

L(𝜽)=FI𝜽{T{x1,,xn}},{},𝜽=argmax𝜽dL(𝜽).L({\boldsymbol{\theta}})=-\llbracket F\rrbracket_{I_{{\boldsymbol{\theta}}}% \cup\{T\mapsto\{x_{1},...,x_{n}\}\},\{\}},\quad{\boldsymbol{\theta}}^{*}=\arg% \max_{{\boldsymbol{\theta}}\in\mathbb{R}^{d}}L({\boldsymbol{\theta}}).italic_L ( bold_italic_θ ) = - ⟦ italic_F ⟧ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ∪ { italic_T ↦ { italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } } , { } end_POSTSUBSCRIPT , bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_arg roman_max start_POSTSUBSCRIPT bold_italic_θ ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_L ( bold_italic_θ ) . (27)

To allow for minibatching, we interpret the domain symbol T𝑇Titalic_T as the minibatch {x1,,xn}subscript𝑥1subscript𝑥𝑛\{x_{1},...,x_{n}\}{ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT }. We can easily implement variations of this loss. For instance, we can combine multiple formulas and give each different weights. Notice that, for probabilistic and fuzzy semantics, L(𝜽)𝐿𝜽L({\boldsymbol{\theta}})italic_L ( bold_italic_θ ) is differentiable, allowing us to use common optimisers. However, not all NeSy structures can be optimised over: This loss only makes sense when a semantics returns a value in an ordered set \mathcal{B}caligraphic_B, but we also allow NeSy structures to return other kinds of values.

A different pattern, more related to reasoning, is to find the input x𝑥xitalic_x that maximises (or minimises) the neuro-symbolic system:

x=argmaxxXFI𝜽{T{x}},{}x^{*}=\arg\max_{x^{\prime}\in X}\llbracket F\rrbracket_{I_{\boldsymbol{\theta}% }\cup\{T\mapsto\{x\}\},\{\}}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_arg roman_max start_POSTSUBSCRIPT italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ italic_X end_POSTSUBSCRIPT ⟦ italic_F ⟧ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ∪ { italic_T ↦ { italic_x } } , { } end_POSTSUBSCRIPT (28)

This strategy can be combined with adversarial learning to first find the input that most violates the background knowledge, and then corrects that input [34].

5 Related Work

The last decade has seen the rise of neuro-symbolic frameworks that allow for specifying knowledge about the behaviour of neural networks symbolically [31]. However, unlike ULLER they are restricted to a single semantics, usually variations of probabilistic (Section 3.4) or fuzzy semantics (Section 3.5). The majority of current frameworks use the syntactic neural predicate construct as discussed in Section 3.1. DeepProbLog [30] is a probabilistic logic programming language [12] with neural predicates. Variations of its syntax are used in multiple follow-up works [13, 54, 28]. Scallop [24] chooses to restrict its language to Datalog to improve scalability, among others [29]. For ULLER, we choose to use an expressive first-order language, leaving scalable inference to the implementation of the NeSy system. Other NeSy frameworks are based on Answer Set Programming [57, 41, 3], relational languages [36, 33, 9], temporal logics [46] and description logics [55, 44, 45], while Logic Tensor Networks [4] is also based on first-order logic, among others [32, 16]. Finally, many commonly used NeSy frameworks are restricted to propositional logic [56, 2, 27, 10, 21, 18].

Logic of Differentiable Logics (LDL) [42] defines a first-order language to compare formal properties of several NeSy frameworks. Compared to ULLER, LDL is strongly typed, while ULLER is weakly typed, and LDL does not model probabilistic semantics. In LDL, uncertainty comes from predicates, rather than functions, and does not have a syntactic construct like ULLERs statement blocks. Pylon [1] is a Python library similar in goal to ULLER. It also allows for expressing propositional logic (CNF) formulas, which can then get compiled into a Semantic Loss or fuzzy loss functions. However, by being restricted to a propositional language, Pylon is limited in expressiveness, and requires the user to manually ground out formulas.

ULLER is also heavily inspired by probabilistic programming languages [22] such as Stan [7] that specify probabilistic models in a high-level language. In particular, ULLER can be considered a first-order probabilistic programming language (FOPPL) [47] defined on boolean outputs. These boolean outputs represent the conditioning (observations) of the probabilistic model. By being first-order, the language is restricted to having a finite number of random variables. Other FOPPL languages centred on neural networks include Pyro [6] and ProbTorch [40]. These languages enforce a probabilistic semantics corresponding to that of ULLER defined in Section 3.4. However, ULLER does not enforce this semantics and also supports, for instance, fuzzy semantics. We leave an in-depth analysis of the relations between ULLER and aforementioned probabilistic programming languages for future work.

Other related work attempts to define building blocks for neuro-symbolic AI [48] or to categorise existing approaches [38]. We instead focus on a particular set of informed machine learning approaches, and develop a unifying language to allow communicating with them.

6 Conclusion

We introduced ULLER, a Unified Language for LEarning and Reasoning. ULLER is a first-order logic language designed for neuro-symbolic learning and reasoning, with a special statement syntax for constraining neural networks. We showed how to implement the common fuzzy and probabilistic semantics in ULLER, allowing for easy comparison between different NeSy systems. For future work, we want to implement ULLER as an easy-to-use Python library to increase the “frictionless reproducibility” of NeSy research. In this library, a researcher can easily write and share knowledge, and develop new NeSy benchmarks. We also believe such a library is a good avenue for reducing the barrier of entry into NeSy research.

Acknowledgements

We would like to thank Frank van Harmelen, Tarek Richard Besold, Luciano Serafini, Antonio Vergari, Pasquale Minervini, Thiviyan Thanapalasingam, Guy van den Broeck, Connor Pryor, Patrick Koopmann, and Mihaela Stoian for fruitful discussions during the writing of this paper. We also thank the anonymous reviewers of NeSy 2024 for their valuable feedback. This work was supported by the EU H2020 ICT48 project “TAILOR” under contract #952215. Emile van Krieken was funded by ELIAI (The Edinburgh Laboratory for Integrated Artificial Intelligence), EPSRC (grant no. EP/W002876/1).

References

  • [1] Ahmed, K., Li, T., Ton, T., Guo, Q., Chang, K., Kordjamshidi, P., Srikumar, V., den Broeck, G.V., Singh, S.: PYLON: A pytorch framework for learning with constraints. In: Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022 Virtual Event, February 22 - March 1, 2022. pp. 13152–13154. AAAI Press (2022). https://doi.org/10.1609/AAAI.V36I11.21711, https://doi.org/10.1609/aaai.v36i11.21711
  • [2] Ahmed, K., Teso, S., Chang, K., den Broeck, G.V., Vergari, A.: Semantic probabilistic layers for neuro-symbolic learning. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022 (2022), http://papers.nips.cc/paper_files/paper/2022/hash/c182ec594f38926b7fcb827635b9a8f4-Abstract-Conference.html
  • [3] Aspis, Y., Broda, K., Lobo, J., Russo, A.: Embed2Sym - Scalable Neuro-Symbolic Reasoning via Clustered Embeddings. In: Proceedings of the Nineteenth International Conference on Principles of Knowledge Representation and Reasoning. pp. 421–431. International Joint Conferences on Artificial Intelligence Organization, Haifa, Israel (Jul 2022). https://doi.org/10.24963/kr.2022/44
  • [4] Badreddine, S., d’Avila Garcez, A., Serafini, L., Spranger, M.: Logic Tensor Networks. Artificial Intelligence 303, 103649 (Feb 2022). https://doi.org/10.1016/j.artint.2021.103649
  • [5] Belle, V., Passerini, A., Van den Broeck, G.: Probabilistic inference in hybrid domains by weighted model integration. In: Proceedings of 24th International Joint Conference on Artificial Intelligence (IJCAI). vol. 2015, pp. 2770–2776. IJCAI-INT JOINT CONF ARTIF INTELL (2015)
  • [6] Bingham, E., Chen, J.P., Jankowiak, M., Obermeyer, F., Pradhan, N., Karaletsos, T., Singh, R., Szerlip, P.A., Horsfall, P., Goodman, N.D.: Pyro: Deep universal probabilistic programming. Journal of Machine Learning Research 20, 28:1–28:6 (2019)
  • [7] Carpenter, B., Gelman, A., Hoffman, M.D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M.A., Guo, J., Li, P., Riddell, A.: Stan: A probabilistic programming language. Journal of statistical software 76 (2017)
  • [8] Chavira, M., Darwiche, A.: On probabilistic inference by weighted model counting. Artificial Intelligence 172(6), 772–799 (Apr 2008). https://doi.org/10.1016/j.artint.2007.11.002
  • [9] Cohen, W.W.: TensorLog: A Differentiable Deductive Database. arXiv:1605.06523 [cs] (Jul 2016)
  • [10] Daniele, A., van Krieken, E., Serafini, L., van Harmelen, F.: Refining neural network predictions using background knowledge. Mach. Learn. 112(9), 3293–3331 (2023). https://doi.org/10.1007/S10994-023-06310-3, https://doi.org/10.1007/s10994-023-06310-3
  • [11] Darwiche, A.: SDD: A new canonical representation of propositional knowledge bases. IJCAI International Joint Conference on Artificial Intelligence pp. 819–826 (2011). https://doi.org/10.5591/978-1-57735-516-8/IJCAI11-143
  • [12] De Raedt, L., Kimmig, A.: Probabilistic (logic) programming concepts. Machine Learning 100(1), 5–47 (Jul 2015). https://doi.org/10.1007/s10994-015-5494-z
  • [13] De Smet, L., Martires, P.Z.D., Manhaeve, R., Marra, G., Kimmig, A., De Raedt, L.: Neural Probabilistic Logic Programming in Discrete-Continuous Domains (Mar 2023). https://doi.org/10.48550/arXiv.2303.04660
  • [14] De Smet, L., Sansone, E., Zuidberg Dos Martires, P.: Differentiable sampling of categorical distributions using the catlog-derivative trick. Advances in Neural Information Processing Systems 36 (2024)
  • [15] Derkinderen, V., Manhaeve, R., Dos Martires, P.Z., De Raedt, L.: Semirings for probabilistic and neuro-symbolic logic programming. International Journal of Approximate Reasoning p. 109130 (2024)
  • [16] Diligenti, M., Gori, M., Sacca, C.: Semantic-based regularization for learning and inference. Artificial Intelligence 244, 143–165 (2017)
  • [17] Donoho, D.: Data science at the singularity. arXiv preprint arXiv:2310.00865 (2023)
  • [18] Fischer, M., Balunovic, M., Drachsler-Cohen, D., Gehr, T., Zhang, C., Vechev, M.: Dl2: training and querying neural networks with logic. In: International Conference on Machine Learning. pp. 1931–1941. PMLR (2019)
  • [19] Foerster, J., Farquhar, G., Al-Shedivat, M., Rocktäschel, T., Xing, E., Whiteson, S.: DiCE: The infinitely differentiable monte carlo estimator. In: International Conference on Machine Learning. pp. 1529–1538 (2018)
  • [20] Giunchiglia, E., Stoian, M.C., Lukasiewicz, T.: Deep Learning with Logical Constraints. In: Raedt, L.D. (ed.) Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022. pp. 5478–5485. ijcai.org (2022). https://doi.org/10.24963/ijcai.2022/767
  • [21] Giunchiglia, E., Tatomir, A., Stoian, M.C.ă., Lukasiewicz, T.: CCN+: A neuro-symbolic framework for deep learning with requirements. International Journal of Approximate Reasoning p. 109124 (2024). https://doi.org/10.1016/j.ijar.2024.109124
  • [22] Gordon, A.D., Henzinger, T.A., Nori, A.V., Rajamani, S.K.: Probabilistic programming. In: Future of software engineering proceedings, pp. 167–181 (2014)
  • [23] Harnad, S.: The symbol grounding problem. Physica D: Nonlinear Phenomena 42(1-3), 335–346 (1990)
  • [24] Huang, J., Li, Z., Chen, B., Samel, K., Naik, M., Song, L., Si, X.: Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning. In: Advances in Neural Information Processing Systems (May 2021)
  • [25] Kimmig, A., Van den Broeck, G., De Raedt, L.: Algebraic model counting. Journal of Applied Logic 22, 46–62 (2017)
  • [26] Kool, W., van Hoof, H., Welling, M.: Buy 4 REINFORCE samples, get a baseline for free! p. 14 (2019)
  • [27] van Krieken, E., Thanapalasingam, T., Tomczak, J.M., van Harmelen, F., ten Teije, A.: A-nesi: A scalable approximate method for probabilistic neurosymbolic inference. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023 (2023), http://papers.nips.cc/paper_files/paper/2023/hash/4d9944ab3330fe6af8efb9260aa9f307-Abstract-Conference.html
  • [28] Maene, J., Raedt, L.D.: Soft-Unification in Deep Probabilistic Logic. In: Thirty-Seventh Conference on Neural Information Processing Systems (Nov 2023)
  • [29] Magnini, M., Ciatto, G., Omicini, A.: On the Design of PSyKI: A Platform for Symbolic Knowledge Injection into Sub-symbolic Predictors. In: Calvaresi, D., Najjar, A., Winikoff, M., Frä mling, K. (eds.) Explainable and Transparent AI and Multi-Agent Systems, vol. 13283, pp. 90–108. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-15565-9-6
  • [30] Manhaeve, R., Dumancic, S., Kimmig, A., Demeester, T., De Raedt, L.: DeepProbLog: Neural probabilistic logic programming. In: Proceedings of NeurIPS (2018)
  • [31] Marra, G., Dumančić, S., Manhaeve, R., De Raedt, L.: From Statistical Relational to Neural Symbolic Artificial Intelligence: A Survey. arXiv:2108.11451 [cs] (Aug 2021)
  • [32] Marra, G., Giannini, F., Diligenti, M., Gori, M.: Lyrics: A general interface layer to integrate logic inference and deep learning. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 283–298. Springer (2019)
  • [33] Marra, G., Kuželka, O.: Neural markov logic networks. In: Uncertainty in Artificial Intelligence. pp. 908–917. PMLR (2021)
  • [34] Minervini, P., Riedel, S.: Adversarially regularising neural NLI models to integrate logical background knowledge. In: Korhonen, A., Titov, I. (eds.) Proceedings of the 22nd Conference on Computational Natural Language Learning. pp. 65–74. Association for Computational Linguistics, Brussels, Belgium (Oct 2018). https://doi.org/10.18653/v1/K18-1007, https://aclanthology.org/K18-1007
  • [35] Pryor, C., Dickens, C., Augustine, E., Albalak, A., Wang, W.Y., Getoor, L.: NeuPSL: Neural Probabilistic Soft Logic. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. pp. 4145–4153. International Joint Conferences on Artificial Intelligence Organization, Macau, SAR China (Aug 2023). https://doi.org/10.24963/ijcai.2023/461
  • [36] Pryor, C., Dickens, C., Augustine, E., Albalak, A., Wang, W.Y., Getoor, L.: Neupsl: Neural probabilistic soft logic. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China. pp. 4145–4153. ijcai.org (2023). https://doi.org/10.24963/IJCAI.2023/461, https://doi.org/10.24963/ijcai.2023/461
  • [37] Richardson, M., Domingos, P.: Markov logic networks. Machine Learning 62(1–2), 107–136 (Jan 2006). https://doi.org/10.1007/s10994-006-5833-1, http://dx.doi.org/10.1007/s10994-006-5833-1
  • [38] Sarker, M.K., Zhou, L., Eberhart, A., Hitzler, P.: Neuro-symbolic artificial intelligence. AI Communications 34(3), 197–209 (2021). https://doi.org/10.3233/AIC-210084
  • [39] Schulman, J., Heess, N., Weber, T., Abbeel, P.: Gradient estimation using stochastic computation graphs. In: Advances in Neural Information Processing Systems (2015)
  • [40] Siddharth, N., Paige, B., van de Meent, J.W., Desmaison, A., Goodman, N.D., Kohli, P., Wood, F., Torr, P.: Learning disentangled representations with semi-supervised deep generative models. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30. pp. 5927–5937. Curran Associates, Inc. (2017)
  • [41] Skryagin, A., Stammer, W., Ochs, D., Dhami, D.S., Kersting, K.: SLASH: Embracing Probabilistic Circuits into Neural Answer Set Programming. arXiv:2110.03395 [cs] (Oct 2021)
  • [42] Slusarz, N., Komendantskaya, E., Daggitt, M.L., Stewart, R., Stark, K.: Logic of differentiable logics: Towards a uniform semantics of dl. In: Proceedings of 24th International Conference on Logic. vol. 94, pp. 473–493 (2023)
  • [43] Stol, M.C., Mileo, A.: Iid relaxation by logical expressivity: A research agenda for fitting logics to neurosymbolic requirements (2024)
  • [44] Tang, Z., Hinnerichs, T., Peng, X., Zhang, X., Hoehndorf, R.: Falcon: faithful neural semantic entailment over alc ontologies. arXiv preprint arXiv:2208.07628 (2022)
  • [45] Tang, Z., Pei, S., Peng, X., Zhuang, F., Zhang, X., Hoehndorf, R.: TAR: Neural Logical Reasoning across TBox and ABox (Aug 2022)
  • [46] Umili, E., Capobianco, R., De Giacomo, G.: Grounding ltlf specifications in image sequences. In: Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning. vol. 19, pp. 668–678 (2023)
  • [47] van de Meent, J.W., Paige, B., Yang, H., Wood, F.: An Introduction to Probabilistic Programming (Oct 2021). https://doi.org/10.48550/arXiv.1809.10756
  • [48] van Harmelen, F., ten Teije, A.: A Boxology of Design Patterns for Hybrid Learning and Reasoning Systems. Journal of Web Engineering 18(1), 97–124 (2019). https://doi.org/10.13052/jwe1540-9589.18133
  • [49] van Krieken, E., Acar, E., van Harmelen, F.: Analyzing differentiable fuzzy logic operators. Artificial Intelligence 302, 103602 (2022). https://doi.org/10.1016/j.artint.2021.103602
  • [50] van Krieken, E., Thanapalasingam, T., Tomczak, J., van Harmelen, F., Ten Teije, A.: A-NeSI: A scalable approximate method for probabilistic neurosymbolic inference. In: Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems. vol. 36, pp. 24586–24609. Curran Associates, Inc. (2023)
  • [51] van Krieken, E., Tomczak, J., Ten Teije, A.: Storchastic: A framework for general stochastic automatic differentiation. In: Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems. vol. 34, pp. 7574–7587. Curran Associates, Inc. (2021)
  • [52] Varnai, P., Dimarogonas, D.V.: On robustness metrics for learning stl tasks. In: 2020 American Control Conference (ACC). pp. 5394–5399. IEEE (2020)
  • [53] von Rueden, L., Mayer, S., Beckh, K., Georgiev, B., Giesselbach, S., Heese, R., Kirsch, B., Pfrommer, J., Pick, A., Ramamurthy, R., Walczak, M., Garcke, J., Bauckhage, C., Schuecker, J.: Informed Machine Learning – A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems. IEEE Transactions on Knowledge and Data Engineering 35(1), 614–633 (Jan 2023). https://doi.org/10.1109/TKDE.2021.3079836
  • [54] Winters, T., Marra, G., Manhaeve, R., De Raedt, L.: Deepstochlog: Neural stochastic logic programming. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36, pp. 10090–10100 (2022)
  • [55] Wu, X., Zhu, X., Zhao, Y., Dai, X.: Differentiable Fuzzy $\mathcal{}ALC {}$: A Neural-Symbolic Representation Language for Symbol Grounding (Dec 2022)
  • [56] Xu, J., Zhang, Z., Friedman, T., Liang, Y., den Broeck, G.V.: A semantic loss function for deep learning with symbolic knowledge. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 5498–5507. PMLR (2018), http://proceedings.mlr.press/v80/xu18h.html
  • [57] Yang, Z., Ishay, A., Lee, J.: NeurASP: Embracing neural networks into answer set programming. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20. pp. 1755–1762. International Joint Conferences on Artificial Intelligence Organization (Jul 2020). https://doi.org/10.24963/ijcai.2020/243

Appendix A Practical Examples

Example A.1 (MNIST Addition).

Suppose we want to express the standard (single-digit) MNIST addition program using ULLER. In this setting, we have a domain T𝑇Titalic_T that represents a training dataset I(T)𝐼𝑇I(T)italic_I ( italic_T ). In Section 4, we discuss how this training dataset can also be a minibatch of examples.

Each data point x𝑥xitalic_x consists of a pair of images (which we access with the properties im1im1\mathrm{im1}im1 and im2im2\mathrm{im2}im2) associated to a label representing the value of their sum (which we can intuitively access via the property sumsum\mathrm{sum}roman_sum). Finally, we have a function f𝑓fitalic_f that we interpret as a neural network classifying MNIST images. Then, if we want to write that for every input the outputs of the neural network should be equal to the sum of the inputs, we can write:

xTfor-all𝑥𝑇\displaystyle\forall x\in T∀ italic_x ∈ italic_T
(n1\displaystyle(n1( italic_n 1 :=f(x.im1),n2:=f(x.im2)\displaystyle:=f(x.\mathrm{im1}),n2:=f(x.\mathrm{im2}):= italic_f ( italic_x . im1 ) , italic_n 2 := italic_f ( italic_x . im2 )
(n1\displaystyle(n1( italic_n 1 +n2=x.sum))\displaystyle+n2=x.\mathrm{sum}))+ italic_n 2 = italic_x . roman_sum ) )
Example A.2 (Smokes Friends Cancer).

In this classical example of Statistical Relational Learning introduced by [37], uncertain facts in a population group are modeled using the neural predicates Friends(x,y)Friends𝑥𝑦\mathrm{Friends}(x,y)roman_Friends ( italic_x , italic_y ) for friendship, Smokes(x)Smokes𝑥\mathrm{Smokes}(x)roman_Smokes ( italic_x ) for smoking, and Cancer(x)Cancer𝑥\mathrm{Cancer}(x)roman_Cancer ( italic_x ) for cancer. As ULLER relies on functions rather than predicates to model uncertainty, we must use true(a)true𝑎\mathrm{true}(a)roman_true ( italic_a ) to formalise the problem in our language as explained in Section 3.1. For simplicity, we use (AB)((AB)(BA))(A\Leftrightarrow B)\equiv((A\Rightarrow B)\land(B\Rightarrow A))( italic_A ⇔ italic_B ) ≡ ( ( italic_A ⇒ italic_B ) ∧ ( italic_B ⇒ italic_A ) ) to denote logical equivalences.

Here is an example of a knowledge base for this problem. Friends of friends are friends:

xfor-all𝑥\displaystyle\forall x∀ italic_x People,yPeople,zPeopleformulae-sequenceabsentPeopleformulae-sequence𝑦People𝑧People\displaystyle\in\mathrm{People},y\in\mathrm{People},z\in\mathrm{People}∈ roman_People , italic_y ∈ roman_People , italic_z ∈ roman_People
(a1:=Friends(x,y),a2:=Friends(y,z),a3:=Friends(x,z)\displaystyle(a_{1}:=\mathrm{Friends}(x,y),a_{2}:=\mathrm{Friends}(y,z),a_{3}:% =\mathrm{Friends}(x,z)( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := roman_Friends ( italic_x , italic_y ) , italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT := roman_Friends ( italic_y , italic_z ) , italic_a start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT := roman_Friends ( italic_x , italic_z )
((true(a1)true(a2))true(a3)))\displaystyle((\mathrm{true}(a_{1})\land\mathrm{true}(a_{2}))\Rightarrow% \mathrm{true}(a_{3})))( ( roman_true ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ∧ roman_true ( italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ) ⇒ roman_true ( italic_a start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) ) )

If two people are friends, either both smoke or neither does:

xfor-all𝑥\displaystyle\forall x∀ italic_x People,yPeopleformulae-sequenceabsentPeople𝑦People\displaystyle\in\mathrm{People},y\in\mathrm{People}\ ∈ roman_People , italic_y ∈ roman_People
(a1:=Friends(x,y),a2:=Smokes(x),a3:=Smokes(y)\displaystyle(a_{1}:=\mathrm{Friends}(x,y),a_{2}:=\mathrm{Smokes}(x),a_{3}:=% \mathrm{Smokes}(y)( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := roman_Friends ( italic_x , italic_y ) , italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT := roman_Smokes ( italic_x ) , italic_a start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT := roman_Smokes ( italic_y )
(true(a1)(true(a2)true(a3))))\displaystyle(\mathrm{true}(a_{1})\Rightarrow(\mathrm{true}(a_{2})% \Leftrightarrow\mathrm{true}(a_{3}))))( roman_true ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ⇒ ( roman_true ( italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ⇔ roman_true ( italic_a start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) ) ) )

Friendless people smoke:

xfor-all𝑥\displaystyle\forall x∀ italic_x PeopleabsentPeople\displaystyle\in\mathrm{People}∈ roman_People
(¬yPeople(a1:=Friends(x,y)(true(a1)))\displaystyle(\neg\exists y\in\mathrm{People}\ (a_{1}:=\mathrm{Friends}(x,y)(% \mathrm{true}(a_{1})))( ¬ ∃ italic_y ∈ roman_People ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := roman_Friends ( italic_x , italic_y ) ( roman_true ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ) )
a2:=Smokes(x)(true(a2)))\displaystyle\phantom{(}\Rightarrow a_{2}:=\mathrm{Smokes}(x)(\mathrm{true}(a_% {2})))⇒ italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT := roman_Smokes ( italic_x ) ( roman_true ( italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ) )

Smoking causes cancer:

xfor-all𝑥\displaystyle\forall x∀ italic_x PeopleabsentPeople\displaystyle\in\mathrm{People}∈ roman_People
(a1:=Smokes(x),a2:=Cancer(x)\displaystyle(a_{1}:=\mathrm{Smokes}(x),a_{2}:=\mathrm{Cancer}(x)( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := roman_Smokes ( italic_x ) , italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT := roman_Cancer ( italic_x )
(true(a1)true(a2)))\displaystyle(\mathrm{true}(a_{1})\Rightarrow\mathrm{true}(a_{2})))( roman_true ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ⇒ roman_true ( italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ) )

Notice that, according to the definitions of Section 3.1, the probabilistic interpretation of the above formula will assume conditional independence between a1pSmokes(|x)a_{1}\sim{p}_{\mathrm{Smokes}}(\cdot|x)italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∼ italic_p start_POSTSUBSCRIPT roman_Smokes end_POSTSUBSCRIPT ( ⋅ | italic_x ) and a2pCancer(|x)a_{2}\sim{p}_{\mathrm{Cancer}}(\cdot|x)italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∼ italic_p start_POSTSUBSCRIPT roman_Cancer end_POSTSUBSCRIPT ( ⋅ | italic_x ). To model a dependence of cancer on smoking, i.e. a2pCancer(|x,a1)a_{2}\sim{p}_{\mathrm{Cancer}}(\cdot|x,a_{1})italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∼ italic_p start_POSTSUBSCRIPT roman_Cancer end_POSTSUBSCRIPT ( ⋅ | italic_x , italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ), we can make the probability explicitly depend on the previous variable:

xfor-all𝑥\displaystyle\forall x∀ italic_x PeopleabsentPeople\displaystyle\in\mathrm{People}∈ roman_People
(a1:=Smokes(x),a2:=Cancer(x,a1)\displaystyle(a_{1}:=\mathrm{Smokes}(x),a_{2}:=\mathrm{Cancer}(x,a_{1})( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := roman_Smokes ( italic_x ) , italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT := roman_Cancer ( italic_x , italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT )
(true(a1)true(a2)))\displaystyle(\mathrm{true}(a_{1})\Rightarrow\mathrm{true}(a_{2})))( roman_true ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ⇒ roman_true ( italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ) )

Next, we have labelled examples for each relationship. For example, for Friends()Friends\mathrm{Friends}()roman_Friends ( ), drawing examples from a dataset TFriendssubscript𝑇FriendsT_{\mathrm{Friends}}italic_T start_POSTSUBSCRIPT roman_Friends end_POSTSUBSCRIPT:

tfor-all𝑡\displaystyle\forall t∀ italic_t TFriendsabsentsubscript𝑇Friends\displaystyle\in T_{\mathrm{Friends}}∈ italic_T start_POSTSUBSCRIPT roman_Friends end_POSTSUBSCRIPT
(l:=Friends(t.x,t.y)\displaystyle(l:=\mathrm{Friends}(t.\mathrm{x},t.\mathrm{y})( italic_l := roman_Friends ( italic_t . roman_x , italic_t . roman_y )
(l=t.label))\displaystyle(l=t.\mathrm{label}))( italic_l = italic_t . roman_label ) )

Appendix B Gradient estimation

The sampling semantics in Equation 26 is a simple way to estimate the truth value of a formula. However, since sampling is not a differentiable operation, it is not possible to use this semantics to train the neural networks. Instead, we can use the score function gradient estimation method [39] to estimate the gradient of the truth value of a formula with respect to the parameters of the neural networks. However, this requires adapting the evaluation of the formula to incorporate score function terms.

One way to implement gradient estimation methods for simple ULLER programs is to use the DiCE estimator [19] which introduces the MagicBox operator \drawdie2(x)=exp(x(x)){\drawdie{2}}(x)=\exp(x-\bot(x))2 ( italic_x ) = roman_exp ( italic_x - ⊥ ( italic_x ) ), where bottom\bot is the StopGradient operator used in deep learning frameworks. This operator allows us to add a term that only appears when we differentiate it, and equals 1 during the forward pass. To incorporate DiCE for Unified Language for LEarning and Reasoning, we have to modify Equation 15

x:=f(T1,,Tn)(F)S=FI,𝒜(η,S)Ci=1n\drawdie2(logpfi(𝒜(η,S)[xi]))\llbracket x:=f(T_{1},\dots,T_{n})(F)\ \rrbracket^{\text{S}}=\llbracket F% \rrbracket^{\text{C}}_{I,{\mathcal{A}}(\eta,S)}\cdot\sum_{i=1}^{n}{\drawdie{2}% }(\log{p}_{f_{i}}({\mathcal{A}}(\eta,S)[x_{i}]))⟦ italic_x := italic_f ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ( italic_F ) ⟧ start_POSTSUPERSCRIPT S end_POSTSUPERSCRIPT = ⟦ italic_F ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , caligraphic_A ( italic_η , italic_S ) end_POSTSUBSCRIPT ⋅ ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT 2 ( roman_log italic_p start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( caligraphic_A ( italic_η , italic_S ) [ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] ) ) (29)

Extensions of the DiCE estimator can be used to implement a wide variety of gradient estimation methods [51].

Appendix C Classical in the limit

C.1 Probabilistic semantics

The probabilistic semantics is classical in the limit. To show this, we note that we require that the domain becomes {0,1}01\{0,1\}{ 0 , 1 } instead of probabilities [0,1]01[0,1][ 0 , 1 ]. Under this domain, the product is equal to the min\minroman_min function. We can use this to rewrite all but the interpretation of statements into the classical semantics.

Next, take for a statement x:=f(T1,,Tn)(F)assign𝑥𝑓subscript𝑇1subscript𝑇𝑛𝐹x:=f(T_{1},...,T_{n})(F)italic_x := italic_f ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ( italic_F ) the induction assumption that FI^,ηP=FI,ηC\llbracket F\rrbracket^{\text{P}}_{\hat{I},\eta}=\llbracket F\rrbracket^{\text% {C}}_{I,\eta}⟦ italic_F ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η end_POSTSUBSCRIPT = ⟦ italic_F ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT, where I^^𝐼\hat{I}over^ start_ARG italic_I end_ARG is defined as in Definition 3.4. Then the interpretation of a statement is:

𝔼apf^(|T1,,Tn)[FI^,η[xa]P]=FI^,η[xargmaxaΩn+1pf(a|T1,,Tn))]P\displaystyle\mathbb{E}_{a\sim{p}_{\hat{f}}(\cdot|T_{1},...,T_{n})}[\llbracket F% \rrbracket^{\text{P}}_{\hat{I},\eta[x\mapsto a]}]=\llbracket F\rrbracket^{% \text{P}}_{\hat{I},\eta[x\mapsto\arg\max_{a\in\Omega_{n+1}}{p}_{f}(a|T_{1},...% ,T_{n}))]}blackboard_E start_POSTSUBSCRIPT italic_a ∼ italic_p start_POSTSUBSCRIPT over^ start_ARG italic_f end_ARG end_POSTSUBSCRIPT ( ⋅ | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT [ ⟦ italic_F ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η [ italic_x ↦ italic_a ] end_POSTSUBSCRIPT ] = ⟦ italic_F ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η [ italic_x ↦ roman_arg roman_max start_POSTSUBSCRIPT italic_a ∈ roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ) ] end_POSTSUBSCRIPT

Here, we reduce the expectation by noting that since pf^(a|T1,,Tn)=δ(aargmaxapf(a|T1,,Tn))subscript𝑝^𝑓conditional𝑎subscript𝑇1subscript𝑇𝑛𝛿𝑎subscriptsuperscript𝑎subscript𝑝𝑓conditionalsuperscript𝑎subscript𝑇1subscript𝑇𝑛p_{\hat{f}}(a|T_{1},...,T_{n})=\delta(a-\arg\max_{a^{\prime}}{p}_{f}(a^{\prime% }|T_{1},...,T_{n}))italic_p start_POSTSUBSCRIPT over^ start_ARG italic_f end_ARG end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = italic_δ ( italic_a - roman_arg roman_max start_POSTSUBSCRIPT italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ), exactly one element gets 1 probability (or a single element with non-zero probability, in the case of continuous distributions). This single element is chosen on the right side. Then, we use the induction assumption to find that this is equal to the classical semantics of statements given in Equation 15.

C.2 Fuzzy semantics

Using the axioms of t-norms, we find that the fuzzy semantics is also classical in the limit. This again can be proven by induction. For Equations 21 and 23, we use the boundary conditions of t-norms, which states that x1=x𝑥tensor-product1𝑥x\operatorname{\otimes}1=xitalic_x ⊗ 1 = italic_x for x[0,1]𝑥01x\in[0,1]italic_x ∈ [ 0 , 1 ]. Therefore, if x=0𝑥0x=0italic_x = 0, 01=00tensor-product100\operatorname{\otimes}1=00 ⊗ 1 = 0 and if x=1𝑥1x=1italic_x = 1, 11=11tensor-product111\operatorname{\otimes}1=11 ⊗ 1 = 1, meaning t-norms act as the min\minroman_min operator under the domain {0,1}01\{0,1\}{ 0 , 1 }.

Next, consider the program fragment x:=f(T1,,Tn)(F)assign𝑥𝑓subscript𝑇1subscript𝑇𝑛𝐹x:=f(T_{1},...,T_{n})\ (F)italic_x := italic_f ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ( italic_F ) and take the induction assumption FI^,ηF=FI,ηC\llbracket F\rrbracket^{\text{F}}_{\hat{I},\eta}=\llbracket F\rrbracket^{\text% {C}}_{I,\eta}⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η end_POSTSUBSCRIPT = ⟦ italic_F ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η end_POSTSUBSCRIPT. First, assume the domain Ωn+1={0,1}subscriptΩ𝑛101\Omega_{n+1}=\{0,1\}roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT = { 0 , 1 } and assume argmaxa{0,1}pf(a|T1,,Tn)=1subscript𝑎01subscript𝑝𝑓conditional𝑎subscript𝑇1subscript𝑇𝑛1\arg\max_{a\in\{0,1\}}{p}_{f}(a|T_{1},...,T_{n})=1roman_arg roman_max start_POSTSUBSCRIPT italic_a ∈ { 0 , 1 } end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = 1. Then, the interpretation of the statement is FI^,η[xpf^(a=1|T1,,Tn)]F=FI^,η[x1]F\llbracket F\rrbracket^{\text{F}}_{\hat{I},\eta[x\mapsto{p}_{\hat{f}}(a=1|T_{1% },...,T_{n})]}=\llbracket F\rrbracket^{\text{F}}_{\hat{I},\eta[x\mapsto 1]}⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η [ italic_x ↦ italic_p start_POSTSUBSCRIPT over^ start_ARG italic_f end_ARG end_POSTSUBSCRIPT ( italic_a = 1 | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ] end_POSTSUBSCRIPT = ⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η [ italic_x ↦ 1 ] end_POSTSUBSCRIPT, since the Dirac distribution will put all its mass on the output 1111. Similarly, if argmaxa{0,1}pf(a|T1,,Tn)=0subscript𝑎01subscript𝑝𝑓conditional𝑎subscript𝑇1subscript𝑇𝑛0\arg\max_{a\in\{0,1\}}{p}_{f}(a|T_{1},...,T_{n})=0roman_arg roman_max start_POSTSUBSCRIPT italic_a ∈ { 0 , 1 } end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = 0, then the interpretation is FI^,η[x0]F\llbracket F\rrbracket^{\text{F}}_{\hat{I},\eta[x\mapsto 0]}⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η [ italic_x ↦ 0 ] end_POSTSUBSCRIPT. Then we can simply use the induction assumption.

Finally, if Ωn+1{0,1}subscriptΩ𝑛101\Omega_{n+1}\neq\{0,1\}roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT ≠ { 0 , 1 }, then we know there is a unique output aΩn+1𝑎subscriptΩ𝑛1a\in\Omega_{n+1}italic_a ∈ roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT such that pf^(a|T1,,Tn)=1subscript𝑝^𝑓conditional𝑎subscript𝑇1subscript𝑇𝑛1p_{\hat{f}}(a|T_{1},...,T_{n})=1italic_p start_POSTSUBSCRIPT over^ start_ARG italic_f end_ARG end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = 1, while for the other outputs pf^(a|T1,,Tn)=0subscript𝑝^𝑓conditional𝑎subscript𝑇1subscript𝑇𝑛0p_{\hat{f}}(a|T_{1},...,T_{n})=0italic_p start_POSTSUBSCRIPT over^ start_ARG italic_f end_ARG end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = 0. Then, using associativity and commutativity of the t-conorm direct-sum\oplus, the interpretation of the statement is

FI^,η[xa]Fpf^(a|T1,,Tn)aΩn+1{a}FI^,η[xa]Fpf^(a|T1,,Tn)\displaystyle\llbracket F\rrbracket^{\text{F}}_{\hat{I},\eta[x\mapsto a]}% \operatorname{\otimes}{p}_{\hat{f}}(a|T_{1},...,T_{n})\oplus\bigoplus_{a^{% \prime}\in\Omega_{n+1}\setminus\{a\}}\llbracket F\rrbracket^{\text{F}}_{\hat{I% },\eta[x\mapsto a^{\prime}]}\operatorname{\otimes}{p}_{\hat{f}}(a^{\prime}|T_{% 1},...,T_{n})⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η [ italic_x ↦ italic_a ] end_POSTSUBSCRIPT ⊗ italic_p start_POSTSUBSCRIPT over^ start_ARG italic_f end_ARG end_POSTSUBSCRIPT ( italic_a | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ⊕ ⨁ start_POSTSUBSCRIPT italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT ∖ { italic_a } end_POSTSUBSCRIPT ⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η [ italic_x ↦ italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ] end_POSTSUBSCRIPT ⊗ italic_p start_POSTSUBSCRIPT over^ start_ARG italic_f end_ARG end_POSTSUBSCRIPT ( italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT )
FI^,η[xa]F1aΩn+1{a}FI^,η[xa]F0\displaystyle\llbracket F\rrbracket^{\text{F}}_{\hat{I},\eta[x\mapsto a]}% \operatorname{\otimes}1\oplus\bigoplus_{a^{\prime}\in\Omega_{n+1}\setminus\{a% \}}\llbracket F\rrbracket^{\text{F}}_{\hat{I},\eta[x\mapsto a^{\prime}]}% \operatorname{\otimes}0⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η [ italic_x ↦ italic_a ] end_POSTSUBSCRIPT ⊗ 1 ⊕ ⨁ start_POSTSUBSCRIPT italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT ∖ { italic_a } end_POSTSUBSCRIPT ⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η [ italic_x ↦ italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ] end_POSTSUBSCRIPT ⊗ 0
FI^,η[xa]FaΩn+1{a}0=FI^,η[xa]F\displaystyle\llbracket F\rrbracket^{\text{F}}_{\hat{I},\eta[x\mapsto a]}% \oplus\bigoplus_{a^{\prime}\in\Omega_{n+1}\setminus\{a\}}0=\llbracket F% \rrbracket^{\text{F}}_{\hat{I},\eta[x\mapsto a]}⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η [ italic_x ↦ italic_a ] end_POSTSUBSCRIPT ⊕ ⨁ start_POSTSUBSCRIPT italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT ∖ { italic_a } end_POSTSUBSCRIPT 0 = ⟦ italic_F ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over^ start_ARG italic_I end_ARG , italic_η [ italic_x ↦ italic_a ] end_POSTSUBSCRIPT

where we again use the boundary conditions of the t-norm tensor-product\operatorname{\otimes} (1x=x1tensor-product𝑥𝑥1\operatorname{\otimes}x=x1 ⊗ italic_x = italic_x) and t-conorm (0x=xdirect-sum0𝑥𝑥0\oplus x=x0 ⊕ italic_x = italic_x).

Appendix D Relation of Probabilistic semantics to the Semantic Loss

Here, we show why the probabilistic semantics is equivalent to the weighted model counting semantics used in, for instance, the Semantic Loss. Let F𝐹Fitalic_F be a closed formula without any statements x:=f(T1,,Tn)(F)assign𝑥𝑓subscript𝑇1subscript𝑇𝑛superscript𝐹x:=f(T_{1},\dots,T_{n})(F^{\prime})italic_x := italic_f ( italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ( italic_F start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) that only involves variables x1,,xnsubscript𝑥1subscript𝑥𝑛x_{1},...,x_{n}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT over finite domains. The weighted model count (WMC) is the evaluation of the classical semantics weighted by probabilities of the assignments to variables. These probabilities are often assumed to be independent, although our framework also allows for the probabilities to depend on previous variables. This is illustrated in Example A.2. The definition of the WMC is

𝖶𝖬𝖢𝖶𝖬𝖢\displaystyle\mathsf{WMC}sansserif_WMC =a1Ω1anΩni=1npfi(ai)FI,{x1a1,,xnan}C\displaystyle=\sum_{a_{1}\in\Omega_{1}}...\sum_{a_{n}\in\Omega_{n}}\prod_{i=1}% ^{n}{p}_{f_{i}}(a_{i})\llbracket F\rrbracket^{\text{C}}_{I,\{x_{1}\mapsto a_{1% },...,x_{n}\mapsto a_{n}\}}= ∑ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT … ∑ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ⟦ italic_F ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , { italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ↦ italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ↦ italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } end_POSTSUBSCRIPT (30)
=a1Ω1pf1(a1)anΩnpfn(an)FI,{x1a1,,xnan}C.\displaystyle=\sum_{a_{1}\in\Omega_{1}}{p}_{f_{1}}(a_{1})...\sum_{a_{n}\in% \Omega_{n}}{p}_{f_{n}}(a_{n})\llbracket F\rrbracket^{\text{C}}_{I,\{x_{1}% \mapsto a_{1},...,x_{n}\mapsto a_{n}\}}.= ∑ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) … ∑ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ⟦ italic_F ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , { italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ↦ italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ↦ italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } end_POSTSUBSCRIPT .

Next, we rewrite this into a program x1:=f1(),,xn:=fn()(F)formulae-sequenceassignsubscript𝑥1subscript𝑓1assignsubscript𝑥𝑛subscript𝑓𝑛𝐹x_{1}:=f_{1}(),...,x_{n}:=f_{n}()\ (F)italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( ) , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT := italic_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ) ( italic_F ) such that the probabilistic semantics in Definition 3.5 is equal to the weighted model count. For ease of notation, let us denote Sisubscript𝑆𝑖S_{i}italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT each statement xi:=fi()assignsubscript𝑥𝑖subscript𝑓𝑖x_{i}:=f_{i}()italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT := italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( ) for i=1,,n𝑖1𝑛i=1,...,nitalic_i = 1 , … , italic_n. Then, we find the probabilistic semantics of the program by sequentially expanding the interpretation of the statements:

S1,,Sn(F)I,{}Psubscriptsuperscriptsubscript𝑆1subscript𝑆𝑛𝐹P𝐼\displaystyle\llbracket S_{1},...,S_{n}(F)\rrbracket^{\text{P}}_{I,\{\}}⟦ italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_S start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_F ) ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , { } end_POSTSUBSCRIPT =a1Ω1pf1(a1)S2,,Sn(F)I,{x1a1}Pabsentsubscriptsubscript𝑎1subscriptΩ1subscript𝑝subscript𝑓1subscript𝑎1subscriptsuperscriptsubscript𝑆2subscript𝑆𝑛𝐹P𝐼maps-tosubscript𝑥1subscript𝑎1\displaystyle=\sum_{a_{1}\in\Omega_{1}}{p}_{f_{1}}(a_{1})\cdot\llbracket S_{2}% ,...,S_{n}(F)\rrbracket^{\text{P}}_{I,\{x_{1}\mapsto a_{1}\}}= ∑ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ⋅ ⟦ italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_S start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_F ) ⟧ start_POSTSUPERSCRIPT P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , { italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ↦ italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } end_POSTSUBSCRIPT (31)
\displaystyle\dots
=a1Ω1pf1(a1)anΩnpfn(an)FI,{x1a1,,xnan}C\displaystyle=\sum_{a_{1}\in\Omega_{1}}{p}_{f_{1}}(a_{1})...\sum_{a_{n}\in% \Omega_{n}}{p}_{f_{n}}(a_{n})\llbracket F\rrbracket^{\text{C}}_{I,\{x_{1}% \mapsto a_{1},...,x_{n}\mapsto a_{n}\}}= ∑ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) … ∑ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ roman_Ω start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ⟦ italic_F ⟧ start_POSTSUPERSCRIPT C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , { italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ↦ italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ↦ italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } end_POSTSUBSCRIPT
=𝖶𝖬𝖢absent𝖶𝖬𝖢\displaystyle=\mathsf{WMC}= sansserif_WMC

where in the last step we use that since the domains are finite and F𝐹Fitalic_F does not contain statements, the probabilistic semantics of F𝐹Fitalic_F is equal to the classic one.

Appendix E Relation of Fuzzy Semantics to Differentiable Fuzzy Logics

Fuzzy logics are actively used in NeSy [4, 49, 10, 21]. We show how existing NeSy systems using fuzzy logics arise from the fuzzy semantics of ULLER. Existing fuzzy logics systems align with our interpretations of terms and logical operators, but differ in their use of fuzzy predicates, which are interpreted as functions to [0,1]01[0,1][ 0 , 1 ], that is, INeSy(P):Ω1××Ωn[0,1]:subscript𝐼NeSy𝑃subscriptΩ1subscriptΩ𝑛01I_{\mathrm{NeSy}}(P):\Omega_{1}\times\dots\times\Omega_{n}\rightarrow[0,1]italic_I start_POSTSUBSCRIPT roman_NeSy end_POSTSUBSCRIPT ( italic_P ) : roman_Ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT × ⋯ × roman_Ω start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT → [ 0 , 1 ]. Then, the truth value of a formula is computed by evaluating the formula with the fuzzy semantics.

We can emulate this in our fuzzy semantics with the true()true\mathrm{true}()roman_true ( ) predicate and proof by induction. For each neural predicate INeSy(Pi):Ω1i××Ωnii[0,1]:subscript𝐼NeSysubscript𝑃𝑖subscriptsuperscriptΩ𝑖1subscriptsuperscriptΩ𝑖subscript𝑛𝑖01I_{\mathrm{NeSy}}(P_{i}):\Omega^{i}_{1}\times\dots\times\Omega^{i}_{n_{i}}% \rightarrow[0,1]italic_I start_POSTSUBSCRIPT roman_NeSy end_POSTSUBSCRIPT ( italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) : roman_Ω start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT × ⋯ × roman_Ω start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT → [ 0 , 1 ], we define a ULLER function I(fi):Ω1i××Ωnii({0,1}[0,1]):𝐼subscript𝑓𝑖subscriptsuperscriptΩ𝑖1subscriptsuperscriptΩ𝑖subscript𝑛𝑖0101I(f_{i}):\Omega^{i}_{1}\times\dots\times\Omega^{i}_{n_{i}}\rightarrow(\{0,1\}% \rightarrow[0,1])italic_I ( italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) : roman_Ω start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT × ⋯ × roman_Ω start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT → ( { 0 , 1 } → [ 0 , 1 ] ) such that:

INeSy(Pi)(T1i,,Tnii)=I(fi)(T1i,,Tnii)(1)subscript𝐼NeSysubscript𝑃𝑖subscriptsuperscript𝑇𝑖1subscriptsuperscript𝑇𝑖subscript𝑛𝑖𝐼subscript𝑓𝑖subscriptsuperscript𝑇𝑖1subscriptsuperscript𝑇𝑖subscript𝑛𝑖1I_{\mathrm{NeSy}}(P_{i})(T^{i}_{1},\dots,T^{i}_{n_{i}})=I(f_{i})(T^{i}_{1},% \dots,T^{i}_{n_{i}})(1)italic_I start_POSTSUBSCRIPT roman_NeSy end_POSTSUBSCRIPT ( italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ( italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) = italic_I ( italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ( italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ( 1 ) (32)

Let F𝐹Fitalic_F be a first-order logic formula with no statements nor functions, and FNeSy\llbracket F\rrbracket^{\text{NeSy}}⟦ italic_F ⟧ start_POSTSUPERSCRIPT NeSy end_POSTSUPERSCRIPT be its interpretation in a fuzzy NeSy system. Let F𝐹Fitalic_F contain k𝑘kitalic_k neural atoms Pi(T1i,,Tnii)subscript𝑃𝑖subscriptsuperscript𝑇𝑖1subscriptsuperscript𝑇𝑖subscript𝑛𝑖P_{i}(T^{i}_{1},\dots,T^{i}_{n_{i}})italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ), i=1k𝑖1𝑘i=1\dots kitalic_i = 1 … italic_k. Let S1,,Sk(F)subscript𝑆1subscript𝑆𝑘superscript𝐹S_{1},\dots,S_{k}\ (F^{\prime})italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_F start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) be a ULLER program with k𝑘kitalic_k statements where Sisubscript𝑆𝑖S_{i}italic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT defines xi:=fi(T1i,,Tnii)assignsubscript𝑥𝑖subscript𝑓𝑖subscriptsuperscript𝑇𝑖1subscriptsuperscript𝑇𝑖subscript𝑛𝑖x_{i}:=f_{i}(T^{i}_{1},\dots,T^{i}_{n_{i}})italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT := italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ), i=1,,k𝑖1𝑘i=1,\dots,kitalic_i = 1 , … , italic_k, and Fsuperscript𝐹F^{\prime}italic_F start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is F𝐹Fitalic_F where we replace every mention of Pi(T1i,,Tnii)subscript𝑃𝑖subscriptsuperscript𝑇𝑖1subscriptsuperscript𝑇𝑖subscript𝑛𝑖P_{i}(T^{i}_{1},\dots,T^{i}_{n_{i}})italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) by true(xi)truesubscript𝑥𝑖\mathrm{true}(x_{i})roman_true ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ). We have:

S1,,Sk(F)Fsuperscriptsubscript𝑆1subscript𝑆𝑘superscript𝐹F\displaystyle\llbracket S_{1},\dots,S_{k}\ (F^{\prime})\rrbracket^{\text{F}}⟦ italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_F start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT =FI,η[x1pf1(1|T11,,Tn11),,xkpfk(1|T1k,,Tnkk)]F\displaystyle=\llbracket F^{\prime}\rrbracket^{\text{F}}_{I,\eta[x_{1}\mapsto p% _{f_{1}}(1|T^{1}_{1},\dots,T^{1}_{n_{1}}),\dots,x_{k}\mapsto p_{f_{k}}(1|T^{k}% _{1},\dots,T^{k}_{n_{k}})]}= ⟦ italic_F start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η [ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ↦ italic_p start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( 1 | italic_T start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) , … , italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ↦ italic_p start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( 1 | italic_T start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ] end_POSTSUBSCRIPT (33)
=FI,η[x1INeSy(P1)(T11,,Tn11),,xkINeSy(Pk)(T1k,,Tnkk)]F\displaystyle=\llbracket F^{\prime}\rrbracket^{\text{F}}_{I,\eta[x_{1}\mapsto I% _{\mathrm{NeSy}}(P_{1})(T^{1}_{1},\dots,T^{1}_{n_{1}}),\dots,x_{k}\mapsto I_{% \mathrm{NeSy}}(P_{k})(T^{k}_{1},\dots,T^{k}_{n_{k}})]}= ⟦ italic_F start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I , italic_η [ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ↦ italic_I start_POSTSUBSCRIPT roman_NeSy end_POSTSUBSCRIPT ( italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ( italic_T start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) , … , italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ↦ italic_I start_POSTSUBSCRIPT roman_NeSy end_POSTSUBSCRIPT ( italic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ( italic_T start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ] end_POSTSUBSCRIPT (34)
=FNeSy\displaystyle=\llbracket F\rrbracket^{\text{NeSy}}= ⟦ italic_F ⟧ start_POSTSUPERSCRIPT NeSy end_POSTSUPERSCRIPT (35)

Equality (34) stems from definition (32). We derive equality (35) by induction. First, note that according to the definition of I(true)𝐼trueI(\mathrm{true})italic_I ( roman_true ) and the assignment in (34), we have:

true(xi)F=INeSy(Pi)(T1i,,Tnii)=P(T1i,,Tnii)NeSyfor i=1,,k\llbracket\mathrm{true}(x_{i})\rrbracket^{\text{F}}=I_{\mathrm{NeSy}}(P_{i})(T% ^{i}_{1},\dots,T^{i}_{n_{i}})=\llbracket P(T^{i}_{1},\dots,T^{i}_{n_{i}})% \rrbracket^{\text{NeSy}}\quad\text{for }i=1,\dots,k⟦ roman_true ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT = italic_I start_POSTSUBSCRIPT roman_NeSy end_POSTSUBSCRIPT ( italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ( italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) = ⟦ italic_P ( italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ⟧ start_POSTSUPERSCRIPT NeSy end_POSTSUPERSCRIPT for italic_i = 1 , … , italic_k (36)

If our semantics use the same t-norm operator tensor-product\operatorname{\otimes} as the NeSy system, then:

F1F2F=F1FF2F=F1NeSyF2NeSy=F1F2NeSy\llbracket F_{1}\land F_{2}\rrbracket^{\text{F}}=\llbracket F_{1}\rrbracket^{% \text{F}}\operatorname{\otimes}\llbracket F_{2}\rrbracket^{\text{F}}=% \llbracket F_{1}^{\prime}\rrbracket^{\text{NeSy}}\operatorname{\otimes}% \llbracket F_{2}^{\prime}\rrbracket^{\text{NeSy}}=\llbracket F_{1}^{\prime}% \land F_{2}^{\prime}\rrbracket^{\text{NeSy}}⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∧ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT = ⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT ⊗ ⟦ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT = ⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUPERSCRIPT NeSy end_POSTSUPERSCRIPT ⊗ ⟦ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUPERSCRIPT NeSy end_POSTSUPERSCRIPT = ⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∧ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUPERSCRIPT NeSy end_POSTSUPERSCRIPT (37)

where in the second equality we use the induction hypothesis F1F=F1NeSy\llbracket F_{1}\rrbracket^{\text{F}}=\llbracket F_{1}^{\prime}\rrbracket^{% \text{NeSy}}⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT = ⟦ italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUPERSCRIPT NeSy end_POSTSUPERSCRIPT and F2F=F2NeSy\llbracket F_{2}\rrbracket^{\text{F}}=\llbracket F_{2}^{\prime}\rrbracket^{% \text{NeSy}}⟦ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT = ⟦ italic_F start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟧ start_POSTSUPERSCRIPT NeSy end_POSTSUPERSCRIPT. The same can naturally be derived for other logical connectives. It follows that we can emulate any formula F𝐹Fitalic_F built with the neural predicates P(T1i,,Tnii)𝑃subscriptsuperscript𝑇𝑖1subscriptsuperscript𝑇𝑖subscript𝑛𝑖P(T^{i}_{1},\dots,T^{i}_{n_{i}})italic_P ( italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_T start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ), by building formula Fsuperscript𝐹F^{\prime}italic_F start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT with the equivalently interpreted true(xi)truesubscript𝑥𝑖\mathrm{true}(x_{i})roman_true ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) (see Equation (34)) and the same logical constructs, such that FNeSy=S1,,Sk(F)F\llbracket F\rrbracket^{\text{NeSy}}=\llbracket S_{1},\dots,S_{k}\ (F^{\prime}% )\rrbracket^{\text{F}}⟦ italic_F ⟧ start_POSTSUPERSCRIPT NeSy end_POSTSUPERSCRIPT = ⟦ italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_F start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ⟧ start_POSTSUPERSCRIPT F end_POSTSUPERSCRIPT.