Disjunctive Policies for Database-Backed Programs

Amir M. Ahmadian KTH Royal Institute of Technology
Matvey Soloviev KTH Royal Institute of Technology
Musard Balliu KTH Royal Institute of Technology

Abstract

When specifying security policies for databases, it is often natural to formulate disjunctive dependencies, where a piece of information may depend on at most one of two dependencies $P_{1}$ or $P_{2}$ , but not both. A formal semantic model of such disjunctive dependencies, the Quantale of Information, was recently introduced by Hunt and Sands as a generalization of the Lattice of Information. In this paper, we seek to contribute to the understanding of disjunctive dependencies in database-backed programs and introduce a practical framework to statically enforce disjunctive security policies. To that end, we introduce the Determinacy Quantale, a new query-based structure which captures the ordering of disjunctive information in databases. This structure can be understood as a query-based counterpart to the Quantale of Information. Based on this structure, we design a sound enforcement mechanism to check disjunctive policies for database-backed programs. This mechanism is based on a type-based analysis for a simple imperative language with database queries, which is precise enough to accommodate a variety of row- and column-level database policies flexibly while keeping track of disjunctions due to control flow. We validate our mechanism by implementing it in a tool, DiVerT, and demonstrate its feasibility on a number of use cases.

I Introduction

Database security and information flow security have largely evolved as two disparate areas [1, 2], while sharing closely-related foundations and mechanisms to enforce security. Modern applications commonly rely on shared database backends to provide rich functionality to a multitude of mutually distrusting users. In response to frontend demands, database query languages, with features such as triggers, stored procedures, and user-defined functions, have increasingly come to resemble full-fledged programming languages, thus calling into question the adequacy of the underlying access control models [3, 4]. A security policy describes the totality of expectations that we have of a computer system in the face of adversaries that seek to satisfy objectives that may differ from ours. In the context of database systems, whose purpose is to retain and provide information, the security policies of interest constrain who is allowed to learn what parts of that information. A class of such security policies which has proven particularly challenging to enforce with the methods of database security are disjunctive policies, which state that given two pieces of information, some entity may either learn one or the other, but not both.

A common example of disjunctive policies are databases which contain personally identifiable information, such as medical trial data. Biometric parameters of participants are important confounders that must be considered when drawing conclusions from the data, but at the same time releasing too many parameters of any one participant (such as their height, age and weight) might be sufficient to deanonymize them with high confidence [5]. Hence, a security policy for such a database may specify that the user may learn height and age, or height and weight, or age and weight, but not all three. Other examples of scenarios where disjunctive policies are useful include differential privacy [6] and secret sharing.

In this paper, we combine insights from database security and information flow research to develop a formal model for reasoning about disjunctive information in database-backed programs, and thus take a step towards reconciling the two fields. Our model makes it possible to reason about the semantic information dependencies in a program that performs queries, and compare them against a disjunctive policy. Building upon this, we propose a provably sound static enforcement mechanism that ensures that the policy is satisfied.

It is customary in information flow models to represent information as an equivalence relation on states, with the refinement order of equivalence relations corresponding to having more information. This representation can be used for both the actual information conveyed by a computational process and the bound imposed on it as part of a simple, non-disjunctive security policy. The possible equivalence relations on a given universe of states form a structure called the Lattice of Information (LoI) [7], in which security-relevant questions can be answered, such as whether a program reveals no more information than is allowed by the security policy, or what information is revealed by the combination of two programs. Similar questions have been addressed in the database community using an analogous object called the Disclosure Lattice [8]. We observe that this definition is actually insufficient to characterize information, which motivates us to introduce a more specific structure based on query determinacy, the Determinacy Lattice (DL). The formal relation between the Disclosure Lattice or our definition and LoI was hitherto unexplored, and more importantly neither of them can be used to represent disjunctions as seen in our motivating example.

Recently, Hunt and Sands [9] proposed a new information flow structure called the Quantale of Information (QoI), which seeks to address this shortcoming and establish a formal setting for representing, combining and comparing disjunctions of information. We build upon this work to introduce an analogous structure, the Determinacy Quantale (DQ), representing disjunctive dependencies in database-backed programs. As we show, this structure can be formally related to the QoI, and this relationship is analogous to that between the LoI and the DL. We then use the DQ to design a knowledge-based security condition that relates disjunctive dependencies in database-backed programs to disjunctive policies.

We are the first to address the problem of enforcing disjunctive policies. Prior works that develop language-based enforcement techniques in database-backed applications do not support disjunctive policies, while database-level dependencies are restricted to coarse approximations that incorrectly reject secure programs, such as our previous example [10, 11, 12, 13, 14].

Perhaps unsurprisingly, path sensitivity of a static analysis is key to capturing disjunctive dependencies. We show how standard flow-sensitive type-based dependency analysis [15] can be adapted to a compositional path-sensitive analysis and thus capture disjunctive dependencies in terms of database queries. To represent these dependencies in the DQ model, we introduce a sound approximation of the information disclosed by each database query which is precise enough to represent complex combinations of both row- and column-level dependencies. Finally, in the DQ, the combination of these analyses can be proven sound with respect to our security condition. We expect that the overall architecture of the resulting soundness proof, in which we relate a sequence of abstractions of the behaviour of a program to ordered elements of the DQ, can be generalized to many other enforcement mechanisms for our security condition.

To demonstrate the practicality of our approach, we implement this type-based dependency analysis and query approximation for database-backed programs and evaluate it on a test suite and some use cases which effectively illustrate the need for disjunctive dependencies and disjunctive policies.

Summary of contributions.

• We introduce a formal model for reasoning about disjunctive dependencies and policies in databases. In the process, we show how to reconcile perspectives from the database security and information flow communities.
• We introduce a database-specific model of knowledge, the Determinacy Lattice, and a disjunctive extension, called the Determinacy Quantale, and explore their relationship to established general-purpose semantic models.
• Using our model, we define an extensional security condition for database-backed programs that accommodates disjunctive policies.
• We propose a type-based program analysis to capture disjunctive dependencies in database-backed programs, combine them with a novel abstraction of queries, and prove them sound with respect to our security condition. This is presented as an instance of a generalizable architecture for such soundness proofs.
• We implement a prototype tool that uses type-based dependency analysis and query approximation to verify query-based disjunctive policies for database-backed programs, and demonstrate its feasibility on a test suite and a number of use cases.

The rest of the paper is structured as follows. After reviewing preliminaries in Section II, we give our account of the DL and introduce the DQ in Section III-C. In Section IV-B, we formalize our model of database-backed programs and the security policies we impose on them, culminating in a formal security condition. We present enforcement mechanisms in Section V, and their implementation and evaluation in Section VI. In Section VII, we contextualize our contributions with a discussion of related work, and finally summarize conclusions in Section VIII.

II Background

II-A Lattice of Information

An equivalence relation ${\sim}\subseteq A\times A$ on a set $A$ is a binary relation that is reflexive, symmetric, and transitive. For example, the equivalence relation parity on the set $A=\{0,1,2,3\}$ is defined as $\{(x,y)\mid x,y\in A\wedge x\ mod\ 2=y\ mod\ 2\}$ . An equivalence relation partitions its underlying domain into disjoint equivalence classes. Given an equivalence relation $P$ on a set $A$ and $a\in A$ , $[a]_{P}$ denotes the unique equivalence class induced by $P$ that $a$ belongs to. We write $[P]$ to denote the set of all equivalence classes induced by $P$ . We call $[P]$ a partition of $A$ and hereafter we may also refer to each element, i.e. equivalence class, of the partition $[P]$ as a cell. For example, parity partitions $A$ into cells $\{0,2\}$ and $\{1,3\}$ .

Equivalence relations over states are commonly used to represent an agent’s knowledge, by relating two states whenever the agent cannot distinguish between them. When an equivalence relation models knowledge, we also call the cells induced by it knowledge sets. These have a distinct intuitive interpretation when we consider functions $f$ that take in some state and return an agent’s view of it. We will write the equivalence relation induced by the output of $f$ as $\sim_{f}=\{(x,y)\mid f(x)=f(y)\}$ . In that case, in a state $a$ , the knowledge set $[a]_{\sim_{f}}$ represents the agent’s remaining uncertainty about the state, in the sense of all the states that the agent still considers possible, after observing the output of $f$ . The agent knows anything that is true in all states in the knowledge set. In this paper, we use the terms knowledge and information interchangeably.
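As an illustration outside the formal development, the following Python sketch computes the partition induced by an observation function $f$ and the knowledge set of a given state. The helper names `kernel_partition` and `knowledge_set` are hypothetical and chosen only for this example, which reuses the parity relation on $\{0,1,2,3\}$ from above.

```python
from collections import defaultdict

def kernel_partition(domain, f):
    """Group the states of `domain` by the value that `f` returns on them."""
    cells = defaultdict(set)
    for state in domain:
        cells[f(state)].add(state)
    return {frozenset(cell) for cell in cells.values()}

def knowledge_set(domain, f, actual):
    """States the agent still considers possible after observing f(actual)."""
    return frozenset(s for s in domain if f(s) == f(actual))

A = {0, 1, 2, 3}
parity = lambda x: x % 2  # the agent's view: only the parity of the state

partition = kernel_partition(A, parity)   # cells {0, 2} and {1, 3}
possible = knowledge_set(A, parity, 3)    # the agent cannot tell 1 from 3
print(partition, possible)
```

In state $3$ the agent knows that the state is odd, since this holds for every element of the knowledge set, but it does not know which odd state it is in.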

A complete lattice is a set equipped with a partial ordering (reflexive, antisymmetric, and transitive) relation, maximal and minimal elements $\top$ and $\bot$ for this relation and a join (least upper bound) for any subset of elements. The meet (greatest lower bound) of a subset can be defined as the join of the set of all lower bounds of that subset [16]. The Lattice of Information (LoI) [7] is a structure for representing the ordering of information with equivalence relations. Let $\mathcal{L}(A)$ be the set of all equivalence relations defined on a given domain $A$ . The LoI ranks these equivalence relations based on the information they reveal about the underlying domain. Given two equivalence relations $P,Q\in\mathcal{L}(A)$ , this ordering can be defined as follows:

\displaystyle P\sqsubseteq Q\leftrightarrow\forall a,a^{\prime}\in A\ \ (a\ Q\ a^{\prime}\Rightarrow a\ P\ a^{\prime})

For any set $S\subseteq\mathcal{L}(A)$ , the least upper bound of $S$ is the equivalence relation $R$ defined as:

\displaystyle\forall x,y\in A\ (x\ R\ y\leftrightarrow\forall P\in S.\ x\ P\ y).

Formally, $LoI(A)=\langle\mathcal{L}(A),\sqsubseteq,\bigsqcup\rangle$ denotes the LoI on domain $A$ , with ordering relation $\sqsubseteq$ and join $\bigsqcup$ . The top element $\top$ in the lattice is the most precise equivalence relation id such that $\texttt{id}=\{(x,y)\mid x,y\in A\wedge x=y\}$ , and the bottom element $\bot$ is the least precise equivalence relation $\texttt{all}=\{(x,y)\mid x,y\in A\}$ .

The join of any two equivalence relations $P\sqcup Q$ , being their least upper bound, is the least informative equivalence relation that is at least as informative as either of $P$ and $Q$ (i.e. is an upper bound on both), and thus represents the information that is conveyed from learning both $P$ and $Q$ . We refer to this as the conjunction of the information in $P$ and $Q$ .
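The ordering and join can be made concrete on small finite domains. The sketch below is an illustration only; it assumes that equivalence relations are represented by their partitions (sets of cells), and the names `leq`, `join` and `cells` are hypothetical.

```python
def cells(*groups):
    """Build a partition from the given groups of states."""
    return {frozenset(g) for g in groups}

def leq(p, q):
    """LoI order: p is below q iff every cell of q lies inside a cell of p."""
    return all(any(cq <= cp for cp in p) for cq in q)

def join(p, q):
    """Least upper bound: the nonempty pairwise intersections of cells."""
    return {cp & cq for cp in p for cq in q if cp & cq}

parity = cells({0, 2}, {1, 3})
below_two = cells({0, 1}, {2, 3})
identity = cells({0}, {1}, {2}, {3})     # the top element id
everything = cells({0, 1, 2, 3})         # the bottom element all

print(leq(everything, parity), leq(parity, identity))  # True True
print(leq(parity, below_two))                          # False: incomparable
print(join(parity, below_two) == identity)             # True
```

Learning both the parity of the state and whether it is below two pins the state down exactly, which is why the join of these two relations is the top element.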

II-B Quantale of Information

The LoI captures the conjunction of any two information sources $P$ and $Q$ as the join of their respective equivalence relations. However, it does not offer an operator that would yield a representation of their disjunction, that is, the information that can be obtained from having access to one of them, but not both. In fact, the disjunction cannot in general be represented as a single equivalence relation, and thus as an element of the LoI, at all. To address this limitation, Hunt and Sands [9] propose a generalization of the LoI called the Quantale of Information (QoI). A quantale is a complete lattice with an additional binary “tensor” operator $\otimes$ . In the QoI, the tensor is used to represent conjunction, while the lattice join represents disjunction.

The core idea behind the quantale structure is to interpret the disjunction $P_{1}\vee\ldots\vee P_{n}$ of several knowledge relations as describing all knowledge relations $R$ in which the knowledge always comes from one of the $P_{i}$ . More concretely, in any possible state $a\in A$ , the agent’s knowledge $[a]_{R}$ should equal its knowledge in the same state in one of the disjuncts, $[a]_{P_{i}}$ . Which disjunct it is may depend on the state, so the agent may have knowledge from $P_{i}$ in the state $a$ but knowledge from $P_{j}$ in some other state $a^{\prime}$ . Relations $R$ that satisfy this condition are called tilings, based on a picture of covering (since every state needs to be in some equivalence class) the space of possible states $A$ with knowledge sets drawn from any of the disjuncts. Following Hunt and Sands, we define the set of all tilings

\displaystyle\mathrm{mix}(\mathbb{P})=\{R\in LoI(A)\mid x\in[R]\Rightarrow(\exists P\in\mathbb{P}.\,x\in[P])\},

where $\mathbb{P}$ is a set of equivalence relations.

We would like to think of a relation $R^{\prime}$ as describing no more knowledge than a disjunction $\bigvee\mathbb{P}$ if it is bounded above by some $R\in\mathrm{mix}(\mathbb{P})$ in the LoI, and more generally define the quantale ordering $\mathbb{S}\sqsubseteq\mathbb{T}$ for $\mathbb{S},\mathbb{T}\subseteq\mathcal{L}(A)$ as $\forall S\in\mathbb{S},\ \exists T\in\mathbb{T}.\ S\sqsubseteq T$ . The resulting relation is not antisymmetric on general sets of relations or even $\mathrm{mix}$ es of general sets, reflecting the circumstance that there may be multiple $\mathrm{mix}$ es representing the same knowledge. As is standard in lattice theory [17], we use the downwards closure operator $\Downarrow$ to obtain canonical representations of the order cycles of $\sqsubseteq$ and hence construct a partial order.

\displaystyle{\Downarrow}\mathbb{P}=\{Q\in LoI(A)\mid Q\sqsubseteq\mathbb{P}\}

The tiling closure of a set of equivalence relations $\mathbb{P}$ ,

\displaystyle\mathrm{tc}(\mathbb{P})={\Downarrow}\mathrm{mix}(\mathbb{P}),

then canonically represents the knowledge permitted by the disjunction $\bigvee\mathbb{P}$ . The set $\mathrm{tc}(\mathbb{P})$ can still be interpreted as a list of possible equivalence relations, now including any equivalence relation that does not reveal more information than the disjunction.

We then take the elements of the QoI on a state set $A$ to be all tiling closures of subsets of $\mathcal{L}(A)$ , with the ordering $\sqsubseteq$ being set inclusion. For the tensor $\mathbb{P}\otimes\mathbb{Q}=\mathrm{tc}(\{P\sqcup Q\mid P\in\mathbb{P},Q\in\mathbb{Q}\})$ , we rely on the join operator of the LoI $\sqcup$ to calculate the least upper bound of any possible pair of equivalence relations in $\mathbb{P}$ and $\mathbb{Q}$ and then canonicalise the result. Since the sets are interpreted disjunctively, the join $\bigvee_{i}\mathbb{P}_{i}$ can simply be defined as $\mathrm{tc}(\bigcup_{i}\mathbb{P}_{i})$ .

Example 1.

Program 1 operates on a secret integer $x$ between -2 and 3, outputting to user $u$ whether it is greater than zero, and either (if it isn’t) whether it is even, or (if it is) whether it equals 0 or 1 (by dividing by 2, rounding down and testing for 0). We expect the information released by the program ( $\sim_{\mathrm{\mathrm{prg}}}$ in Fig. 1) to be bounded by the disjunction of the knowledge relations capturing the two possible branches (resp. $Q$ , $P$ ).

    if (x <= 0) then
      out(-1, u);
      out(x mod 2 == 0, u);
    else
      out(1, u);
      out(x div 2 == 0, u);

Program 1:

This could not be accurately expressed with LoI operations, since $Q$ , $P$ and $\sim_{\mathrm{\mathrm{prg}}}$ are all incomparable, but the join of $Q$ and $P$ (as the only available nontrivial way of combining them) is equal to $\top$ and so would equally bound a program that directly releases $x$ . However, $\sim_{\mathrm{\mathrm{prg}}}$ can be tiled with equivalence classes from $Q$ and $P$ , and we in fact have $\mathrm{mix}(\{Q,P\})=\{Q,P,R,\sim_{\mathrm{\mathrm{prg}}}\}$ . So in the QoI, $\mathrm{tc}(\{\sim_{\mathrm{\mathrm{prg}}}\})\sqsubseteq\mathrm{tc}(\{Q,P\})$ , and hence ${\sim_{\mathrm{\mathrm{prg}}}}\sqsubseteq Q\vee P$ .
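The claims of this example can be checked by brute force on the six-element domain. The Python sketch below is an illustration only. It assumes one particular reading of $Q$ and $P$ , namely the relations induced by observing the branch condition together with the parity test, respectively the division test; this reading is consistent with every statement in the example but is an assumption, not a transcription of Fig. 1. All function names are hypothetical.

```python
from collections import defaultdict

A = list(range(-2, 4))  # the secret x ranges over -2..3

def kernel(f):
    """Partition of A induced by f: states with equal output share a cell."""
    cells = defaultdict(set)
    for x in A:
        cells[f(x)].add(x)
    return frozenset(frozenset(c) for c in cells.values())

def partitions(items):
    """Yield every partition of `items` as a frozenset of frozenset cells."""
    if not items:
        yield frozenset()
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield part | {frozenset([first])}             # `first` in its own cell
        for cell in part:                             # or added to an existing cell
            yield (part - {cell}) | {cell | {first}}

ALL = list(partitions(A))  # all equivalence relations on A, as partitions

def leq(p, q):
    """LoI order: p is below q iff every cell of q lies inside a cell of p."""
    return all(any(cq <= cp for cp in p) for cq in q)

def lub(p, q):
    """LoI join: the nonempty pairwise intersections of cells."""
    return frozenset(cp & cq for cp in p for cq in q if cp & cq)

def mix(rels):
    """Tilings: partitions whose every cell is a cell of some disjunct."""
    allowed = set().union(*rels)
    return {r for r in ALL if r <= allowed}

def tc(rels):
    """Tiling closure: every relation that is below some tiling."""
    tilings = mix(rels)
    return {r for r in ALL if any(leq(r, t) for t in tilings)}

# Assumed reading of the relations in Example 1 (see the lead-in).
Q = kernel(lambda x: (x <= 0, x % 2 == 0))
P = kernel(lambda x: (x <= 0, x // 2 == 0))
prg = kernel(lambda x: (-1, x % 2 == 0) if x <= 0 else (1, x // 2 == 0))
top = kernel(lambda x: x)

print(len(mix({Q, P})))           # 4 tilings, among them Q, P and prg
print(lub(Q, P) == top)           # True: the LoI join reveals x entirely
print(tc({prg}) <= tc({Q, P}))    # True: prg is bounded by the disjunction
print(top in tc({Q, P}))          # False: releasing x directly is not
```

Under this reading the disjunction distinguishes Program 1 from a program that outputs $x$ directly, whereas the LoI join of $Q$ and $P$ bounds both.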

Figure 1: Some equivalence relations on $\{-2,-1,0,1,2,3\}$

III Information Ordering in Databases

Our goal is to introduce our semantic model for the information revealed by database queries, the Determinacy Lattice, and its extension to disjunctive dependencies, the Determinacy Quantale. To this end, we first review a standard formalism for reasoning about databases that we will employ.

III-A A Primer on Relational Database Models

We use the relational model to formally define databases [18]. In this model, we distinguish between the database schema $D$ , which specifies the structure of the database, and the database state $db$ , which specifies its actual content.

A database schema $D$ is a (nonempty) finite set of relation schemas $t$ , written as $D=\{t_{1},...,t_{n}\}$ . A relation schema (table) $t$ is defined as a set of attributes paired with a set of constraints, where an attribute is a name paired with a domain. The number of attributes in $t$ (written as $|t|$ ) is referred to as its arity. A tuple is a set of data representing a single record within a relation schema. Each tuple contains values for each attribute defined in the relation schema.

A database state $db$ is a snapshot of the database schema $D$ at a particular point in time. It represents the actual data stored in the database, consisting of a collection of tables and their respective tuples. We write ${\llbracket{t}\rrbracket^{db}}$ to represent the tuples of table $t$ under database state $db$ .

We write $\mathrm{states}(D)$ to denote the set of all database states of $D$ . A database configuration is a pair $\langle D,\Gamma\rangle$ where $D$ is the database schema and $\Gamma$ is a set of integrity constraints. An integrity constraint is an assertion about a database that must be satisfied for a database state to be considered valid. We write $\Omega_{D}=\{db\mid db\in\mathrm{states}(D)\ \wedge\vdash db:\Gamma\}$ for the set of valid database states, where $\vdash db:\Gamma$ is an appropriate notion of the state $db$ satisfying the constraints in $\Gamma$ . Various classes of integrity constraints exist, for instance functional dependencies which capture primary-key constraints, and inclusion dependencies which are used in foreign-key constraints [18].

Relational calculus. We rely on the Domain Relational Calculus (DRC) for our query language. In the DRC, a (non-boolean) query $q$ over a database schema $D$ has the form $\{\overline{x}\mid\phi\}$ , where $\overline{x}$ is a sequence of variables, $\phi$ is a first order formula over $D$ , and the free variables of $\phi$ are those in $\overline{x}$ . The evaluation of a query $q$ , denoted by ${\llbracket{q}\rrbracket^{db}}$ , is the set of tuples that satisfy the formula $\phi$ with respect to $db$ . A boolean query is written as $\{\ \mid{\phi}\}$ , and its evaluation ${\llbracket{q}\rrbracket^{db}}$ is defined to be the boolean value $\mathsf{true}$ if and only if some tuple in $db$ satisfies $\phi$ . We use $\mathcal{Q}$ to indicate the universe of all possible queries.

The domain relational calculus employed here follows the standard convention, and we refer the reader to the relevant literature for a more comprehensive description of DRC [18].

$\mathrm{emp}$ : (name, role, salary)

$\mathrm{mng}$ : (division, manager)

Figure 2: Database schema for employees and managers

Example 2.

The database schema in Fig. 2 contains relations for employees $\mathrm{emp}$ and managers $\mathrm{mng}$ . A query returning the set of tuples containing the division names and the salary of the managers of each division can be written as:

\displaystyle\{(d,s)\mid\exists n,r.\ \mathrm{emp}(n,r,s)\wedge\exists m.\ \mathrm{mng}(d,m)\wedge n=m\}.
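To make the evaluation of such a query concrete, the following Python sketch evaluates it over a small database state. The rows are invented for illustration, and encoding relations as sets of tuples and the query as a set comprehension is an assumption of this sketch rather than part of the formal model.

```python
# Hypothetical database state for the schema of Fig. 2.
emp = {("alice", "manager", 90), ("bob", "engineer", 70), ("carol", "manager", 95)}
mng = {("sales", "alice"), ("research", "carol")}

def manager_salaries(emp, mng):
    """{(d, s) | exists n, r. emp(n, r, s) and exists m. mng(d, m) and n = m}"""
    return {(d, s) for (n, r, s) in emp for (d, m) in mng if n == m}

result = manager_salaries(emp, mng)
print(sorted(result))  # [('research', 95), ('sales', 90)]
```

The existentially quantified variables $n$ , $r$ and $m$ are bound by the comprehension but do not appear in the result, mirroring the projection onto $(d,s)$ .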

Views. In DRC, a database view is a relation defined by the result of a non-boolean query. Database views act as virtual tables and, as we will see, are useful when defining security policies. Formally, a view $v$ defined over database schema $D$ is a tuple $\langle id,q\rangle$ , where $id$ is the view identifier and $q$ is the non-boolean query over schema $D$ defining the view. The query $q$ may refer to other views, but we assume that views do not have cyclic dependencies.

The materialization of a view $v$ in a database state $db$ is the evaluation of its defining query $q$ in that state, i.e., ${\llbracket{q}\rrbracket^{db}}$ . We use $v.q$ to refer to the defining query of view $v$ . We extend relational calculus in the standard way to work with views [3].

III-B Determinacy Lattice

Given query sets $Q,Q^{\prime}\in\mathcal{P}{(\mathcal{Q})}$ , query determinacy [19] captures whether results of the queries in $Q$ are always sufficient to determine the result of the queries in $Q^{\prime}$ .

Definition 1.

$Q$ determines $Q^{\prime}$ (denoted by $Q\twoheadrightarrow Q^{\prime}$ ) iff for all database states $db_{1}$ , $db_{2}$ , if ${\llbracket{q}\rrbracket^{db_{1}}}$ = ${\llbracket{q}\rrbracket^{db_{2}}}$ for all $q\in Q$ , then ${\llbracket{q^{\prime}}\rrbracket^{db_{1}}}$ = ${\llbracket{q^{\prime}}\rrbracket^{db_{2}}}$ for all $q^{\prime}\in Q^{\prime}$ .

Intuitively, $Q\twoheadrightarrow Q^{\prime}$ means that pairs of databases for which all queries in $Q$ return the same result also give the same result under any query in $Q^{\prime}$ . This is in fact equivalent to the initial gloss that the results of queries in $Q^{\prime}$ can be computed from the results of queries in $Q$ , as we show in detail in Appendix A.

Query determinacy allows us to define an ordering on sets of queries based on the information they reveal. We call this ordering determinacy order, denote it by $\preceq$ , and define it as $\forall Q,Q^{\prime}\in\mathcal{P}{(\mathcal{Q})}$ , $Q\preceq Q^{\prime}$ iff $Q^{\prime}\twoheadrightarrow Q$ .

Example 3.

Consider queries $q_{1}=\{(n,r)\mid\exists s.\ \mathrm{emp}(n,r,s)\}$ and $q_{2}=\{(r)\mid\exists n,s.\ \mathrm{emp}(n,r,s)\}$ defined on the relations of Fig. 2. Query $q_{1}$ discloses the $\mathrm{name}$ and the $\mathrm{role}$ of the employees while $q_{2}$ only returns their $\mathrm{role}$ . Intuitively, $q_{1}$ reveals more information than $q_{2}$ , which means $q_{2}\preceq q_{1}$ .
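Determinacy quantifies over all database states, so it cannot be decided by enumeration in general. As a sketch only, the Python code below checks Def. 1 over a small finite collection of states, which is an assumption made to keep the example executable; integrity constraints are ignored, and the rows and the name `determines` are hypothetical.

```python
from itertools import combinations

def determines(Q, Q_prime, states):
    """Def. 1 restricted to the finite collection `states`."""
    for db1, db2 in combinations(states, 2):
        if all(q(db1) == q(db2) for q in Q):
            if any(q(db1) != q(db2) for q in Q_prime):
                return False  # equal under Q, yet distinguished by Q'
    return True

# Every subset of a small pool of emp rows is treated as a database state.
rows = [("ann", "intern", 10), ("ann", "ceo", 10), ("bob", "intern", 10)]
states = [frozenset(c) for k in range(len(rows) + 1)
          for c in combinations(rows, k)]

q1 = lambda db: frozenset((n, r) for (n, r, s) in db)  # name and role
q2 = lambda db: frozenset(r for (n, r, s) in db)       # role only

print(determines([q1], [q2], states))  # True: roles follow from (name, role)
print(determines([q2], [q1], states))  # False: roles do not fix the names
```

On these states the check agrees with the example: the answer to $q_{2}$ can be computed from the answer to $q_{1}$ , but not conversely.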

This definition of determinacy order is a preorder (reflexive and transitive), but not necessarily a partial order, as it is not anti-symmetric. In other words, $q_{1}\preceq q_{2}$ and $q_{2}\preceq q_{1}$ does not necessarily mean that $q_{1}=q_{2}$ . As in Section II-B, this essentially means that query sets are not canonical representations of the information revealed by them. To rectify this, we form the closure ${\downarrow}$ under the determinacy order, so the determinacy order becomes set inclusion. Intuitively, ${\downarrow}Q$ will contain all the queries in $\mathcal{Q}$ whose answers can be inferred from the set of queries $Q$ . Formally, ${\downarrow}Q$ is defined as:

\displaystyle{\downarrow}Q=\{q\in\mathcal{Q}\mid\{q\}\preceq Q\}

Using the definitions of determinacy order and closure ${\downarrow}$ , we can then define the Determinacy Lattice as follows:

Definition 2.

Given a universe of queries $\mathcal{Q}$ , the Determinacy Lattice $DL(\mathcal{Q})$ is a complete lattice $\langle\mathcal{L},\sqsubseteq,\bigsqcup,\bot,\top\rangle$ such that:

• $\mathcal{L}=\{{\downarrow}Q\mid Q\subseteq\mathcal{Q}\}$
• ${\downarrow}Q_{1}\sqsubseteq{\downarrow}Q_{2}$ iff $Q_{1}\preceq Q_{2}$
• $\bigsqcup_{i}{\downarrow}Q_{i}={\downarrow}\bigcup_{i}Q_{i}$
• $\bot={\downarrow}\varnothing$ , $\top={\downarrow}\mathcal{Q}$ ,

where $\preceq$ is the determinacy order on $\mathcal{Q}$ .

Disclosure order and information flow properties. Our definition of the Determinacy Lattice is similar to the definition of the Disclosure Lattice introduced by Bender et al. [8]. A Disclosure Lattice is a lattice built upon a disclosure order, which is a preorder on sets of queries satisfying additional conditions that are expected of an ordering according to the amount of information disclosed by each set of queries. Bender et al. [8] define the disclosure order as follows:

Definition 3.

Given a universe of queries $\mathcal{Q}$ , a disclosure order $\preceq$ is a preorder on $\mathcal{P}{(\mathcal{Q})}$ that satisfies the following properties:

1. For all $Q_{1},Q_{2}\in\mathcal{P}{(\mathcal{Q})}$ , if $Q_{1}\subseteq Q_{2}$ then $Q_{1}\preceq Q_{2}$
2. If $\mathbb{P}\subseteq\mathcal{P}{(\mathcal{Q})}$ and $\forall P\in\mathbb{P},\ P\preceq Q$ then $\bigcup\mathbb{P}\preceq Q$

The first property in this definition ensures that adding new elements to a set of queries only increases the amount of disclosed information and the second property allows us to derive a meaningful upper bound on the information disclosure.

The intended use of disclosure order was to order sets of queries based on the amount of information they reveal about the underlying database. However, we make the observation that this definition is not specific enough to characterize information disclosure in the information flow sense. For example, consider query containment [18], defined as:

Definition 4.

Given queries $q_{1},q_{2}\in\mathcal{Q}$ , we say that $q_{1}$ is contained in $q_{2}$ , denoted by $q_{1}\subseteq q_{2}$ , if for every database state $db\in\Omega_{D}$ , we have ${\llbracket{q_{1}}\rrbracket^{db}}\subseteq{\llbracket{q_{2}}\rrbracket^{db}}$ .

Query containment satisfies all of the requirements of a disclosure order (Def. 3), but it is not enough to guarantee security. To illustrate this, consider a database with a single table $t$ given in Fig. 3.

$vl$
$0$
$1$
$100+s$

Figure 3: Table $t$

Table $t$ has a single column $vl$ , and contains values $0$ , $1$ , and $100+s$ , where $s$ is a secret value that can be either $0$ or $1$ . We thus consider two possible instances of this database, one where $t$ contains values $0$ , $1$ , and $100$ and another where it contains $0$ , $1$ , and $101$ . Now, consider the following queries:

	$\displaystyle q_{1}:\{(vl_{1})\mid\exists vl_{2}.\ t_{1}(vl_{1})\wedge t_{2}(vl_{2})\wedge vl_{1}<100\}$
	$\displaystyle q_{2}:\{(vl_{1})\mid\exists vl_{2}.\ t_{1}(vl_{1})\wedge t_{2}(vl_{2})\wedge vl_{1}<100\wedge vl_{1}=vl_{2}-100\}$

where $t_{1}$ and $t_{2}$ are just logical copies of table $t$ . It is common practice to make logical copies of a relation and use them in queries with self-joins [20]. The result of query $q_{1}$ is always $0$ and $1$ . The result of query $q_{2}$ is $1$ if the secret $s$ is $1$ and $0$ if $s$ is $0$ . Evidently, query containment holds for these queries: the result of query $q_{2}$ is contained in the result of $q_{1}$ . However, an observer seeing the result of query $q_{2}$ can learn the value of the secret $s$ .

This example illustrates that query containment (a disclosure order) is not sufficient to guarantee the confidentiality of the secret $s$ in an information flow setting. To ensure information flow security, we require a stronger condition, such as the notion of query determinacy order (Def. 1) that we chose to rely on in this paper.
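The counterexample is small enough to execute. In the Python sketch below, which is an illustration only, table $t$ is modelled as a set of integers and the two logical copies $t_{1}$ and $t_{2}$ as two iterations over that same set; this encoding is an assumption of the sketch.

```python
def q1(t):
    """{(vl1) | exists vl2. t1(vl1) and t2(vl2) and vl1 < 100}"""
    return {v1 for v1 in t for v2 in t if v1 < 100}

def q2(t):
    """As q1, with the additional conjunct vl1 = vl2 - 100."""
    return {v1 for v1 in t for v2 in t if v1 < 100 and v1 == v2 - 100}

# The two possible instances of t, one per value of the secret s.
instances = {s: {0, 1, 100 + s} for s in (0, 1)}

for s, t in instances.items():
    # containment q2 <= q1 holds in both instances ...
    print(s, sorted(q1(t)), sorted(q2(t)), q2(t) <= q1(t))

# ... but q1 answers identically on both instances while q2 does not,
# so q1 does not determine q2 and observing q2 reveals s.
print(q1(instances[0]) == q1(instances[1]))  # True
print(q2(instances[0]) == q2(instances[1]))  # False
```

The last two lines are exactly the failure of determinacy in the sense of Def. 1: two states that agree on $q_{1}$ disagree on $q_{2}$ .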

Relation between the DL and the LoI. There exists a close relationship between the DL and the LoI. Specifically, a query $q$ defined over a database schema $D$ induces an equivalence relation ${q}_{\sim}$ on database states $db$ . We can formally define this equivalence relation as:

\displaystyle{q}_{\sim}=\{(db_{1},db_{2})\mid db_{1},db_{2}\in\Omega_{D}\wedge{\llbracket{q}\rrbracket^{db_{1}}}={\llbracket{q}\rrbracket^{db_{2}}}\}

We write $[{q}_{\sim}]$ to denote the set of all equivalence classes induced by $q$ . Given an equivalence relation ${q}_{\sim}$ on set $\Omega_{D}$ and $db\in\Omega_{D}$ , $[db]_{{q}_{\sim}}$ denotes the equivalence class induced by ${q}_{\sim}$ to which the database state $db$ belongs. We further lift this definition to sets of queries $Q=\{q_{1},q_{2},...,q_{n}\}$ :

\displaystyle{Q}_{\sim}=\{(db_{1},db_{2})\mid db_{1},db_{2}\in\Omega_{D}\wedge\bigwedge_{1\leq i\leq n}{\llbracket{q_{i}}\rrbracket^{db_{1}}}={\llbracket{q_{i}}\rrbracket^{db_{2}}}\}

This interpretation of database queries as equivalence relations provides a direct connection between the DL and the LoI, where the lattice elements correspond to ${Q}_{\sim}$ , the ordering $\sqsubseteq$ to the determinacy order $\preceq$ , and join and meet follow the definitions of the DL.

Lemma 1.

For all $\mathcal{Q}$ , there is a complete lattice homomorphism from the Determinacy Lattice $DL(\mathcal{Q})$ to the Lattice of Information defined on $\{{Q}_{\sim}\mid Q\in DL(\mathcal{Q})\}$ .

We prove this Lemma in Appendix B. To the extent that we believe ${Q}_{\sim}$ to accurately represent the information conveyed by the queries in $Q$ , this lemma implies that joins and order comparisons can be performed in the DL without explicit reference to the LoI.
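The correspondence can be observed on a toy instance. The Python sketch below is an illustration and not a substitute for the proof: it computes the partition of a small, hypothetical set of database states induced by a list of queries, and checks that the union of two query sets induces the LoI join of the partitions they induce individually. Restricting attention to finitely many states and all names used are assumptions of the sketch.

```python
from itertools import combinations

def induced_partition(Q, states):
    """Partition of `states` in which two states share a cell iff every
    query in the list Q returns the same result on both."""
    cells = {}
    for db in states:
        key = tuple(q(db) for q in Q)
        cells.setdefault(key, set()).add(db)
    return {frozenset(c) for c in cells.values()}

def lub(p, q):
    """LoI join: the nonempty pairwise intersections of cells."""
    return {cp & cq for cp in p for cq in q if cp & cq}

def leq(p, q):
    """LoI order: p is below q iff every cell of q lies inside a cell of p."""
    return all(any(cq <= cp for cp in p) for cq in q)

# Hypothetical states: every subset of a small pool of emp rows.
rows = [("ann", "intern", 10), ("bob", "intern", 10), ("bob", "ceo", 20)]
states = [frozenset(c) for k in range(len(rows) + 1)
          for c in combinations(rows, k)]

names = lambda db: frozenset(n for (n, r, s) in db)
roles = lambda db: frozenset(r for (n, r, s) in db)

both = induced_partition([names, roles], states)
print(both == lub(induced_partition([names], states),
                  induced_partition([roles], states)))   # True
print(leq(induced_partition([roles], states), both))     # True
```

The second check reflects the determinacy order: the set containing both queries determines the roles query, so its induced relation is above that of the roles query in the LoI.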

III-C Determinacy Quantale

We introduce a generalization of the Determinacy Lattice, called the Determinacy Quantale (DQ), to represent disjunctive dependencies. Our definition of the DQ is intended as a counterpart to the QoI [9], analogously to how the DL corresponds to the LoI. To achieve this, we define a query-set counterpart of the tiling closure operator to capture the disjunction of sets of queries. Since sets of queries correspond to LoI elements (equivalence relations), disjunctive QoI elements (sets of equivalence relations) will be represented as sets of sets of queries. Each set of queries in the outer set represents a possible combination of queries that does not reveal more information than is allowed by the disjunction.

Analogously to the QoI, the tiling closure of a set of sets of queries is defined by forming the downward closure under $\sqsubseteq$ (from the DL) of their mix. The query-set equivalent of the mix operator is defined on a set of sets of queries $\mathbb{Q}=\{Q_{1},...,Q_{n}\}$ such that $Q_{i}\in DL(\mathcal{Q})$ for $i=1,...,n$ as follows:

\displaystyle\mathrm{mix}(\mathbb{Q})=\{P\in DL(\mathcal{Q})\mid x\in[{P}_{\sim}]\Rightarrow(\exists Q\in\mathbb{Q}.\ x\in[{Q}_{\sim}])\}

where $[{Q}_{\sim}]$ denotes the set of equivalence classes of ${Q}_{\sim}$ as defined previously. We then define the tiling closure for a set $\mathbb{Q}$ of elements of the DL as $\mathrm{tc}(\mathbb{Q})={\Downarrow}\mathrm{mix}(\mathbb{Q})$ .
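To make the tiling intuition concrete, the sketch below computes the mix directly on equivalence relations, represented as partitions of a small universe, rather than on query sets. This is a simplification: it enumerates all partitions of a hypothetical four-element universe and keeps those whose blocks all occur in one of the given partitions. The downward closure that completes $\mathrm{tc}$ is omitted, and the names `partitions`, `part`, and `mix` are chosen for the example only.

```python
def partitions(items):
    """Yield every partition of `items` as a frozenset of frozenset blocks."""
    if not items:
        yield frozenset()
        return
    head, rest = items[0], items[1:]
    for p in partitions(rest):
        # Either `head` forms a block of its own ...
        yield p | {frozenset({head})}
        # ... or it joins one of the existing blocks.
        for block in p:
            yield (p - {block}) | {block | {head}}

def part(*blocks):
    """Convenience constructor for a partition."""
    return frozenset(frozenset(b) for b in blocks)

def mix(family, universe):
    """Partitions of `universe` whose every block is a block of some member of `family`."""
    tiles = set().union(*family)
    return {p for p in partitions(universe) if p <= tiles}

UNIVERSE = ["a", "b", "c", "d"]
Q1 = part({"a"}, {"b"}, {"c", "d"})
Q2 = part({"a", "b"}, {"c"}, {"d"})
mixed = mix([Q1, Q2], UNIVERSE)
```

Besides `Q1` and `Q2` themselves, `mixed` contains the two partitions obtained by tiling blocks from both, which mirrors the role that $p1$ and $p2$ play in Example 4.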

We then formally define the Determinacy Quantale $DQ(\mathcal{Q})$ as follows.

Definition 5.

Given a universe of queries $\mathcal{Q}$ , let $DL(\mathcal{Q})$ be the Determinacy Lattice defined on $\mathcal{Q}$ . The Determinacy Quantale $DQ(\mathcal{Q})$ is the quantale $\langle\mathcal{I},\sqsubseteq,\bigvee,\otimes,1\rangle$ , with:

• $\mathcal{I}=\{\mathrm{tc}(\mathbb{Q})\mid\mathbb{Q}\subseteq DL(\mathcal{Q})\}$

• $\bigvee_{i}\mathbb{P}_{i}=\mathrm{tc}(\bigcup_{i}\mathbb{P}_{i})$

• $\mathbb{P}\otimes\mathbb{Q}=\mathrm{tc}\Big{(}\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}(P\sqcup Q)\Big{)}$

• ${\sqsubseteq}={\subseteq}$

• $\top=DL(\mathcal{Q})$ , $\bot=\varnothing$ , $1=\varnothing$ ,

where $\mathbb{P},\mathbb{Q}\subseteq DL(\mathcal{Q})$ .

In Appendix C we show that Def. 5 satisfies the usual quantale axioms [9]. As with the DL and LoI, the DQ embeds into a QoI by a quantale homomorphism. This QoI is defined on sets of equivalence relations derived from sets of sets of queries by the following map:

Definition 6.

Given a set of sets of queries $\mathbb{Q}$ ,

\llbracket{\mathbb{Q}}\rrbracket=\{{Q}_{\sim}\mid Q\in\mathbb{Q}\}.

We can then formally state the relationship between the DQ and this quantale as follows.

Lemma 2.

For all $\mathcal{Q}$ , there is a quantale homomorphism from the Determinacy Quantale $DQ(\mathcal{Q})$ to the Quantale of Information defined on $\{\llbracket{\mathbb{Q}}\rrbracket\mid\mathbb{Q}\subseteq DL(\mathcal{Q})\}$ .

The proof of Lemma 2 is presented in Appendix D.

Example 4.

To illustrate the Determinacy Quantale in practice, consider Program 2, which issues either query $q1=\{(r,vl)\mid\exists s,n.\,\mathrm{emp}(n,r,s)\wedge r=\mathrm{Intern}\wedge vl=s\}$ or $q2=\{(r,vl)\mid\exists s,n.\,\mathrm{emp}(n,r,s)\wedge r=\mathrm{CEO}\wedge vl=n\}$ to the database. Query q1 returns the $\mathrm{role}$ and $\mathrm{salary}$ columns of the entry in table $\mathrm{emp}$ if the role of that entry is $\mathrm{Intern}$ . Similarly, query q2 returns the $\mathrm{role}$ and $\mathrm{name}$ columns if the role of the entry in $\mathrm{emp}$ is $\mathrm{CEO}$ .

if (y > 0) then
  x ← q1;
else
  x ← q2;
out(x, u);

Program 2:

Consider a policy defined on queries $v1=\{(r,n)\mid\exists s.\,\mathrm{emp}(n,r,s)\}$ and $v2=\{(r,s)\mid\exists n.\,\mathrm{emp}(n,r,s)\}$ . v1 and v2, which respectively project on the $\mathrm{name}$ and $\mathrm{role}$ , and the $\mathrm{role}$ and $\mathrm{salary}$ columns of $\mathrm{emp}$ , are used in defining the disjunctive security policy $v1\vee v2$ .

For this example, we assume a database that has only one row in the $\mathrm{emp}$ table, and we also limit the domain of possible roles to $\{\mathrm{CEO},\mathrm{Intern}\}$ . These limitations are necessary in order to have a finite representation of the potential query sets, and they enable us to effectively depict the sets produced by the $\mathrm{mix}$ and $\mathrm{tc}$ operators.

Program 2 depicts a disjunction that – ignoring variable y – depends either on $q1$ or $q2$ (i.e., $q1\vee q2$ ), which on the DQ can be represented as a point $\mathrm{tc}({\downarrow}\{q1\})\vee\mathrm{tc}({\downarrow}\{q2\})$ . Similarly, the policy $v1\vee v2$ can be represented on the DQ by $\mathrm{tc}({\downarrow}\{v1\})\vee\mathrm{tc}({\downarrow}\{v2\})$ .

Illustrating this point requires calculating the $\mathrm{mix}$ set of v1 and v2, which includes all sets of queries whose equivalence relation can be constructed from the equivalence classes of ${{\downarrow}\{v1\}}_{\sim}$ and ${{\downarrow}\{v2\}}_{\sim}$ . Unfortunately, for any sufficiently rich query language, our definition of $\mathrm{mix}$ inevitably yields an infinite set, as infinitely many queries that are “morally equivalent” or even the same up to renaming variables represent the same knowledge set. To compactly represent such infinite sets, we will pick just one representative, and define

\displaystyle hc(\mathbb{Q})=\{Q^{\prime}\mid\exists Q\in\mathbb{Q}.\ {Q}_{\sim}={Q^{\prime}}_{\sim}\}

as a closure operator that adds all equivalent queries. Then $\mathrm{mix}\big{(}\{{\downarrow}\{v1\},{\downarrow}\{v2\}\}\big{)}$ will be the set $hc(\{{\downarrow}\{v1\},{\downarrow}\{v2\},{\downarrow}\{p1\},{\downarrow}\{p2\}\})$ , where $p1=\{(r,vl)\mid\big{(}\exists s,n.\,\mathrm{emp}(n,r,s)\wedge r=\mathrm{Intern}\wedge vl=s\big{)}\vee\big{(}\exists s,n.\,\mathrm{emp}(n,r,s)\wedge r=\mathrm{CEO}\wedge vl=n\big{)}\}$ and $p2=\{(r,vl)\mid\big{(}\exists s,n.\,\mathrm{emp}(n,r,s)\wedge r=\mathrm{CEO}\wedge vl=s\big{)}\vee\big{(}\exists s,n.\,\mathrm{emp}(n,r,s)\wedge r=\mathrm{Intern}\wedge vl=n\big{)}\}$ .

Therefore, we can depict the policy as the point ${\Downarrow}(hc(\{{\downarrow}\{v1\},{\downarrow}\{v2\},{\downarrow}\{p1\},{\downarrow}\{p2\}\}))$ on the DQ. Similarly, the DQ point of Program 2 (i.e., $\mathrm{tc}({\downarrow}\{q1\})\vee\mathrm{tc}({\downarrow}\{q2\})$ ) can also be depicted by the point ${\Downarrow}hc(\{{\downarrow}\{p1\}\})$ on the DQ. We illustrate the part of the DQ which includes these points in Fig. 4 and, as is evident from the figure, conclude that Program 2 is in line with the policy.

Figure 4: A portion of the DQ for queries q1, q2, v1, v2

IV Security Framework

Drawing on the quantale model of dependencies for programs and databases, we develop an extensional condition that defines security for programs that interact with databases and support disjunctive security policies. We will later use the security condition to prove soundness of enforcement mechanisms in Section V. Specifically, we formalize the syntax and semantics of a simple imperative language with database queries. Programs read the input from the database via queries, while users receive the output through predefined output channels. We define (disjunctive) security policies as views over the database and interpret them end-to-end. We then use this model to define a knowledge-based security condition for our setting.

IV-A Language

Syntax. The syntax of the commands of our language, depicted in Fig. 5, consists primarily of standard commands such as assignment, conditionals, and loops. The command $\texttt{out}(e,u)$ outputs the result of evaluating expression $e$ to user $u\in\mathcal{U}$ . The command $x\leftarrow q$ issues the query $q$ to the database and stores the result in variable $x$ . For modeling the queries, we rely on conjunctive queries with comparisons, introduced in Section V-A.

Expressions $e$ can be variables $x\in\mathrm{Vars}$ , values (integers) $n\in\mathrm{Val}$ , binary operations $e_{1}\oplus e_{2}$ , single tuples $tp\in\mathrm{Val}$ , and sets of tuples $\overline{tp}\in\mathrm{Val}$ . For simplicity, we do not provide de-constructors for database tuples.

$\begin{array}[]{ll}c:=&\texttt{skip}\ \mid\ \texttt{if}\ e\ \texttt{then}\ c_{1}\ \texttt{else}\ c_{2}\ \mid\\ &x\leftarrow q\ \mid\ x:=e\ \mid\ c_{1};c_{2}\ \mid\\ &\texttt{while}\ e\ \texttt{do}\ c\ \mid\ \texttt{out}(e,u)\\ \end{array}$

Figure 5: Language syntax

Semantics. As discussed in Section III-C, a database state (or simply state) $db\in\Omega_{D}$ is defined with respect to a schema $D$ and a finite set of integrity constraints. A configuration $\langle c,m,db\rangle$ consists of a command $c$ , a memory $m:\mathrm{Var}\rightarrow\mathrm{Val}$ mapping variables to values, and a state $db$ .

The semantics of expressions is mostly standard and its rules are presented in Fig. 6. We use judgments of the form $\langle e,m,db\rangle\downarrow vl$ to denote that an expression $e$ evaluates to value $vl$ in memory $m$ and state $db$ . For simplicity, we refrain from defining binary operations on tuples, unless the underlying database query is boolean.

We use judgments of the form $\langle c,m,db\rangle\xrightarrow{\alpha}\langle c^{\prime},m^{\prime},db^{\prime}\rangle$ to denote that a configuration $\langle c,m,db\rangle$ evaluates in one step to the configuration $\langle c^{\prime},m^{\prime},db^{\prime}\rangle$ and (possibly) produces an observation $\alpha\in\mathrm{Obs}$ ; we write $\epsilon$ whenever a command produces no observation. We write $m[x\mapsto vl]$ to denote a memory $m$ with variable $x$ assigned the value $vl$ .

Fig. 7 provides the semantic rules for commands. The query evaluation rule QueryEval is similar to assignment: it evaluates a query $q$ in state $db$ and stores the result in the variable $x$ . We use the command $\texttt{out}(e,u)$ to produce an observation. Formally, an observation $\alpha\in\mathrm{Obs}$ is a tuple $\langle o,u\rangle$ , where $u\in\mathcal{U}$ is the identifier of the user observing the output and $o$ is the result of evaluating expression $e$ , which is either a simple value or the result set of a non-boolean query.

We write $\langle c,m,db\rangle\xRightarrow{\tau}\negthickspace_{u}\langle c^{\prime},m^{\prime},db^{\prime}\rangle$ to denote that $\langle c,m,db\rangle$ takes one or more steps to reach configuration $\langle c^{\prime},m^{\prime},db^{\prime}\rangle$ while producing the trace (sequence of observations) $\tau\in\mathrm{Obs}^{\ast}$ . We omit the final configuration whenever it is irrelevant and write $\langle c,m,db\rangle\xRightarrow{\tau}\negthickspace_{u}$ .

$\inferrule*[before=\textsc{Int}]{\\ }{\langle n,m,db\rangle\downarrow n}$ $\inferrule*[before=\textsc{Tuple}]{\\ }{\langle tp,m,db\rangle\downarrow tp}$ $\inferrule*[before=\textsc{TupleSet}]{\\ }{\langle\overline{tp},m,db\rangle\downarrow\overline{tp}}$ $\inferrule*[before=\textsc{Var}]{vl=m(x)}{\langle x,m,db\rangle\downarrow vl}$

$\inferrule*[before=\textsc{Op}]{\langle e_{1},m,db\rangle\downarrow n_{1}\\ \langle e_{2},m,db\rangle\downarrow n_{2}\\ n=n_{1}\oplus n_{2}}{\langle e_{1}\oplus e_{2},m,db\rangle\downarrow n}$

Figure 6: Semantic rules for expressions

$\inferrule*[before=\textsc{Skip}]{\\ }{\langle\texttt{skip},m,db\rangle\xrightarrow{\epsilon}\langle\epsilon,m,db\rangle}$ $\inferrule*[before=\textsc{Assign}]{\langle e,m,db\rangle\downarrow vl\\ m^{\prime}=m[x\mapsto vl]}{\langle x:=e,m,db\rangle\xrightarrow{\epsilon}\langle\epsilon,m^{\prime},db\rangle}$ $\inferrule*[before=\textsc{QueryEval}]{vl={\llbracket{q}\rrbracket^{db}}\\ m^{\prime}=m[x\mapsto vl]}{\langle x\leftarrow q,m,db\rangle\xrightarrow{\epsilon}\langle\epsilon,m^{\prime},db\rangle}$

$\inferrule*[before=\textsc{IfTrue}]{\langle e,m,db\rangle\downarrow n\\ n\not=0}{\langle\texttt{if}\ e\ \texttt{then}\ c_{1}\ \texttt{else}\ c_{2},m,db\rangle\xrightarrow{\epsilon}\langle c_{1},m,db\rangle}$ $\inferrule*[before=\textsc{IfFalse}]{\langle e,m,db\rangle\downarrow n\\ n=0}{\langle\texttt{if}\ e\ \texttt{then}\ c_{1}\ \texttt{else}\ c_{2},m,db\rangle\xrightarrow{\epsilon}\langle c_{2},m,db\rangle}$

$\inferrule*[before=\textsc{WhileTrue}]{\langle e,m,db\rangle\downarrow n\\ n\not=0}{\langle\texttt{while}\ e\ \texttt{do}\ c,m,db\rangle\xrightarrow{\epsilon}\langle c;\texttt{while}\ e\ \texttt{do}\ c,m,db\rangle}$ $\inferrule*[before=\textsc{WhileFalse}]{\langle e,m,db\rangle\downarrow n\\ n=0}{\langle\texttt{while}\ e\ \texttt{do}\ c,m,db\rangle\xrightarrow{\epsilon}\langle\epsilon,m,db\rangle}$

$\inferrule*[before=\textsc{Seq}]{\langle c_{1},m,db\rangle\xrightarrow{\alpha}\langle c_{1}^{\prime},m^{\prime},db^{\prime}\rangle\\ }{\langle c_{1};c_{2},m,db\rangle\xrightarrow{\alpha}\langle c_{1}^{\prime};c_{2},m^{\prime},db^{\prime}\rangle}$ $\inferrule*[before=\textsc{SeqEmpty}]{\\ }{\langle\epsilon;c,m,db\rangle\xrightarrow{\epsilon}\langle c,m,db\rangle}$ $\inferrule*[before=\textsc{Output}]{\langle e,m,db\rangle\downarrow vl}{\langle\texttt{out}(e,u),m,db\rangle\xrightarrow{\langle vl,u\rangle}\langle\epsilon,m,db\rangle}$

Figure 7: Semantic rules for commands
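The rules of Fig. 6 and Fig. 7 can be read as a small interpreter. The Python sketch below is one possible transcription under several assumptions that are not part of the formal language: commands are tagged tuples, queries are Python functions from a database state to a result set, the terminated command $\epsilon$ is represented by `None`, and the `fuel` bound in `run` is a hypothetical device that keeps diverging loops from running forever.

```python
import operator

def eval_expr(e, m):
    """Fig. 6: strings are variables, ("op", f, e1, e2) is a binary operation,
    and anything else (integers, tuples, tuple sets) is already a value."""
    if isinstance(e, str):
        return m[e]                                   # Var
    if isinstance(e, tuple) and e and e[0] == "op":
        _, f, e1, e2 = e                              # Op
        return f(eval_expr(e1, m), eval_expr(e2, m))
    return e                                          # Int / Tuple / TupleSet

def step(c, m, db):
    """Fig. 7: one small step, returning (c', m', db', observation or None)."""
    tag = c[0]
    if tag == "skip":
        return None, m, db, None
    if tag == "assign":                               # ("assign", x, e)
        return None, {**m, c[1]: eval_expr(c[2], m)}, db, None
    if tag == "query":                                # ("query", x, q)
        return None, {**m, c[1]: c[2](db)}, db, None
    if tag == "out":                                  # ("out", e, u)
        return None, m, db, (eval_expr(c[1], m), c[2])
    if tag == "if":                                   # ("if", e, c1, c2)
        return (c[2] if eval_expr(c[1], m) != 0 else c[3]), m, db, None
    if tag == "while":                                # ("while", e, body)
        nxt = ("seq", c[2], c) if eval_expr(c[1], m) != 0 else None
        return nxt, m, db, None
    if tag == "seq":                                  # ("seq", c1, c2)
        if c[1] is None:                              # SeqEmpty
            return c[2], m, db, None
        c1, m1, db1, obs = step(c[1], m, db)
        return ("seq", c1, c[2]), m1, db1, obs
    raise ValueError(f"unknown command: {tag}")

def run(c, m, db, fuel=1000):
    """Iterate `step` and collect the trace of observations."""
    trace = []
    while c is not None and fuel > 0:
        c, m, db, obs = step(c, m, db)
        if obs is not None:
            trace.append(obs)
        fuel -= 1
    return trace
```

For instance, a command shaped like Program 2 can be encoded as `("seq", ("if", ("op", operator.gt, "y", 0), ("query", "x", q1), ("query", "x", q2)), ("out", "x", "u"))`, where `q1` and `q2` are Python functions standing in for the two queries.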

IV-B Security Model

We now introduce our knowledge-based security model for disjunctive security policies. For simplicity, we denote the initial program memory by $m_{0}$ and assume it is fixed and public to all users, hence the only way to input sensitive information is through database queries. Users make observations through output channels, hence their knowledge of the database is determined by what they can infer based on these observations. This model induces standard equivalence relations for database states and observation traces.

Database state equivalence. Two states $db$ and $db^{\prime}$ are equivalent with respect to a set of tables and views $V$ , written as $db\approx_{V}db^{\prime}$ , iff all tables and views in $V$ have identical contents in $db$ and $db^{\prime}$ . Formally, states $db$ and $db^{\prime}$ are equivalent with respect to $V$ iff for all views $v\in V,\ {\llbracket{v.q}\rrbracket^{db}}={\llbracket{v.q}\rrbracket^{db^{\prime}}}$ and for all tables $t\in V,\ {\llbracket{t}\rrbracket^{db}}={\llbracket{t}\rrbracket^{db^{\prime}}}$ . A set of tables and views $V$ induces an equivalence relation, and for a state $db$ , the equivalence class $[db]_{V}$ contains all states that are equivalent to $db$ with respect to $V$ .

Trace equivalence. We use trace projection to define trace equivalence. The projection of a trace $\tau$ for user $u$ , written as $\tau\negthickspace\downharpoonright_{u}$ , is the sequence of all observations in $\tau$ that $u$ can observe. Traces $\tau_{1}$ and $\tau_{2}$ are equivalent with respect to user $u$ , written as $\tau_{1}\approx_{u}\tau_{2}$ , iff the projection of one of them to $u$ is a prefix of the projection of the other, i.e., $\tau_{1}\negthickspace\downharpoonright_{u}\ \preceq\tau_{2}\negthickspace\downharpoonright_{u}$ or $\tau_{1}\negthickspace\downharpoonright_{u}\ \succeq\tau_{2}\negthickspace\downharpoonright_{u}$ .

Equivalence of trace prefixes is a standard technicality needed to ignore leaks due to a program’s progress or termination [21]; here we adopt a definition of trace equivalence which does not differentiate between program divergence and termination [14].

User knowledge. When executing a program $\mathrm{prg}$ , we assume memory is always initially in the all-zero state $m_{0}$ . Thus, we can view a program’s execution for any user as a function from database $db$ to user-observable output traces, $\tau_{\mathrm{prg},u}(db)=\tau\negthickspace\downharpoonright_{u}$ when $\langle\mathrm{prg},m_{0},db\rangle\xRightarrow{\tau}\negthickspace_{u}$ . This function induces an equivalence relation on databases, $\llbracket\mathrm{prg}\rrbracket_{u}={\sim_{\tau_{\mathrm{prg},u}}}$ , which characterizes the knowledge of $db$ conveyed by the output of $\mathrm{prg}$ to $u$ .

Security policy. A security policy is a list of user policies (written as $P_{u}$ ) for each user $u\in\mathcal{U}$ . User policies are defined as views and table identifiers over a database schema, and determine what a user $u$ is allowed to observe. Fig. 8 presents the syntax of disjunctive policies for our model. They are defined as a set of sets in order to represent a disjunction of conjunctions of simpler policies. A conjunction $\mathrm{con}$ is a set of view $v$ and table $t$ identifiers, and a disjunction $\mathrm{dis}$ is a set of conjunctions. For example, the policy $P_{u}$ for user $u$ who is allowed to see table $t_{1}$ and view $v_{1}$ , or view $v_{2}$ but not both, is defined as $P_{u}=\{\{t_{1},v_{1}\},\{v_{2}\}\}$ .

The overall policy of the system, written as $P$ , is the list of user policies. Per Def. 6, the policy $P_{u}$ can be represented semantically as an element $\llbracket P_{u}\rrbracket$ of the Quantale of Information. Thus, we can formulate our security condition as the assertion that the knowledge of the database that the execution of the program $\mathrm{prg}$ conveys to $u$ is bounded above by the disjunctive knowledge allowed by the policy, $\llbracket P_{u}\rrbracket$ .

Definition 7.

The program $\mathrm{prg}$ is secure for the user $u$ and policy $P_{u}$ if $\llbracket\mathrm{prg}\rrbracket_{u}\sqsubseteq\llbracket P_{u}\rrbracket$ .

$\begin{array}[]{rcl}\mathrm{con}&:=&\{v\}\mid\{t\}\mid\mathrm{con}_{1}\cup\mathrm{con}_{2}\\ \mathrm{dis}&:=&\{\mathrm{con}\}\mid\mathrm{dis}_{1}\cup\mathrm{dis}_{2}\\ P_{u}&:=&\mathrm{dis}\\ \end{array}$

Figure 8: Syntax of user policy

V Enforcement of Disjunctive Policies

Having formulated the security condition, we would like to prove that useful programs satisfy it. To this end, we introduce a sound static enforcement mechanism, which imposes some structural limitations on the policy and trades off some completeness for the sake of efficiency and ease of analysis.

Fig. 9 illustrates how our mechanism functions at a high level. We assume as input a program and policy in the format described in Fig. 5 and Fig. 8 respectively. The program is then subjected to a static dependency analysis (Section V-B), which computes an overapproximate set of possible paths of control flow through the program, along with the queries (dependencies) retrieved for each path, giving an element of the DQ, that is a (disjunctive) set of (conjunctive) sets of queries. Per Fig. 8, the policy is also already given in this format.

We would like to verify that the program dependencies are bounded by the policy in the DQ, as by Lemma 2, this entails the security condition (Def. 7) that the disjunctive information that is revealed by the program is bounded above by the QoI interpretation of the policy. However, checking DQ ordering on general queries may be computationally costly. We therefore abstract (Section V-C) both the policy and the path dependencies into a more tractable format (symbolic tuples), which again overapproximates the information they can retrieve. To guarantee soundness, we require that the views in the policy are such that this abstraction is lossless for them. Finally, as the security check (Section V-D), we compute a tractable comparison on sets of sets of symbolic tuples that can be shown to imply DQ ordering.

Figure 9: Enforcement steps

V-A Conjunctive Queries

While our theoretical definitions are based on the fully-general domain relational calculus as a query language, to avoid complexity, our enforcement mechanism will work with a restricted subset called conjunctive queries with comparisons (CQCs). This language is a subset of relational calculus that only employs conjunction ( $\wedge$ ) and existential quantification ( $\exists$ ) and omits disjunction ( $\vee$ ), negation ( $\neg$ ), and universal quantification ( $\forall$ ). CQCs can model the SELECT-FROM-WHERE portion of SQL, where the WHERE clause contains only AND and comparisons.

Our language for (non-boolean) CQC $q$ over a database schema $D$ employs the standard notation [18, 20], and has the form $\emph{heading}\leftarrow\emph{body}$ :

\displaystyle\mathrm{ans}(\overline{y})\leftarrow R_{1}(\overline{x}_{1}),...,R_{n}(\overline{x}_{n}),C_{1},...,C_{m}

where $R_{1},...,R_{n}$ are relations in $D$ , and $\overline{x}_{1},...,\overline{x}_{n}$ are their variables. We use $\mathrm{Var}(q)=\overline{x}_{1}\cup...\cup\overline{x}_{n}$ to denote the set of variables appearing in the body of the query $q$ . $C_{1},...,C_{m}$ are formulae of the form $x_{i}\oplus x_{j}$ , where $\oplus$ is a comparison operator, one of $<,\leq,=,\not=,>,\geq$ , and $x_{i}$ and $x_{j}$ are either variables in $\mathrm{Var}(q)$ or constants.

We require that $\overline{y}\subseteq\mathrm{Var}(q)$ . Without loss of generality, we assume that there are no self-joins in the query. In case of queries with self-joins, we can make logical copies of the relations to accommodate them [20]. The body of a CQC $q$ comprises two parts, namely the relation identifiers $R_{1},...,R_{n}$ referred to as $\mathrm{ids}(q)$ , and the conditions $C_{1},...,C_{m}$ denoted by $\mathrm{cnd}(q)$ .

Similarly to Section III-A, the evaluation of $q$ on the database state $db$ (denoted by ${\llbracket{q}\rrbracket^{db}}$ ) is defined by taking all tuples in the cartesian product of $\mathrm{ids}(q)$ in $db$ that satisfy $\mathrm{cnd}(q)$ , and projecting to the column set $\overline{y}$ .

Example 5.

Consider the database schema in Fig. 2. The following query returns a set of tuples containing the names of divisions whose managers have a salary of more than $50$ :

\displaystyle\mathrm{ans}(d)\leftarrow\mathrm{emp}(n,r,s),\mathrm{mng}(d,m),n=m,s>50
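The evaluation described before Example 5 (cartesian product of the body relations, filtering by the conditions, projection onto the head) can be sketched for this query in Python. The sample rows, including the role names, are hypothetical, and relations are modelled as sets of tuples with columns in the order used in the query body.

```python
def eval_example5(db):
    """ans(d) <- emp(n, r, s), mng(d, m), n = m, s > 50"""
    return {
        (d,)                          # projection onto the head variable d
        for (n, r, s) in db["emp"]    # cartesian product of emp ...
        for (d, m) in db["mng"]       # ... and mng
        if n == m and s > 50          # the two conditions of the body
    }

# Hypothetical database state used only for illustration.
sample_db = {
    "emp": {("ann", "CEO", 90), ("bob", "Intern", 20), ("eve", "Lead", 60)},
    "mng": {("sales", "ann"), ("ops", "bob"), ("lab", "eve")},
}
answer = eval_example5(sample_db)
```

Here `answer` contains the divisions managed by `ann` and `eve`, since only their salaries exceed 50.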

V-B Type-based Dependency Analysis

Our static dependency analysis builds on the generic type system of van Delft et al. [15] and extends it with support for disjunctive dependencies. We intuitively expect that a disjunctive dependency analysis must be path-sensitive, so as to distinguish between different executions and also keep track of the history of observations. Both of these requirements are often challenging for type-based analyses, which do not naturally align with the execution order. We will first illustrate these challenges with examples and then present our analysis.

1 if (y > 0) then
2   x := w + z;
3 else
4   x := x + 1;
5 out(x,u);

Program 3:

1  if (z == 0) then
2    x ← q1;
3  else
4    x ← q2;
5  out(x,u);
6  if (z != 0) then
7    x ← q1;
8  else
9    x ← q2;
10 out(x,u);

Program 4:

Program 3 illustrates the need for path sensitivity. The analysis should distinguish between the then branch, where variable $x$ depends on the set $\{y,w,z\}$ , and the else branch where $x$ depends on $\{y,x\}$ . Our reference analysis [15] would join these two sets at the end of the if statement, ultimately yielding the dependency set $\{x,y,w,z\}$ . In our analysis, these sets are never joined, but instead combined to form a set of sets, namely, $\{\{y,w,z\},\{y,x\}\}$ , where the outer set represents a disjunctive dependency and the inner sets represent conjunctive dependency.

Program 4 illustrates the need to keep track of the observation history. It outputs $x$ at lines $5$ and $10$ , and the dependency set of $x$ in both places is $\{\{q1,z\},\{q2,z\}\}$ . However, this program will always output both $q1$ and $q2$ . Now, if a policy only allows user $u$ to see either query $q1$ or $q2$ , an analysis that checks the outputs at lines $5$ and $10$ independently of each other will incorrectly accept both. Hence, the analysis should account for all outputs to user $u$ .

Fig. 10 depicts the rules of our disjunctive dependency analysis. We use judgments of the form $\vdash c:\Gamma$ , where $\Gamma$ is an environment mapping variables $\mathrm{Var}$ to sets of sets of dependencies $\mathrm{Dep}$ . The set of variables is $\mathrm{Var}=PV\ \cup\ \mathcal{U}\ \cup\ \{pc\}$ , where $PV$ are program variables, $\mathcal{U}$ are users, and $pc$ is the program context. The dependencies $\mathrm{Dep}$ are $\mathrm{Dep}=\mathrm{Var}\ \cup\ \mathcal{Q}$ , where $\mathrm{Var}$ are variables and $\mathcal{Q}$ are queries that can be issued to a database. We use $u\in\mathcal{U}$ to indicate the dependencies of all outputs to user $u$ .

We start by introducing the operators and auxiliary functions employed within the rules, and then proceed to explain the rules themselves. The operator $\otimes$ is used to join two (or more) sets of sets, defined as:

\displaystyle\Gamma_{1}(x_{1})\otimes...\otimes\Gamma_{n}(x_{n})=\{S_{1}\cup...\cup S_{n}\mid S_{i}\in\Gamma_{i}(x_{i}),\ i=1,\dots,n\}

For example, the join of $\Gamma_{1}(x)=\{\{x,y\},\{z,y\}\}$ and $\Gamma_{2}(y)=\{\{w\},\{x,z\}\}$ is:

\displaystyle\Gamma_{1}(x)\otimes\Gamma_{2}(y)=\{\{x,y,w\},\{x,y,z\},\{z,y,w\}\}

Intuitively, the result of the join operator is a set of sets capturing the product of the original sets of sets under the set union operation. We use this operator to calculate all the possible combinations of two environments.
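The example above can be reproduced with a few lines of Python. This is only a sketch: sets of sets are modelled as collections of Python sets, and `otimes` is a name chosen here for the operator $\otimes$.

```python
from itertools import product

def otimes(*families):
    """All unions S1 ∪ ... ∪ Sn with Si drawn from the i-th family."""
    return {frozenset().union(*choice) for choice in product(*families)}

gamma1_x = [{"x", "y"}, {"z", "y"}]
gamma2_y = [{"w"}, {"x", "z"}]
joined = otimes(gamma1_x, gamma2_y)
```

Two of the four combinations coincide (both yield the set containing `x`, `y`, and `z`), which is why `joined` has three elements rather than four.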

$\Gamma_{2};\Gamma_{1}$ represents the sequential composition of two environments. Intuitively, $\Gamma_{2};\Gamma_{1}$ is the same as $\Gamma_{2}$ but updated with all of the dependencies that have been previously established in $\Gamma_{1}$ . Formally:

\displaystyle(\Gamma_{2};\Gamma_{1})(x)=\bigcup\limits_{S_{2}\in\Gamma_{2}(x)}\ \bigotimes\limits_{y\in S_{2}}\ \Gamma_{1}(y)

For example, the sequential composition of the environments

\displaystyle\Gamma_{1}=[x\mapsto\{\{x\},\{y\}\},\ y\mapsto\{\{y\}\},\ pc\mapsto\{\{y,pc\}\}]

\displaystyle\Gamma_{2}=[x\mapsto\{\{pc,x\}\},\ y\mapsto\{\{pc,y\}\},\ pc\mapsto\{\{pc\}\}]

evaluates to

\displaystyle\Gamma_{2};\Gamma_{1}=[x\mapsto\{\{x,y,pc\},\{y,pc\}\},\ y\mapsto\{\{pc,y\}\},\ pc\mapsto\{\{y,pc\}\}]
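The same composition can be checked mechanically. The sketch below models environments as Python dictionaries from variable names to frozensets of frozensets; the helper names `fam`, `otimes`, and `seq` are chosen for the example, and every dependency is assumed to have an entry in $\Gamma_{1}$.

```python
from itertools import product

def otimes(*families):
    """Product of sets of sets under set union."""
    return frozenset(frozenset().union(*choice) for choice in product(*families))

def fam(*sets):
    """Build a set of sets from plain Python sets."""
    return frozenset(frozenset(s) for s in sets)

def seq(gamma2, gamma1):
    """(Γ2;Γ1)(x): union over S2 in Γ2(x) of the ⊗-product of Γ1(y) for y in S2."""
    return {
        x: frozenset().union(*(otimes(*(gamma1[y] for y in s2)) for s2 in family))
        for x, family in gamma2.items()
    }

gamma1 = {"x": fam({"x"}, {"y"}), "y": fam({"y"}), "pc": fam({"y", "pc"})}
gamma2 = {"x": fam({"pc", "x"}), "y": fam({"pc", "y"}), "pc": fam({"pc"})}
composed = seq(gamma2, gamma1)
```

Each dependency recorded by the later environment $\Gamma_{2}$ is replaced by what it depended on in the earlier environment $\Gamma_{1}$, and the alternatives are multiplied out with $\otimes$.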

Finally, the operator $\Cup$ calculates the pointwise union of two environments: $(\Gamma_{1}\Cup\Gamma_{2})(x)=\Gamma_{1}(x)\cup\Gamma_{2}(x)$ for all $x\in\mathrm{Var}$ . This operator is used in conditionals to capture the disjunctive join of the two branches. For example, at line 5 of Program 3, $\Gamma_{1}(x)=\{\{y,w,z\}\}$ and $\Gamma_{2}(x)=\{\{y,x\}\}$ , and the result of $(\Gamma_{1}\Cup\Gamma_{2})(x)$ would be $\{\{y,w,z\},\{y,x\}\}$ .

For loops, we rely on the fixed-point of $\Gamma$ , denoted by $\Gamma^{*}$ , which we define as:

\displaystyle\Gamma^{*}=\bigcup\limits_{n\geq 0}\Gamma^{n}

where $\Gamma^{0}=\Gamma_{id}$ and $\Gamma^{n+1}=\Gamma^{n};\Gamma$ .

In these rules, $\Gamma_{id}$ is the identity environment, defined as $\forall x\in\mathrm{Var},\ \Gamma_{id}(x)=\{\{x\}\}$ , and $fv(e)$ denotes the free variables of expression $e$ .

$\inferrule*[before=\textsc{T-Skip}]{\\ }{\vdash\texttt{skip}:\Gamma_{id}}$ $\inferrule*[before=\textsc{T-Assign}]{\Gamma=\Gamma_{id}[x\mapsto\{fv(e)\cup\{pc\}\}]}{\vdash x:=e:\Gamma}$ $\inferrule*[before=\textsc{T-Output}]{\Gamma^{\prime}=\Gamma_{id}[u\mapsto\{fv(e)\cup\{pc,u\}\}]}{\vdash\texttt{out}(e,u):\Gamma^{\prime}}$

$\inferrule*[before=\textsc{T-QueryEval}]{\Gamma=\Gamma_{id}[x\mapsto\{\{q,pc\}\}]}{\vdash x\leftarrow q:\Gamma}$ $\inferrule*[before=\textsc{T-If}]{\vdash c_{i}:\Gamma_{i}\\ \Gamma^{\prime}_{i}=\Gamma_{i};\Gamma_{id}[pc\mapsto\{fv(e)\cup\{pc\}\}]\ i=1,2\\ \Gamma^{\prime}=(\Gamma^{\prime}_{1}\Cup\Gamma^{\prime}_{2})[pc\mapsto\{\{pc\}\}]}{\vdash\texttt{if}\ e\ \texttt{then}\ c_{1}\ \texttt{else}\ c_{2}:\Gamma^{\prime}}$

$\inferrule*[before=\textsc{T-While}]{\vdash c:\Gamma_{c}\\ \Gamma_{f}=(\Gamma_{c};\Gamma_{id}[pc\mapsto\{fv(e)\cup\{pc\}\}])^{*}\\ \Gamma^{\prime}=\Gamma_{f}[pc\mapsto\{\{pc\}\}]}{\vdash\texttt{while}\ e\ \texttt{do}\ c:\Gamma^{\prime}}$ $\inferrule*[before=\textsc{T-Seq}]{\vdash c_{1}:\Gamma_{1}\\ \vdash c_{2}:\Gamma_{2}\\ \Gamma^{\prime}=\Gamma_{2};\Gamma_{1}}{\vdash c_{1};c_{2}:\Gamma^{\prime}}$

Figure 10: Type-based dependency analysis rules

T-Assign updates the dependency set of the assigned variable $x$ to the set of the free variables of expression $e$ and $pc$ , otherwise it matches the identity environment. Rule T-QueryEval is similar to assignment, except that instead of $fv(e)$ , it adds query $q$ to the dependency set.

T-If sequentially composes the dependency sets of each branch with the environment $\Gamma_{id}[pc\mapsto\{fv(e)\cup\{pc\}\}]$ , thus adding variables of the branch condition to the dependency set of each branch. Finally, the resulting environments ( $\Gamma^{\prime}_{1}$ and $\Gamma^{\prime}_{2}$ ) are joined disjunctively using the $\Cup$ operator.

T-While uses the fixed-point operator to calculate the dependency set of the loop. To do so, it first calculates the dependency set of the loop body, which is sequentially composed with $\Gamma_{id}[pc\mapsto\{fv(e)\cup\{pc\}\}]$ to account for the dependencies on the loop condition. Finally, the fixed-point operator computes the dependency set of the while loop.

T-Output relies on the dependency set including $fv(e)$ , $\{pc\}$ and $\{u\}$ , where $fv(e)$ includes all the variables of the expression output to user $u$ , $\{pc\}$ captures the implicit dependencies on the path conditions, and $\{u\}$ is the dependency set of user $u$ and captures the history of dependencies that user $u$ might have observed up to this point. Observe that by the definition of sequential composition, all the dependencies of the previous outputs will be added to $u$ .

This analysis yields a final environment $\Gamma_{\mathrm{fin}}$ . The result of the analysis is the value of this environment for the user identifier $u$ , which includes both queries and program variables. Since program variables do not contain sensitive information, and we are primarily concerned with queries, we refine the result of $\Gamma_{\mathrm{fin}}(u)$ to only include queries. This refined outcome defines the ultimate result of our analysis, denoted as $\mathrm{QL}_{u}$ :

\displaystyle\mathrm{QL}_{u}\triangleq\bigcup_{S\in\Gamma_{\mathrm{fin}}(u)}\{S\cap\mathcal{Q}\}
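To illustrate how the rules of Fig. 10 combine, the following Python sketch implements them for a tuple-based encoding of commands and computes $\mathrm{QL}_{u}$ for a command shaped like Program 4. Several choices here are assumptions made for the example rather than part of the analysis as stated: expressions are represented only by their sets of free variables, environments are dictionaries, any dependency without an explicit entry (in particular a query) is treated as mapping to itself as in $\Gamma_{id}$, and the fixed point is computed by iterating $A\mapsto\Gamma_{id}\Cup(A;\Gamma)$ until it stabilizes, which accumulates $\Gamma^{n}$ starting from $n=0$.

```python
from itertools import product

PC = "pc"

def fam(*sets):
    return frozenset(frozenset(s) for s in sets)

def get(env, dep):
    # Assumption: missing entries behave as in the identity environment.
    return env.get(dep, fam({dep}))

def otimes(*families):
    return frozenset(frozenset().union(*c) for c in product(*families))

def seq(g2, g1, vs):
    """Sequential composition G2;G1, where G1 is the earlier environment."""
    return {x: frozenset().union(*(otimes(*(get(g1, y) for y in s))
                                   for s in get(g2, x)))
            for x in vs}

def cup(g1, g2, vs):
    """Pointwise union of two environments."""
    return {x: get(g1, x) | get(g2, x) for x in vs}

def star(g, vs):
    """Iterate A -> G_id ⋓ (A;G) from G_id until nothing changes."""
    acc = cup({}, {}, vs)
    while True:
        nxt = cup({}, seq(acc, g, vs), vs)
        if nxt == acc:
            return acc
        acc = nxt

def analyze(c, vs):
    tag = c[0]
    if tag == "skip":
        return {}
    if tag == "assign":                       # ("assign", x, fv(e))
        return {c[1]: fam(set(c[2]) | {PC})}
    if tag == "query":                        # ("query", x, q)
        return {c[1]: fam({c[2], PC})}
    if tag == "out":                          # ("out", fv(e), u)
        return {c[2]: fam(set(c[1]) | {PC, c[2]})}
    if tag == "seq":                          # ("seq", c1, c2)
        return seq(analyze(c[2], vs), analyze(c[1], vs), vs)
    guard = {PC: fam(set(c[1]) | {PC})}       # G_id[pc -> {fv(e) ∪ {pc}}]
    if tag == "if":                           # ("if", fv(e), c1, c2)
        env = cup(seq(analyze(c[2], vs), guard, vs),
                  seq(analyze(c[3], vs), guard, vs), vs)
    elif tag == "while":                      # ("while", fv(e), body)
        env = star(seq(analyze(c[2], vs), guard, vs), vs)
    else:
        raise ValueError(f"unknown command: {tag}")
    env[PC] = fam({PC})                       # reset the program context
    return env

def ql(env, u, queries):
    """Keep only the query part of each dependency set of user u."""
    return {s & queries for s in get(env, u)}

def branch(q_then, q_else):
    return ("if", {"z"}, ("query", "x", q_then), ("query", "x", q_else))

OUT = ("out", {"x"}, "u")
prog4 = ("seq", ("seq", ("seq", branch("q1", "q2"), OUT), branch("q2", "q1")), OUT)
VARS = {"x", "z", "u", PC}
ql_u = ql(analyze(prog4, VARS), "u", frozenset({"q1", "q2"}))
```

For this encoding, `ql_u` contains the set with both `q1` and `q2` alongside the two singletons, so a policy that permits only one of the two queries does not bound the result. This reflects the role of the user entry $u$ in accumulating the history of outputs.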

The soundness proof of our enforcement relies on the following fact: writing $Q_{\mathrm{prg},u}(db)$ for the set of queries on which the $u$ -outputs of $\mathrm{prg}$ depend when running on a database state $db$ , this set is guaranteed to be found in the set $\mathrm{QL}_{u}$ produced by the dependency analysis. We show how to define $Q_{\mathrm{prg},u}(db)$ using a taint-tracking semantics presented in Appendix E. Formally, this gives rise to the following soundness condition for the dependency analysis.

Lemma 3.

For all $db\in\Omega_{D}$ , $Q_{\mathrm{prg},u}(db)\in\mathrm{QL}_{u}(\mathrm{prg})$ .

V-C Query Abstraction

Even for CQCs, comparing the information revealed by sets of queries is hard in general. To define a well-behaved and more tractable determinacy order on which to build our DQ, we introduce another overapproximating abstraction, which we will use to soundly label queries and policies.

We define a symbolic tuple as $\langle T,\phi,\pi\rangle$ , where $T=\{t_{1},t_{2},...,t_{n}\}$ is a set of table identifiers, $\phi$ is a boolean combination of equality, inequality, and comparisons over the columns of the tables in $T$ , and $\pi$ is a subset of the columns of the tables in $T$ . In a symbolic tuple, $\pi$ denotes the query’s projection on the columns of the tables in $T$ , and $\phi$ defines the constraints over the rows.

Example 6.

The symbolic tuple of query $\mathrm{ans}(d)\leftarrow\mathrm{emp}(n,r,s),\mathrm{mng}(d,m),n=m,s>50$ defined on the relations of Fig. 2 would be $\langle\{\mathrm{emp},\mathrm{mng}\},s>50\wedge n=m,\{d\}\rangle$ .

While calculating the exact set of symbolic tuples of a relational calculus query is intractable for many classes of queries, it is tractable for conjunctive queries with comparisons (CQCs). Given a conjunctive query $q=\mathrm{ans}(\overline{y})\leftarrow R_{1}(\overline{x}_{1}),...,R_{n}(\overline{x}_{n}),C_{1},...,C_{m}$ , the function $\mathrm{sts}$ computes a symbolic tuple from $q$ as follows:

\displaystyle\mathrm{sts}(q)=\langle\mathrm{ids}(q^{\prime}),\big{(}\bigwedge_{C\in\mathrm{cnd}(q^{\prime})}C\ \big{)},\overline{y}\rangle

where $\mathrm{ids}(q^{\prime})$ and $\mathrm{cnd}(q^{\prime})$ , defined in Section V-A, return the relation identifiers and conditions of $q^{\prime}$ , respectively. Here, $q^{\prime}$ is the query obtained by recursively replacing views with their definitions. We lift this definition to sets of queries $Q$ , and define $\mathrm{sts}(Q)$ as $\{\bigcup_{q\in Q}\mathrm{sts}(q)\}$ .

Using $\mathrm{sts}$ , we define the function $\sigma_{\mathrm{st}}$ for a set of sets of queries $\mathbb{Q}$ as follows:

\displaystyle\sigma_{\mathrm{st}}(\mathbb{Q})=\{\mathrm{sts}(Q)\mid Q\in\mathbb{Q}\}
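As a rough illustration, the functions $\mathrm{sts}$ and $\sigma_{\mathrm{st}}$ can be sketched in Python for queries that mention no views, so that $q^{\prime}=q$ and no unfolding is needed. The `CQC` and `SymTuple` records are hypothetical encodings chosen for the example: conditions are kept as strings, a conjunction is kept as the set of its conjuncts, and the lifting to a set of queries is read here as the set of the symbolic tuples of its members.

```python
from typing import NamedTuple

class CQC(NamedTuple):
    head: tuple     # answer variables (the projection)
    atoms: tuple    # ((relation, (variables...)), ...)
    conds: tuple    # comparisons, e.g. "s > 50"

class SymTuple(NamedTuple):
    tables: frozenset   # T
    phi: frozenset      # conjuncts of phi
    pi: frozenset       # projected columns

def sts(q):
    """Symbolic tuple of a single view-free CQC."""
    return SymTuple(frozenset(rel for rel, _ in q.atoms),
                    frozenset(q.conds),
                    frozenset(q.head))

def sts_set(queries):
    """Label of a set of queries: the symbolic tuples of its members."""
    return frozenset(sts(q) for q in queries)

def sigma_st(family):
    """Apply sts_set to every conjunctive set in a disjunction."""
    return {sts_set(queries) for queries in family}

# The query of Example 6.
example6 = CQC(head=("d",),
               atoms=(("emp", ("n", "r", "s")), ("mng", ("d", "m"))),
               conds=("n = m", "s > 50"))
```

Applying `sts` to `example6` yields the tables `emp` and `mng`, the two conditions, and the projected column `d`, matching the symbolic tuple given in Example 6.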

Policy Analysis. The function $\sigma_{\mathrm{st}}$ can also be used to map a disjunctive security policy to a set of labels. However, in order to ensure soundness and avoid approximation, we place some constraints on policies. (1) To make computing the set of symbolic tuples tractable, we only support policies with views in the CQC form. (2) We require that the symbolic tuples of views be well-formed, which we define as:

Definition 8.

The symbolic tuple $\langle T,\phi,\pi\rangle$ is said to be well-formed if it satisfies $\mathrm{dep}(\phi)\subseteq\pi$ .

where $\phi=C_{1}\wedge...\wedge C_{n}$ and $\mathrm{dep}(\phi)=\bigcup_{i\in\{1,...,n\}}fv(C_{i})$ returns the column dependency set of $\phi$ .

Well-formedness ensures that the symbolic tuples are precise, at the expense of limiting a view to only applying constraints on the columns which it projects on.
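As a small illustration of Def. 8, the following sketch checks $\mathrm{dep}(\phi)\subseteq\pi$ . The encoding is an assumption made for the example: $\phi$ is a list of atomic constraints `(lhs, op, rhs)` in which strings denote columns and integers denote constants.

```python
def dep(phi):
    """Column dependency set: every column occurring in some conjunct of phi."""
    return {t for (lhs, _op, rhs) in phi for t in (lhs, rhs) if isinstance(t, str)}

def well_formed(phi, pi):
    """Def. 8: a symbolic tuple is well-formed if dep(phi) is contained in pi."""
    return dep(phi) <= set(pi)

# A view projecting r and s may constrain s ...
print(well_formed([("s", ">", 50)], {"r", "s"}))
# ... but the tuple of Example 6 constrains s, n, m while projecting only d.
print(well_formed([("s", ">", 50), ("n", "=", "m")], {"d"}))
```

The second call illustrates the restriction: the query of Example 6 is a legitimate program query, but it could not be used as a policy view.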

Furthermore, we treat the table identifiers used in policies as special views that return the whole table. For instance, a policy which allows access to table $\mathrm{emp}$ can be rewritten as view $\mathrm{ans}(n,r,s)\leftarrow\mathrm{emp}(n,r,s)$ .

As discussed in Section IV, the disjunctive security policy of user $u$ (written as $P_{u}$ ) is a set of conjunctions $\mathrm{con}$ , interpreted as a disjunction of conjunctions of table and view identifiers. For a policy $P_{u}$ that adheres to the constraints mentioned earlier, $\sigma_{\mathrm{st}}$ is defined as follows:

\displaystyle\sigma_{\mathrm{st}}(P_{u})=\{\mathrm{sts}(\mathrm{con})\mid\mathrm{con}\in P_{u}\}

Labels. In our model, a security label $\ell$ is defined as a set of symbolic tuples, and we define the ordering relation of two labels, written as $\ell_{1}\sqsubseteq_{\mathrm{st}}\ell_{2}$ , as follows:

Definition 9.

$\ell_{1}\sqsubseteq_{\mathrm{st}}\ell_{2}$ iff for all symbolic tuples $\langle T,\phi,\pi\rangle\in\ell_{1}$ , there are well-formed symbolic tuples $\langle T_{1},\phi_{1},\pi_{1}\rangle,...,\langle T_{n},\phi_{n},\pi_{n}\rangle$ in $\ell_{2}$ such that $T\subseteq(T_{1}\cup...\cup T_{n})$ , $T_{1},...,T_{n}$ are pairwise disjoint, $\phi\models(\phi_{1}\wedge...\wedge\phi_{n})$ , and $\mathrm{dep}(\phi)\cup\pi\subseteq(\pi_{1}\cup...\cup\pi_{n})$ .

To ensure soundness, we assume that all of the symbolic tuples on the right-hand side of $\sqsubseteq_{\mathrm{st}}$ are well-formed. This definition relies on entailment to check the ordering of $\phi$ : we write $\phi_{1}\models\phi_{2}$ to mean that any assignment that satisfies $\phi_{1}$ also satisfies $\phi_{2}$ .

Example 7.

Consider the labels $\ell_{1}=\{\langle\{\mathrm{emp}\},s=10,\{r\}\rangle\}$ and $\ell_{2}=\{\langle\{\mathrm{emp},\mathrm{mng}\},s>5,\{r,s,m\}\rangle\}$ . We have $\ell_{1}\sqsubseteq_{\mathrm{st}}\ell_{2}$ since $\{\mathrm{emp}\}\subseteq\{\mathrm{emp},\mathrm{mng}\}$ , $\{r\}\subseteq\{r,s,m\}$ , $s=10\models s>5$ and $\{s\}\cup\{r\}\subseteq\{r,s,m\}$ .
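The check of Def. 9 can be sketched as follows. This is a minimal illustration and not the DiVerT implementation: constraints are lists of atoms `(lhs, op, rhs)` with strings as columns and integers as constants, and entailment is decided by exhaustive search over a small integer domain, which is only a bounded stand-in for a proper SMT-based entailment check. All function names are ours.

```python
from itertools import combinations, product
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def dep(phi):
    """Columns (strings) occurring in the conjuncts (lhs, op, rhs) of phi."""
    return {t for (lhs, _op, rhs) in phi for t in (lhs, rhs) if isinstance(t, str)}

def holds(phi, env):
    """Evaluate the conjunction phi under the column assignment env."""
    def val(t):
        return env[t] if isinstance(t, str) else t
    return all(OPS[op](val(lhs), val(rhs)) for (lhs, op, rhs) in phi)

def entails(phi1, phi2, domain=range(0, 16)):
    """Bounded stand-in for phi1 |= phi2: search `domain` for a counterexample."""
    cols = sorted(dep(phi1) | dep(phi2))
    for values in product(domain, repeat=len(cols)):
        env = dict(zip(cols, values))
        if holds(phi1, env) and not holds(phi2, env):
            return False
    return True

def leq_st(l1, l2):
    """Ordering of Def. 9 on labels given as lists of (T, phi, pi) triples."""
    candidates = [st for st in l2 if dep(st[1]) <= st[2]]  # well-formed tuples only

    def covered(T, phi, pi):
        for n in range(1, len(candidates) + 1):
            for group in combinations(candidates, n):
                tables = [t for (T_i, _, _) in group for t in T_i]
                if len(tables) != len(set(tables)):
                    continue  # T_1, ..., T_n must be pairwise disjoint
                phis = [c for (_, phi_i, _) in group for c in phi_i]
                pis = set().union(*(pi_i for (_, _, pi_i) in group))
                if T <= set(tables) and dep(phi) | pi <= pis and entails(phi, phis):
                    return True
        return False

    return all(covered(T, phi, pi) for (T, phi, pi) in l1)

# Example 7
l1 = [({"emp"}, [("s", "=", 10)], {"r"})]
l2 = [({"emp", "mng"}, [("s", ">", 5)], {"r", "s", "m"})]
print(leq_st(l1, l2), leq_st(l2, l1))
```

The reverse direction fails in this sketch because the only tuple of $\ell_{1}$ constrains $s$ without projecting it, so it is not well-formed and cannot appear on the right-hand side.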

V-D Enforcement

The dependency analysis of Section V-B extracts the dependencies of program $\mathrm{prg}$ ’s outputs to user $u$ and produces $\mathrm{QL}_{u}$ . Applying $\sigma_{\mathrm{st}}$ to $\mathrm{QL}_{u}$ yields a set of labels, each bounding the information revealed in some path, the $u$ -knowledge of $\mathrm{prg}$ (denoted by $k(\mathrm{prg})_{u}$ ). We interpret this as a disjunction, as any execution follows along one particular path.

Similarly, applying $\sigma_{\mathrm{st}}$ to the disjunctive security policy of user $u$ (i.e., $P_{u}$ ) results in a set of labels. Each label faithfully captures one conjunction, and so the policy is also represented as a set of labels $ak(P_{u})$ , interpreted disjunctively.

By Lemma 2, to verify that the security condition is satisfied, it is sufficient to establish that $\mathrm{QL}_{u}\sqsubseteq P_{u}$ in the DQ. However, checking $\sqsubseteq$ in the DQ is not generally tractable. For the security check, we therefore instead perform a twofold approximation: we check ordering for the conjunctive inner sets using the approximate ordering $\sqsubseteq_{\mathrm{st}}$ , and approximate the mix-based ordering on the disjunctive outer sets in a way that loses little relative to our analysis:

Definition 10.

We say that $k(\mathrm{prg})_{u}\sqsubseteq_{*}ak(P_{u})$ iff

\displaystyle\forall\ell_{k}\in k(\mathrm{prg})_{u},\ \exists\ell_{ak}\in ak(P_{u}).\ \ \ell_{k}\sqsubseteq_{\mathrm{st}}\ell_{ak}

where $\ell_{ak}$ and $\ell_{k}$ are labels, and $\sqsubseteq_{\mathrm{st}}$ is the symbolic tuple ordering of Def. 9. To ensure faithful labeling of policies, we assume all of the symbolic tuples in $\ell_{ak}$ are well-formed as defined in Def. 8. We can then formalize the relationship between $\sqsubseteq_{*}$ and $\sqsubseteq$ as follows.

Lemma 4.

If $\sigma_{\mathrm{st}}(\{Q_{1},...,Q_{n}\})\sqsubseteq_{*}\sigma_{\mathrm{st}}(\{P_{1},...,P_{m}\})$ , then in the DQ, $(Q_{1}\vee...\vee Q_{n})\sqsubseteq(P_{1}\vee...\vee P_{m})$ .

We refer the reader to Appendix F-B for the proof of this lemma.
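The outer check of Def. 10 is a plain for-all/exists test over labels, parameterised by the inner ordering. The sketch below uses a toy inner ordering as a stand-in (an assumption for illustration only): a label is a set of view names, ordered by set inclusion.

```python
def leq_star(knowledge, policy, leq_st):
    """Def. 10: each path label must lie below some policy label w.r.t. leq_st."""
    return all(any(leq_st(l_k, l_ak) for l_ak in policy) for l_k in knowledge)

def subset(a, b):
    """Toy stand-in for the symbolic tuple ordering: inclusion of view-name sets."""
    return a <= b

policy = [{"v1", "v2"}, {"v3"}]           # (v1 and v2) or v3
per_path = [{"v1"}, {"v3"}]               # one path reads v1, another reads v3
same_path = [{"v1", "v3"}]                # a single path reads both v1 and v3

print(leq_star(per_path, policy, subset), leq_star(same_path, policy, subset))
```

Different paths may be justified by different disjuncts of the policy, whereas a single path that combines information from two disjuncts is rejected.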

V-E Soundness Proof

Fig. 11 outlines the overall architecture of our enforcement mechanism and the correctness assertion that we make of it.

Figure 11: Overall architecture of our proof

The rightmost column of Fig. 11 represents a chain of information order relations in the QoI, which we establish for each enforcement step. Following the chain from bottom to top, we obtain the security condition of Def. 7. At the same time, the “left boundary” of the figure, comprising the dependency analysis (D.A.), the $\sigma_{\mathrm{st}}$ abstractions, and the $\sqsubseteq_{*}$ check, represents the computations that are actually performed to check a program.

Theorem 1.

If a program $\mathrm{prg}$ satisfies Def. 10, then it is secure in the sense of Def. 7.

Proof.

The statement follows from establishing the implications in the diagram of Fig. 11. The top left cell is Lemma 4; the top right cell is Lemma 2; and the bottom cell (dependency analysis) is Appendix E. ∎

VI Implementation and Evaluation

In this section, we describe our prototype DiVerT [22], which implements the type-based dependency analysis of Section V-B and query abstraction of Section V-C to verify the security of database-backed programs. We then evaluate DiVerT’s effectiveness using functional tests and an assortment of real-world-inspired use cases.

VI-A Implementation

To evaluate the feasibility and security of our approach in practice, we implemented the type-based dependency analysis of Section V-B. For the sake of practicality, instead of CQC, DiVerT uses the SELECT-FROM-WHERE portion of SQL, which is analogous to CQC as described in Section V-A. Following the query analysis of Section V-C, these SQL queries are then converted into symbolic tuples. For the security check, the symbolic tuples resulting from the program analysis must be compared to those representing the policy; to perform this comparison following Def. 9, we use the Z3 SMT solver [23]. Our implementation operates on programs in the language presented in Section IV-A, with the addition of two macros @Table@ and @Policy@ for defining the tables’ schema and the security policy.

VI-B Test suite

To validate our implementation, we use a functional test suite consisting of 20 programs, designed to capture a broad variety of examples of disjunctive dependencies. This suite includes programs with row- and column-level policies of varying granularity levels, and those necessitating the use of SMT solvers for verification. Furthermore, the tests verify the behaviour of the dependency analysis by incorporating complex conditionals, loops, and implicit and explicit outputs. The tests can be found in the implementation repository [22].

VI-C Use cases

We evaluate DiVerT on four use cases inspired by real-world problems in which disjunctive policies naturally arise. The purpose of this evaluation is to validate the security analysis of DiVerT on realistic scenarios involving disjunctive policies, and ensure that its behaviour is consistent with the definitions of Section IV-B. Rather than analysing complete applications for each example, we therefore focus on smaller kernels that capture the core security-critical behaviour of the respective problem.

Privacy-preserving location service. Multilateration is a technique to determine the location of a user by measuring their distance to known reference points [24]. Two distances are sufficient to narrow a user’s location down to one of two points on a map, and three identify the location unambiguously. Consider a location service provider which tracks, for some number of users, not only their precise location but also their distances to certain points of interest (PoI) such as restaurants or shops. An advertiser wants to query this service to provide location-based ads. For example, if the user is close to a shop $A$ , and $A$ has a sale going on, the user may be enticed by this information.

Privacy and business considerations make it desirable to not reveal the precise location of the user to the advertisement company accessing the database, while still allowing for some location-based services in this vein. If the advertiser were to learn the distance of a single user to two or more PoIs at a specific time, the user’s location could be inferred. However, we may still want to release the user’s distance to any one PoI which they are currently closest to. This can be interpreted as a disjunctive policy, in which the information revealed for each user is bounded by the disjunction of that user’s distances to some single PoI.

The database schema consists of a single table Distance(id, poi, dis, loc), which stores the ID of each user, the name of the PoI, their distance, and the user’s precise location. We implement a small example with two PoIs $\{\text{`restaurant'},\text{`mall'}\}$ and two users $\{1,2\}$ . Let the view $v_{i,j}$ for each user $i$ and PoI $j$ be defined as the query SELECT id, poi FROM Distance WHERE id = $i$ AND poi = $j$ . The disjunctive policy then covers every combination of user and PoI as a possibility: $\{\{v_{1,\text{`restaurant'}},v_{2,\text{`restaurant'}}\},\{v_{1,\text{`restaurant'}},v_{2,\text{`mall'}}\},$ $\{v_{1,\text{`mall'}},v_{2,\text{`restaurant'}}\},\{v_{1,\text{`mall'}},v_{2,\text{`mall'}}\}\}$ .
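For larger sets of users and PoIs, writing out such a policy by hand quickly becomes tedious. The following sketch enumerates it mechanically: each conjunction picks exactly one PoI per user. The helper `view` and the representation of a conjunction as a set of SQL strings are assumptions made for illustration; they do not reflect the concrete policy syntax of DiVerT.

```python
from itertools import product

users = [1, 2]
pois = ["restaurant", "mall"]

def view(user, poi):
    """SQL text of the view v_{i,j} as defined in the text."""
    return f"SELECT id, poi FROM Distance WHERE id = {user} AND poi = '{poi}'"

# One conjunction per way of choosing a single PoI for every user.
policy = [
    frozenset(view(u, p) for u, p in zip(users, choice))
    for choice in product(pois, repeat=len(users))
]

for conjunction in policy:
    print(sorted(conjunction))
```

With $k$ PoIs and $m$ users this yields $k^{m}$ conjunctions, so the explicit representation grows exponentially in the number of users.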

We test two programs against this policy. In one, the advertiser uses internal parameters identifying a target user and interest, and issues a single query requesting that user’s distance from the relevant point of interest. In the other, the advertiser still targets a particular user, but queries all of that user’s distances. As expected, DiVerT accepts the former program, but rejects the latter.

Privacy-preserving data publishing. Expanding upon the motivating example in the introduction, we consider the case of programs querying a database with personally identifiable information (i.e., quasi-identifiers). As discussed before, revealing too many quasi-identifiers may make it possible to identify an individual. We consider the example of a medical database [5] with a table Patients(zip, gen, dis) storing the ZIP code of residence, gender and disease of patients. An agent querying the database should not learn more than two of these at a time. For simplicity’s sake, we only consider queries that retrieve the same data from each patient. Defining $v_{1}=$ SELECT dis, gen FROM Patients, $v_{2}=$ SELECT zip, gen FROM Patients, and $v_{3}=$ SELECT zip, dis FROM Patients, the disjunctive policy can then be written as $\{\{v_{1}\},\{v_{2}\},\{v_{3}\}\}$ .

Once again, we validate two programs against this policy. Branching on an internal parameter, the client will issue one query to select data for either male or female patients. In the first program, all queries take the form of SELECT dis FROM Patients WHERE gen = $\text{`}{\mathrm{F}}\text{'}$ , whereas in the second one, one of the queries additionally filters on the ZIP code: SELECT dis FROM Patients WHERE gen = $\text{`}{\mathrm{F}}\text{'}$ AND zip = $10001$ . Again, only the latter program is rejected by DiVerT. This reveals a potential subtlety, as data dependency and hence release of information may arise not only from what columns are selected, but also from conditions restricting the set of rows.
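The column-level part of this check can be illustrated with a short sketch. It is deliberately simplified and rests on assumptions specific to this example: there is a single table, the policy views have no row constraints, and so only the inclusion $\mathrm{dep}(\phi)\cup\pi\subseteq\pi_{i}$ from Def. 9 matters. The function names are ours.

```python
# Projections of the policy views v1, v2, v3 over Patients(zip, gen, dis).
policy_views = [{"dis", "gen"}, {"zip", "gen"}, {"zip", "dis"}]

def needed_columns(select, where):
    """Columns a query depends on: projected columns plus those in its WHERE clause."""
    return set(select) | set(where)

def allowed(select, where):
    """True if a single policy view covers every column the query depends on."""
    return any(needed_columns(select, where) <= pi for pi in policy_views)

# SELECT dis FROM Patients WHERE gen = 'F'
print(allowed(select=["dis"], where=["gen"]))
# SELECT dis FROM Patients WHERE gen = 'F' AND zip = 10001
print(allowed(select=["dis"], where=["gen", "zip"]))
```

The second query selects the same column as the first, yet depends on all three quasi-identifiers through its WHERE clause, which no single view permits.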

Secret sharing. We implement a $(t,n)$ secret sharing scheme that splits a secret value $s$ into $n$ shares $s_{1},s_{2},...,s_{n}$ . These shares are then distributed among $n$ parties $p_{1},p_{2},...,p_{n}$ , each receiving a unique share. A secure secret sharing scheme requires that the secret $s$ can only be reconstructed if $t$ or more participants combine their shares. If fewer than $t$ shares are combined, no information about the secret should be revealed. This requirement naturally translates to a disjunctive policy $s_{1}\vee s_{2}\vee...\vee s_{n}$ , stipulating that participants can each only learn one share.

We assume that the shares $s_{1},s_{2},...,s_{n}$ are created by a secure secret sharing scheme and are then stored in a database. The database schema consists of the table Shares(shareID, shareVal), which stores the ID of each share and its corresponding value.

The policy only allows a user to read one of the shares (i.e., only one row of the table). We define the view $v_{i}$ for each share as SELECT shareVal, shareID FROM Shares WHERE shareID = $i$ , where $i=1,...,n$ . The corresponding disjunctive policy is then $\{\{v_{1}\},\{v_{2}\},...,\{v_{n}\}\}$ .

We implement a program that executes a subroutine for each user, issuing a database query to retrieve the user’s share. For example, the query for a user to retrieve share number $5$ is SELECT shareVal FROM Shares WHERE shareID = $5$ , which DiVerT correctly accepts. If the same user issues another query to retrieve share number $6$ , the policy is violated and hence the program is rejected. This scenario shows that DiVerT can enforce row-level policies precisely.

Online shop. This use case models an online shop in which a user with a gift card can only use it to “buy” items that match the value of the gift card. We consider a shop that only provides digital items, which are stored in a database. The database schema consists of the table Items(id, name, data), which stores the ID, name, and data of each digital item. We define a view $v_{n}$ for each item as SELECT data, name FROM Items WHERE name = $n$ , where $n$ is the item’s name.

Assume a database that has the items Movie, CinemaTicket, Audiobook, Ebook, and GymMem. A policy should only allow the user to access a set of items whose total value matches the value of the gift card. For instance, a disjunctive policy may look like: $\{\{v_{\mathrm{Movie}},v_{\mathrm{CinemaTicket}}\},\{v_{\mathrm{Audiobook}},v_{\mathrm{Ebook}}\},\{v_{\mathrm{GymMem}}\},$ $\{v_{\mathrm{CinemaTicket}},v_{\mathrm{Ebook}}\}\}$ .

We model a user program that issues queries to select items, e.g., SELECT data FROM Items WHERE name = $\text{`}{\mathrm{Movie}}\text{'}$ .

DiVerT accepts this query because view $v_{\mathrm{Movie}}$ allows the user to access Movie. We create two different scenarios; in one the user issues another query asking for Audiobook, which DiVerT rejects. In the second scenario, the user asks for CinemaTicket which is allowed by the policy, and hence DiVerT accepts it.

VII Related Work

This section puts our contributions in the context of related works in the areas of information flow security and database security, discussing security models of dependencies and tractable enforcement mechanisms. To our knowledge, we are the first to explore enforcement mechanisms for disjunctive policies, as well as to reconcile semantic models of (disjunctive) dependencies across the areas of information flow control and database access control.

Security models. Semantic models of dependencies have a long history since the introduction of the Lattice of Information (LoI) by Landauer and Redmond [7]. These models define a lattice structure to represent information as equivalence relations ordered by refinement and serve as a cornerstone to justify the soundness of various dependency analyses at the heart of enforcement mechanisms for security. For example, the universal lattice by Hunt and Sands [25] models dependencies between program variables such that the lattice elements are sets of variables ordered by set containment, and uses it to justify soundness against baseline security conditions, e.g., noninterference [26].

Within the database community, Bender et al. [8, 4] define the notion of Disclosure Lattice to represent the information disclosed by sets of database queries. Disclosure Lattice has been further developed by Guarnieri et al. [14] to enforce conjunctive information-flow policies for database-backed programs. We point out that not all disclosure orders are suitable to represent information disclosure in the context of information flow control: By studying its relation to LoI, we show that query determinacy and the stronger notion of equivalent query rewriting [19] provide sound abstraction, while query containment does not.

Our work builds on recent work by Hunt and Sands [9], which provides a semantic model for disjunctive dependencies, under the notion of the Quantale of Information. We study quantale structures in the context of databases, providing support for disjunctive policies in database-backed programs. While these policies are rooted in the area of access control, cf. ethical wall policies [27], the work of Hunt and Sands [9] is the first to provide an extensional characterization as information-flow policies. Drawing on our new notion of Determinacy Quantale, we develop a security condition to capture the security of database-backed programs in presence of disjunctive database policies.

Enforcement mechanisms. The problem of enforcing disjunctive policies for programs and/or databases is completely unexplored. We study how a standard type-based program analysis [15], equipped with the notion of path sensitivity, can be adapted to statically capture disjunctive program dependencies.

At the core of our analysis is a new abstraction of database queries which enables flexible enforcement of disjunctive policies by means of SMT solvers, as witnessed by our use cases. An immediate benefit of our Determinacy Quantale is that we can prove soundness of the enforcement with respect to a solid semantic baseline for disjunctive dependencies.

There exists a wide array of works enforcing conjunctive policies for database-backed programs. Guarnieri et al. [14] propose dynamic monitoring to enforce database policies. Their abstractions are limited to boolean queries and rely on the Disclosure Lattice of Bender et al. [8, 4], which may cause soundness issues when assuming query containment as the underlying lattice order.

Language-integrated queries are supported by a range of works such as SIF [10], JsLinq [12], SeLinks [11], UrFlow [28], DAISY [14], Jacqueline [29], and LWeb [13] for row- and column-level conjunctive policies. These works apply PL-based enforcement techniques such as type systems, dependent types, refinement types, and symbolic execution to database-backed programs [30, 31, 13, 14], but lack support for expressing and enforcing disjunctive policies.

Li and Zhang [32] explore path-sensitive program analysis to improve precision of information flow analysis, yet they do not consider disjunctive policies. QAPLA [33] is a database access control middleware supporting complex security policies, such as linking and aggregation policies, with focus only on access control.

VIII Conclusions

We presented a case for the significance of disjunctive dependency analysis to the security of database-backed programs. After reviewing recent theoretical developments in representing disjunctive information, we introduced two structures, the Determinacy Lattice and the Determinacy Quantale, as database-oriented counterparts to theoretical structures representing simple and disjunctive knowledge respectively.

Using these structures, we formulated a security condition which expresses that a database-backed program satisfies a given disjunctive policy. In order to enforce this security condition, we developed a type-based static analysis to compute a bound on the disjunctive dependencies of database-backed programs in a model language. By a series of approximations, this bound itself can be tractably compared to the representation of a static policy.

These steps constitute an enforcement mechanism for disjunctive policies, which we proved sound with respect to our security condition. To showcase this enforcement mechanism, we implemented it in our prototype tool, DiVerT. In order to validate this prototype and the overall framework, we verified the tool on a set of functional tests covering a variety of language features and disjunctive information patterns, as well as several use cases representing real-world scenarios in which we want to enforce disjunctive policies.

IX Acknowledgements

We are grateful to David Sands and Roberto Guanciale for fruitful discussions, and would also like to thank the anonymous reviewers for their insightful comments and feedback.

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, the Swedish Research Council (VR), and the Swedish Foundation for Strategic Research (SSF).

References

[1] E. Bertino and R. Sandhu, “Database security-concepts, approaches, and challenges,” IEEE Transactions on Dependable and Secure Computing, vol. 2, no. 1, pp. 2–19, 2005.
[2] A. Sabelfeld and A. C. Myers, “Language-based information-flow security,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 1, pp. 5–19, 2003.
[3] M. Guarnieri, S. Marinovic, and D. Basin, “Strong and provably secure database access control,” in IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2016, pp. 163–178.
[4] G. Bender, L. Kot, and J. Gehrke, “Explainable security for relational databases,” in ACM SIGMOD International Conference on Management of data. ACM, 2014, pp. 1411–1422.
[5] L. Sweeney, “k-anonymity: A model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-based Systems, vol. 10, no. 05, pp. 557–570, 2002.
[6] C. Dwork, “Differential privacy,” in Automata, Languages and Programming: 33rd International Colloquium. Springer, 2006, pp. 1–12.
[7] J. Landauer and T. Redmond, “A lattice of information,” in Proceedings Computer Security Foundations Workshop. IEEE, 1993, pp. 65–70.
[8] G. M. Bender, L. Kot, J. Gehrke, and C. Koch, “Fine-grained disclosure control for app ecosystems,” in ACM SIGMOD International Conference on Management of Data. ACM, 2013, pp. 869–880.
[9] S. Hunt and D. Sands, “A quantale of information,” in IEEE 34th Computer Security Foundations Symposium (CSF). IEEE, 2021, pp. 1–15.
[10] S. Chong, K. Vikram, A. C. Myers et al., “Sif: Enforcing confidentiality and integrity in web applications,” in 16th USENIX Security Symposium (USENIX Security). USENIX Association, 2007, pp. 1–16.
[11] B. J. Corcoran, N. Swamy, and M. Hicks, “Cross-tier, label-based security enforcement for web applications,” in ACM SIGMOD International Conference on Management of data. ACM, 2009, pp. 269–282.
[12] M. Balliu, B. Liebe, D. Schoepe, and A. Sabelfeld, “Jslinq: Building secure applications across tiers,” in 6th ACM Conference on Data and Application Security and Privacy. ACM, 2016, pp. 307–318.
[13] J. Parker, N. Vazou, and M. Hicks, “Lweb: Information flow security for multi-tier web applications,” Proceedings of the ACM on Programming Languages, vol. 3, no. POPL, pp. 1–30, 2019.
[14] M. Guarnieri, M. Balliu, D. Schoepe, D. Basin, and A. Sabelfeld, “Information-flow control for database-backed applications,” in IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2019, pp. 79–94.
[15] B. v. Delft, S. Hunt, and D. Sands, “Very static enforcement of dynamic policies,” in International Conference on Principles of Security and Trust. Springer, 2015, pp. 32–52.
[16] I. Kaplansky, Set Theory and Metric Spaces. AMS Chelsea Publishing, 2001.
[17] B. A. Davey and H. A. Priestley, Introduction to lattices and order. Cambridge University Press, 2002.
[18] S. Abiteboul, R. Hull, and V. Vianu, Foundations of databases. Addison-Wesley, 1995.
[19] A. Nash, L. Segoufin, and V. Vianu, “Views and queries: Determinacy and rewriting,” ACM Transactions on Database Systems (TODS), vol. 35, no. 3, pp. 1–41, 2010.
[20] Q. Wang and K. Yi, “Conjunctive queries with comparisons,” in International Conference on Management of Data. ACM, 2022, pp. 108–121.
[21] A. Askarov, S. Hunt, A. Sabelfeld, and D. Sands, “Termination-insensitive noninterference leaks more than just a bit,” in 13th European Symposium on Research in Computer Security. Springer, 2008, pp. 333–348.
[22] A. M. Ahmadian, M. Soloviev, and M. Balliu, “Divert,” 2023, software release. [Online]. Available: https://github.com/KTH-LangSec/DiVerT
[23] L. De Moura and N. Bjørner, “Z3: An efficient smt solver,” in International conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). Springer, 2008, pp. 337–340.
[24] W. Murphy and W. Hereman, “Determination of a position in three dimensions using trilateration and approximate distances,” Department of Mathematical and Computer Sciences, Colorado School of Mines, Golden, Colorado, MCS-95, vol. 7, p. 19, 1995.
[25] S. Hunt and D. Sands, “On flow-sensitive security types,” in 33rd SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM, 2006, pp. 79–90.
[26] J. A. Goguen and J. Meseguer, “Security policies and security models,” in IEEE Symposium on Security and Privacy (S&P). IEEE, 1982, pp. 11–20.
[27] D. F. Brewer and M. J. Nash, “The chinese wall security policy,” in IEEE Symposium on Security and Privacy (S&P). IEEE, 1989, pp. 206–214.
[28] A. Chlipala, “Static checking of dynamically-varying security policies in database-backed applications,” in 9th USENIX Symposium on Operating Systems Design and Implementation. USENIX Association, 2010, pp. 105–118.
[29] J. Yang, T. Hance, T. H. Austin, A. Solar-Lezama, C. Flanagan, and S. Chong, “Precise, dynamic information flow for database-backed applications,” ACM SIGPLAN Notices, vol. 51, no. 6, pp. 631–647, 2016.
[30] N. Swamy, B. J. Corcoran, and M. Hicks, “Fable: A language for enforcing user-defined security policies,” in IEEE Symposium on Security and Privacy (S&P). IEEE, 2008, pp. 369–383.
[31] L. Lourenço and L. Caires, “Dependent information flow types,” in 42nd SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM, 2015, pp. 317–328.
[32] P. Li and D. Zhang, “Towards a flow- and path-sensitive information flow analysis,” in 30th IEEE Computer Security Foundations Symposium (CSF). IEEE, 2017, pp. 53–67.
[33] A. Mehta, E. Elnikety, K. Harvey, D. Garg, and P. Druschel, “Qapla: Policy compliance for database-backed systems,” in 26th USENIX Security Symposium (USENIX Security). USENIX Association, 2017, pp. 1463–1479.

Appendix A Interpretations of Query Determinacy

We prove the following technical lemma to show that the two intuitive interpretations of the definition of query determinacy are equivalent.

Lemma 5.

If $A$ is recursively enumerable and $f:A\rightarrow B$ and $g:A\rightarrow C$ are computable, then the following are equivalent:

(i)

For all $a,a^{\prime}\in A$ , if $f(a)=f(a^{\prime})$ , then $g(a)=g(a^{\prime})$ .
(ii)

There exists a computable $h:B\rightarrow C$ such that for all $a\in A$ , $g(a)=h(f(a))$ .

Proof.

(ii) $\Rightarrow$ (i): Suppose $b=f(a)=f(a^{\prime})$ , and $h$ is as in (ii). Then $g(a)=h(f(a))=h(b)$ , and $g(a^{\prime})=h(f(a^{\prime}))=h(b)$ .

(i) $\Rightarrow$ (ii): Let $\hat{f}:B\rightharpoonup A$ be the partial function that enumerates $A$ and for a given $b\in B$ returns the first $a\in A$ it finds such that $f(a)=b$ . This is computable, per the algorithmic description provided. This does not necessarily satisfy $\hat{f}(f(a))=a$ , but we do have $f(\hat{f}(f(a)))=f(a)$ by definition (since the enumeration of $A$ will either encounter $a$ or another $a^{\prime}$ such that $f(a^{\prime})=f(a)$ eventually). Hence $g(\hat{f}(f(a)))=g(a)$ by (i). So defining $h$ by $h(b)=g(\hat{f}(b))$ , we find that $h(f(a))=g(a)$ as required. ∎
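The construction of $h$ in the direction (i) $\Rightarrow$ (ii) is directly executable. The sketch below mirrors it; the toy instance ( $A$ as the natural numbers, $f(a)=a\bmod 6$ , $g(a)=a\bmod 3$ ) is an assumption chosen only because it satisfies (i), and the name `make_h` is ours.

```python
from itertools import count

def make_h(enumerate_A, f, g):
    """Build h with h(f(a)) == g(a), assuming f(a) == f(a') implies g(a) == g(a')."""
    def f_hat(b):
        # Return the first enumerated a with f(a) == b.
        # Partial: diverges if A is infinite and b lies outside the image of f.
        for a in enumerate_A():
            if f(a) == b:
                return a
    return lambda b: g(f_hat(b))

# Toy instance: equal remainders mod 6 imply equal remainders mod 3, so (i) holds.
h = make_h(lambda: count(0), lambda a: a % 6, lambda a: a % 3)
print([h(b) for b in range(6)])
```

Note that `f_hat(f(a))` need not return `a` itself, only some element with the same image under `f`, which is all the proof requires.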

Instantiating Lemma 5 with $A$ as the set of possible databases, $f$ as the function $r_{Q}(db)=\{{\llbracket{q}\rrbracket^{db}}\mid q\in Q\}$ that computes the results of the queries in $Q$ on $db$ , and $g$ as the same for $Q^{\prime}$ , we find that $Q$ determining $Q^{\prime}$ indeed means that the (results of) queries in $Q$ are always sufficient to determine (compute) the result of the queries in $Q^{\prime}$ .

Appendix B Relation Between DL and LoI

We first prove some auxiliary lemmas, and then proceed to prove Lemma 1.

Lemma 6.

For sets of queries $Q_{1},Q_{2}\in DL(\mathcal{Q})$ , the ordering ${\downarrow}Q_{1}\sqsubseteq{\downarrow}Q_{2}$ on the DL implies ${Q_{1}}_{\sim}\sqsubseteq{Q_{2}}_{\sim}$ on the LoI defined on $\{{Q}_{\sim}\mid Q\in DL(\mathcal{Q})\}$ :

\displaystyle{\downarrow}Q_{1}\sqsubseteq{\downarrow}Q_{2}\rightarrow{Q_{1}}_{\sim}\sqsubseteq{Q_{2}}_{\sim}

Proof.

The definition of the ordering relation of the LoI (Section II) and ${Q_{1}}_{\sim}\sqsubseteq{Q_{2}}_{\sim}$ would give us:

	$\displaystyle{Q_{1}}_{\sim}\sqsubseteq{Q_{2}}_{\sim}\rightarrow\forall db,db^{\prime}\in\Omega_{D}\ \ (db\ {Q_{1}}_{\sim}\ db^{\prime}\Rightarrow db\ {Q_{2}}_{\sim}\ db^{\prime})$		(1)

By the definition of equivalence relations for query sets ( ${Q}_{\sim}$ ), for all $db,db^{\prime}\in\Omega_{D}$ we have:

	$\displaystyle(db\ {Q_{1}}_{\sim}\ db^{\prime}\Rightarrow db\ {Q_{2}}_{\sim}\ db^{\prime})\rightarrow$
	$\displaystyle\Big{(}({\llbracket{q_{2}}\rrbracket^{db}}={\llbracket{q_{2}}\rrbracket^{db^{\prime}}}\forall q_{2}\in Q_{2})\Rightarrow({\llbracket{q_{1}}\rrbracket^{db}}={\llbracket{q_{1}}\rrbracket^{db^{\prime}}}\forall q_{1}\in Q_{1})\Big{)}$		(2)

(1) and (2) would give us:

	$\displaystyle{Q_{1}}_{\sim}\sqsubseteq{Q_{2}}_{\sim}\rightarrow\forall db,db^{\prime}\in\Omega_{D}$
	$\displaystyle\Big{(}({\llbracket{q_{2}}\rrbracket^{db}}={\llbracket{q_{2}}\rrbracket^{db^{\prime}}}\forall q_{2}\in Q_{2})\Rightarrow({\llbracket{q_{1}}\rrbracket^{db}}={\llbracket{q_{1}}\rrbracket^{db^{\prime}}}\forall q_{1}\in Q_{1})\Big{)}$		(3)

On the other hand, by the definition of the Determinacy Lattice 2, we have ${\downarrow}Q_{1}\sqsubseteq{\downarrow}Q_{2}\leftrightarrow Q_{1}\preceq Q_{2}$ . From the definition of determinacy ordering, $Q_{1}\preceq Q_{2}$ means $Q_{2}\twoheadrightarrow Q_{1}$ . By the definition of query determinacy (Def. 1) we know that $Q_{2}\twoheadrightarrow Q_{1}$ if:

	$\displaystyle\forall db,db^{\prime}\in\Omega_{D}$
	$\displaystyle\Big{(}({\llbracket{q_{2}}\rrbracket^{db}}={\llbracket{q_{2}}\rrbracket^{db^{\prime}}}\forall q_{2}\in Q_{2})\Rightarrow({\llbracket{q_{1}}\rrbracket^{db}}={\llbracket{q_{1}}\rrbracket^{db^{\prime}}}\forall q_{1}\in Q_{1})\Big{)}$		(4)

It is evident from (3) and (4) that ${\downarrow}Q_{1}\sqsubseteq{\downarrow}Q_{2}\rightarrow{Q_{1}}_{\sim}\sqsubseteq{Q_{2}}_{\sim}$ holds. ∎

Relying on Def. 6 to establish the set of equivalence relations derived from a set of sets of queries, we state the following lemma:

Lemma 7.

For any set of sets of queries $\mathbb{Q}\subseteq DL(\mathcal{Q})$ , the join of $\mathbb{Q}$ on the DL implies the join of $\llbracket{\mathbb{Q}}\rrbracket$ on the LoI defined on $\{{Q}_{\sim}\mid Q\in DL(\mathcal{Q})\}$ :

\displaystyle\bigsqcup\mathbb{Q}\rightarrow\bigsqcup\llbracket{\mathbb{Q}}\rrbracket

Proof.

Assume there is a set of queries $R\in DL(\mathcal{Q})$ such that $R=\bigsqcup\mathbb{Q}$ .

By the definition of the Determinacy Lattice (Section III-B), we have $\bigsqcup\mathbb{Q}={\downarrow}\bigcup\mathbb{Q}$ which would give us $R={\downarrow}\bigcup\mathbb{Q}$ . By the definitions of ${\downarrow}$ and query determinacy (Def. 1), it is straightforward to see $(\bigcup\mathbb{Q})\twoheadrightarrow{\downarrow}\bigcup\mathbb{Q}$ and ${\downarrow}\bigcup\mathbb{Q}\twoheadrightarrow(\bigcup\mathbb{Q})$ . Replacing ${\downarrow}\bigcup\mathbb{Q}$ with $R$ , by the definition of query determinacy (Def. 1) we have $R\twoheadrightarrow(\bigcup\mathbb{Q})$ :

	$\displaystyle\forall db,db^{\prime}\in\Omega_{D}$
	$\displaystyle\Big{(}\forall r\in R.\ {\llbracket{r}\rrbracket^{db}}={% \llbracket{r}\rrbracket^{db^{\prime}}}\rightarrow\forall p\in\bigcup\mathbb{Q}% .\ {\llbracket{p}\rrbracket^{db}}={\llbracket{p}\rrbracket^{db^{\prime}}}\Big{)}$		(1)

and $(\bigcup\mathbb{Q})\twoheadrightarrow R$ :

	$\displaystyle\forall db,db^{\prime}\in\Omega_{D}$
	$\displaystyle\Big{(}\forall p\in\bigcup\mathbb{Q}.\ {\llbracket{p}\rrbracket^{% db}}={\llbracket{p}\rrbracket^{db^{\prime}}}\rightarrow\forall r\in R.\ {% \llbracket{r}\rrbracket^{db}}={\llbracket{r}\rrbracket^{db^{\prime}}}\Big{)}$		(2)

(1) and (2) would give us:

	$\displaystyle\forall db,db^{\prime}\in\Omega_{D}$
	$\displaystyle(\forall r\in R.\ {\llbracket{r}\rrbracket^{db}}={\llbracket{r}% \rrbracket^{db^{\prime}}}\leftrightarrow\forall p\in\bigcup\mathbb{Q}.\ {% \llbracket{p}\rrbracket^{db}}={\llbracket{p}\rrbracket^{db^{\prime}}})$		(3)

Assume $\bigsqcup\llbracket{\mathbb{Q}}\rrbracket$ is an equivalence relation $R^{\prime}$ . By the definition of the join of the LoI (Section II):

$\displaystyle\bigsqcup\llbracket{\mathbb{Q}}\rrbracket=\forall db,db^{\prime}\in\Omega_{D}\ (db\ {R^{\prime}}_{\sim}\ db^{\prime}\leftrightarrow\forall Q\in\mathbb{Q}.\ db\ {Q}_{\sim}\ db^{\prime})$

and by the definition of equivalence relations for query sets, for all $db,db^{\prime}\in\Omega_{D}$ we have:

	$\displaystyle\bigsqcup\llbracket{\mathbb{Q}}\rrbracket=$
	$\displaystyle(\forall r\in R^{\prime}.\ {\llbracket{r}\rrbracket^{db}}={% \llbracket{r}\rrbracket^{db^{\prime}}}\leftrightarrow\forall Q\in\mathbb{Q}.\ % \forall q\in Q.\ {\llbracket{q}\rrbracket^{db}}={\llbracket{q}\rrbracket^{db^{% \prime}}})=$
	$\displaystyle(\forall r\in R^{\prime}.\ {\llbracket{r}\rrbracket^{db}}={% \llbracket{r}\rrbracket^{db^{\prime}}}\leftrightarrow\forall p\in\bigcup% \mathbb{Q}.\ {\llbracket{p}\rrbracket^{db}}={\llbracket{p}\rrbracket^{db^{% \prime}}})$		(4)

(3) and (4) would allow us to conclude $R=R^{\prime}$ , hence $\bigsqcup\mathbb{Q}\rightarrow\bigsqcup\llbracket{\mathbb{Q}}\rrbracket$ . ∎

See Lemma 1.

Proof.

To prove this homomorphism, we need to show that the Determinacy Lattice’s ordering and join, as well as the top and bottom elements imply their LoI counterparts. Lemmas 6 and 7 provide the proofs of ordering and join. The proof for top and bottom elements:

•

${\downarrow}\mathcal{Q}\rightarrow{({\downarrow}\mathcal{Q})}_{\sim}$
•

${\downarrow}\varnothing\rightarrow{({\downarrow}\varnothing)}_{\sim}$

follows trivially from the definitions of ${\downarrow}$ and ${\cdot}_{\sim}$ . ∎

Appendix C Determinacy Quantale Axioms

We follow the approach of [9] to prove that our definition of the Determinacy Quantale is indeed a quantale. We begin by defining what a quantale is.

Definition 11.

A quantale is a structure $\langle\mathcal{L},\sqsubseteq,\vee,\otimes,1\rangle$ such that:

1.

$\langle\mathcal{L},\sqsubseteq,\vee\rangle$ is a complete join-semilattice
2.

$\langle\mathcal{L},\otimes,1\rangle$ is a monoid, that is, $\otimes$ is associative and $\forall x\in\mathcal{L},x\otimes 1=x=1\otimes x$
3.

$\otimes$ distributes over $\vee$ .

A quantale is called commutative when its $\otimes$ operator is commutative [9].
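As a sanity check of Def. 11, the sketch below verifies the quantale axioms by exhaustive enumeration on a small standard example that is unrelated to the Determinacy Quantale: the power set of the additive monoid $\mathbb{Z}_{3}$ , with union as join and element-wise addition as $\otimes$ . All names (`tensor`, `join`, `UNIT`, `BOTTOM`) are hypothetical and chosen for illustration only.

```python
from itertools import chain, combinations

N = 3  # we work in the additive monoid Z_3 = {0, 1, 2}

def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

L = powerset(range(N))   # carrier: the 8 subsets of Z_3, ordered by inclusion
UNIT = frozenset({0})    # unit of the tensor in this example
BOTTOM = frozenset()     # the empty join

def join(x, y):
    """Join of the power-set lattice: set union."""
    return x | y

def tensor(x, y):
    """Element-wise addition modulo N, lifted to sets."""
    return frozenset((a + b) % N for a in x for b in y)

# Axiom 2: the tensor is associative and UNIT is a two-sided unit.
associative = all(tensor(x, tensor(y, z)) == tensor(tensor(x, y), z)
                  for x in L for y in L for z in L)
has_unit = all(tensor(x, UNIT) == x == tensor(UNIT, x) for x in L)

# Axiom 3: the tensor distributes over binary joins and the empty join.
distributes = all(tensor(x, join(y, z)) == join(tensor(x, y), tensor(x, z))
                  for x in L for y in L for z in L)
preserves_bottom = all(tensor(x, BOTTOM) == BOTTOM for x in L)

# Commutativity of the tensor.
commutative = all(tensor(x, y) == tensor(y, x) for x in L for y in L)

print(associative, has_unit, distributes, preserves_bottom, commutative)
```

Because the carrier is finite, checking binary joins and the empty join suffices for the distributivity requirement in this example.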

Next, we prove some lemmas that are later used in the proof of Theorem 2.

Lemma 8.

Both $\mathrm{mix}$ and $\mathrm{tc}$ are closure operators.

Proof.

A closure operator is a function $f:\mathcal{P}{(A)}\rightarrow\mathcal{P}{(A)}$ from the power set of domain $A$ to itself that satisfies the following properties for all sets $X,Y\subseteq A$ :

•

$f$ is extensive: $X\subseteq f(X)$
•

$f$ is increasing: $X\subseteq Y\Rightarrow f(X)\subseteq f(Y)$
•

$f$ is idempotent: $f(f(X))=f(X)$

It is straightforward to show that both $\mathrm{mix}$ and $\mathrm{tc}$ satisfy these conditions. ∎
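The three conditions can be checked mechanically on a finite domain. The sketch below does so for a stand-in operator, the downward closure under divisibility on $\{1,\ldots,6\}$ ; it does not implement $\mathrm{mix}$ or $\mathrm{tc}$ , whose definitions are given in the main text, and the names `is_closure_operator`, `down`, and `drop_max` are assumptions made for this example.

```python
from itertools import chain, combinations

def powerset(universe):
    """All subsets of universe, as frozensets."""
    items = list(universe)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

def is_closure_operator(f, universe):
    """Check extensivity, monotonicity, and idempotence of f by enumeration."""
    subsets = powerset(universe)
    extensive = all(x <= f(x) for x in subsets)
    increasing = all(f(x) <= f(y)
                     for x in subsets for y in subsets if x <= y)
    idempotent = all(f(f(x)) == f(x) for x in subsets)
    return extensive and increasing and idempotent

UNIVERSE = frozenset(range(1, 7))

def down(xs):
    """Stand-in closure: add every divisor (within UNIVERSE) of a member."""
    return frozenset(d for d in UNIVERSE for x in xs if x % d == 0)

def drop_max(xs):
    """Counterexample: removing the largest element is not extensive."""
    return frozenset(xs) - {max(xs)} if xs else frozenset()

print(is_closure_operator(down, UNIVERSE))
print(is_closure_operator(drop_max, UNIVERSE))
```

The first call reports that `down` satisfies all three properties, while `drop_max` fails already on extensivity.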

Definition 12.

For a closure operator ${\downarrow}$ defined on the domain $A$ , and a function $F:A\rightarrow A$ , we say that $F$ weakly commutes with ${\downarrow}$ if $F({\downarrow}(X))\subseteq{\downarrow}(F(X))$ for all $X\subseteq A$ .

Lemma 9.

Let ${\downarrow}:A\rightarrow A$ be a closure operator and let $X,Y\subseteq A$ . Suppose that $F:A\rightarrow A$ weakly commutes with ${\downarrow}$ and that $G:A\times A\rightarrow A$ weakly commutes with ${\downarrow}$ in each argument. Then:

1.

${\downarrow}(F({\downarrow}(X)))={\downarrow}(F(X))$
2.

${\downarrow}(G({\downarrow}(X)\times{\downarrow}(Y)))={\downarrow}(G(X\times Y))$

Proof.

Routine, following from the properties of closure operators. ∎

Lemma 10.

For $\mathbb{P},\mathbb{Q}\subseteq DL(\mathcal{Q})$ , the union operator $\cup$ weakly commutes with $\mathrm{tc}$ :

$\displaystyle\mathrm{tc}(\mathrm{tc}(\mathbb{P})\cup\mathrm{tc}(\mathbb{Q}))=\mathrm{tc}(\mathbb{P}\cup\mathbb{Q})$

Proof.

It suffices to show $\mathbb{R}\in\mathrm{tc}(\mathrm{tc}(\mathbb{P})\cup\mathrm{tc}(\mathbb{Q}))$ iff $\mathbb{R}\in\mathrm{tc}(\mathbb{P}\cup\mathbb{Q})$ , which follows easily from the definitions of $\cup$ and $\mathrm{tc}$ . ∎

Lemma 11.

The join operator of DL weakly commutes with $\mathrm{tc}$ in each argument.

Proof.

Let $P,Q\in DL(\mathcal{Q})$ , and let $\mathbb{S}\subseteq DL(\mathcal{Q})$ . If ${Q}_{\sim}$ is tiled by $\llbracket{\mathbb{S}}\rrbracket$ then ${P}_{\sim}\sqcup{Q}_{\sim}$ is tiled by $\{{P}_{\sim}\sqcup R\mid R\in\llbracket{\mathbb{S}}\rrbracket\}$ . This follows easily from the definition of the equivalence relation induced by a query (i.e., ${\cdot}_{\sim}$ ), the definition of $\mathrm{mix}$ , Lemma 7, and the fact that $[{P}_{\sim}\sqcup{Q}_{\sim}]=\{A\cap B\mid A\in[{P}_{\sim}],B\in[{Q}_{\sim}]\}\setminus\{\varnothing\}$ . ∎
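The last fact used in this proof, that the classes of a join of two equivalence relations are the non-empty pairwise intersections of their classes, can be illustrated with a few lines of Python. This is a minimal sketch on a hypothetical four-element set of databases; `partition_join` and the example partitions are assumptions made for illustration.

```python
def partition_join(p_classes, q_classes):
    """Classes of the LoI join: all non-empty intersections A & B."""
    return {a & b for a in p_classes for b in q_classes if a & b}

# Hypothetical databases 1..4, partitioned in two different ways.
P = {frozenset({1, 2}), frozenset({3, 4})}   # e.g. observing one query
Q = {frozenset({1, 3}), frozenset({2, 4})}   # e.g. observing another query

joined = partition_join(P, Q)
print(sorted(sorted(c) for c in joined))
```

In this example, observing both partitions together distinguishes every database, so each class of the join is a singleton.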

Lemma 12.

Given two sets of sets of queries $\mathbb{Q},\mathbb{P}\subseteq DL(\mathcal{Q})$ it holds that:

$\displaystyle\mathrm{tc}(\mathbb{Q})\otimes\mathrm{tc}(\mathbb{P})=\mathbb{Q}\otimes\mathbb{P}$

Proof.

By Lemma 11 we know that the join operator of DL weakly commutes with $\mathrm{tc}$ in each argument. We apply this lemma to the definition of $\otimes$ operator:

	$\displaystyle\mathrm{tc}(\mathbb{Q})\otimes\mathrm{tc}(\mathbb{P})=$
	$\displaystyle\mathrm{tc}(\bigcup_{Q\in\mathrm{tc}(\mathbb{Q}),P\in\mathrm{tc}(% \mathbb{P})}(Q\sqcup P))=$
	$\displaystyle\mathrm{tc}(\bigcup_{Q\in\mathbb{Q},P\in\mathbb{P}}(Q\sqcup P))=$
	$\displaystyle\mathbb{Q}\otimes\mathbb{P}$

∎

Lemma 13.

Given two sets of sets of queries $\mathbb{Q},\mathbb{P}\subseteq DL(\mathcal{Q})$ it holds that:

$\displaystyle\mathbb{Q}\vee\mathbb{P}=\mathbb{P}\vee\mathbb{Q}$

Proof.

Follows directly from the definition of $\vee$ in the DL and the commutativity of the union operator $\cup$ :

	$\displaystyle\mathbb{Q}\vee\mathbb{P}=$
	$\displaystyle{\downarrow}(\mathbb{Q}\cup\mathbb{P})=$
	$\displaystyle{\downarrow}(\mathbb{P}\cup\mathbb{Q})=$
	$\displaystyle\mathbb{P}\vee\mathbb{Q}$

∎

Now, we show that DQ in Def. 5 is a quantale.

Theorem 2.

The Determinacy Quantale is a commutative quantale.

Proof.

We have to show that our definition of Determinacy Quantale respects the quantale axioms of Def. 11.

1.

Showing that $\langle\mathcal{I},\sqsubseteq,\vee\rangle$ is a complete join-semilattice is straightforward: it follows from the fact that $\mathrm{tc}$ is a closure operator (Lemma 8).

2.

We need to show that $\otimes$ is associative and that $1$ is a unit:

For the associativity of $\otimes$ we need to show that $\mathbb{P}\otimes(\mathbb{Q}\otimes\mathbb{R})=(\mathbb{P}\otimes\mathbb{Q})\otimes\mathbb{R}$ . Here we rely on Lemmas 11 and 12 to eliminate the nested uses of $\mathrm{tc}$ , and on the basic properties of the $\cup$ operator, to show that both sides of this equation reduce to identical expressions.
Left side:

	$\displaystyle\mathbb{P}\otimes(\mathbb{Q}\otimes\mathbb{R})=$
	$\displaystyle\mathbb{P}\otimes\mathrm{tc}(\bigcup_{Q\in\mathbb{Q},R\in\mathbb{% R}}(Q\sqcup R))=$
	$\displaystyle\mathbb{P}\otimes(\bigcup_{Q\in\mathbb{Q},R\in\mathbb{R}}(Q\sqcup R% ))=$
	$\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},T\in(\bigcup_{Q\in\mathbb{Q},% R\in\mathbb{R}}(Q\sqcup R))}(P\sqcup T))=$
	$\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q},R\in\mathbb{R}}(P\sqcup Q\sqcup R))$		(1)

Right side:

	$\displaystyle(\mathbb{P}\otimes\mathbb{Q})\otimes\mathbb{R}=$
	$\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}(P\sqcup Q))% \otimes\mathbb{R}=$
	$\displaystyle(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}(P\sqcup Q))\otimes% \mathbb{R}=$
	$\displaystyle\mathrm{tc}(\bigcup_{T\in(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}% (P\sqcup Q)),R\in\mathbb{R}}(T\sqcup R))=$
	$\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q},R\in\mathbb{R}}(P\sqcup Q\sqcup R))$		(2)

By (1) and (2) we can conclude that $\mathbb{P}\otimes(\mathbb{Q}\otimes\mathbb{R})=(\mathbb{P}\otimes\mathbb{Q})% \otimes\mathbb{R}$ .

To show that $1=\varnothing$ is a unit for $\otimes$ we need to show that $\forall x\in\mathcal{I},x\otimes 1=x=1\otimes x$ . Using $\varnothing$ as the unit, and applying the definition of $\otimes$ will give us:

$\displaystyle\mathbb{Q}\otimes\varnothing=\mathrm{tc}(\bigcup_{Q\in\mathbb{Q}}(Q))=\mathrm{tc}(\mathbb{Q})=\mathbb{Q}$

which, by the commutativity of $\otimes$ (shown at the end of this proof), gives us $\forall x\in\mathcal{I},x\otimes\varnothing=x=\varnothing\otimes x$ .

3.

To establish distributivity we need to show that $\mathbb{P}\otimes(\mathbb{Q}\vee\mathbb{R})=(\mathbb{P}\otimes\mathbb{Q})\vee(\mathbb{P}\otimes\mathbb{R})$ . We again rely on Lemmas 11 and 12 and the basic properties of $\cup$ to show:

	$\displaystyle\mathbb{P}\otimes(\mathbb{Q}\vee\mathbb{R})=$
	$\displaystyle\mathbb{P}\otimes\mathrm{tc}(\mathbb{Q}\cup\mathbb{R})=$
	$\displaystyle\mathbb{P}\otimes(\mathbb{Q}\cup\mathbb{R})=$
	$\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},T\in(\mathbb{Q}\cup\mathbb{R}% )}(P\sqcup T))=$
	$\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}(P\sqcup Q)% \cup\bigcup_{P\in\mathbb{P},R\in\mathbb{R}}(P\sqcup R))=$
	$\displaystyle\mathrm{tc}((\mathbb{P}\otimes\mathbb{Q})\cup(\mathbb{P}\otimes% \mathbb{R}))=$
	$\displaystyle(\mathbb{P}\otimes\mathbb{Q})\vee(\mathbb{P}\otimes\mathbb{R})$

Commutativity of $\otimes$ is inherited directly from Lemma 13 and the commutativity of $\sqcup$ in DL.

	$\displaystyle\mathbb{P}\otimes\mathbb{Q}=$
	$\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}(P\sqcup Q))=$
	$\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}(Q\sqcup P))=$
	$\displaystyle\mathbb{Q}\otimes\mathbb{P}$

∎

Appendix D Relation Between DQ and QoI

We first provide some auxiliary lemmas, and then proceed to prove Lemma 2.

Lemma 14.

Given sets of sets of queries $\mathbb{Q},\mathbb{P}\subseteq DL(\mathcal{Q})$ , $\mathrm{tc}(\mathbb{Q})\subseteq\mathrm{tc}(\mathbb{P})$ on the DQ implies $\llbracket{\mathrm{tc}(\mathbb{Q})}\rrbracket\subseteq\llbracket{\mathrm{tc}(\mathbb{P})}\rrbracket$ on the QoI defined on $\{\llbracket{\mathbb{Q}}\rrbracket\mid\mathbb{Q}\subseteq DL(\mathcal{Q})\}$ .

Proof.

Trivial from Def. 6. ∎

Lemma 15.

$\bigvee_{i}\mathbb{P}_{i}$ on the DQ implies $\bigvee_{i}\llbracket{\mathbb{P}_{i}}\rrbracket$ on the QoI defined on $\{\llbracket{\mathbb{Q}}\rrbracket\mid\mathbb{Q}\subseteq DL(\mathcal{Q})\}$ .

Proof.

Trivial from Def. 6. ∎

Lemma 16.

Given sets of sets of queries $\mathbb{Q},\mathbb{P}\subseteq DL(\mathcal{Q})$ , $\mathrm{tc}(\mathbb{Q})\otimes\mathrm{tc}(\mathbb{P})$ on the DQ implies $\llbracket{\mathrm{tc}(\mathbb{Q})}\rrbracket\otimes\llbracket{\mathrm{tc}(% \mathbb{P})}\rrbracket$ on the QoI defined on $\{\llbracket{\mathbb{Q}}\rrbracket\mid\mathbb{Q}\subseteq DL(\mathcal{Q})\}$ .

Proof.

Follows trivially from Def. 6 and Lemma 7. ∎

See Lemma 2.

Proof.

To prove this homomorphism, we need to show that the Determinacy Quantale’s ordering, join and tensor, as well as the top and bottom elements imply their QoI counterparts. Lemmas 14, 15, and 16 provide the proofs of ordering, join and tensor, respectively. The proof of the top element:

•

$DL(\mathcal{Q})\rightarrow LoI(\llbracket{\mathcal{Q}}\rrbracket)$

follows from Def. 6 and Lemma 1, and the proof of the bottom element:

•

$\varnothing\rightarrow\varnothing$

is trivial. ∎

Appendix E Correctness of Dependency Analysis

To show that the diagram in Fig. 11 commutes, we aim to show commutativity for each of its cells. In this section, we establish this for its bottommost cell. To that end, we need to establish that the QoI point $\llbracket\mathrm{QL}_{u}\rrbracket$ that corresponds to the query list $\mathrm{QL}_{u}=\{Q_{1},\ldots,Q_{n}\}$ extracted from a program $\mathrm{prg}$ by the dependency analysis is an upper bound on the knowledge relation $\llbracket\mathrm{prg}\rrbracket_{u}$ induced by $\mathrm{prg}$ .

The basic outline of the argument rests on identifying a particular single equivalence relation $k(\mathrm{QL},\mathrm{prg})\in\mathrm{mix}([{Q_{1}}_{\sim}],\ldots,[{Q_{n}}_{% \sim}])$ , which satisfies $\llbracket\mathrm{prg}\rrbracket\sqsubseteq k(\mathrm{QL},\mathrm{prg})$ . Intuitively, this relation captures how much information the program could leak at most if it output the full result of every query that its output depends on. As long as the analysis is sound, this is an instantiation of the disjunction represented by QL, with each disjunct selected precisely for those starting configurations where the program’s output turns out to depend on the queries enumerated in that disjunct.

For a fixed program $\mathrm{prg}$ and user $u$ , we assume the existence of a function $Q=Q_{\mathrm{prg},u}$ from databases $db\in\Omega_{D}$ to sets of queries, which returns the set of those queries performed when executing $\mathrm{prg}$ on database $db$ whose result taints some output to the user $u$ . We formally define the function $Q$ by relying on a taint analysis.

Taint analysis. The semantics of the taint analysis enriches the normal operational semantics of the language: it has a transition whenever the operational semantics does, and it acts identically on those components of a configuration that exist in the operational semantics. Runs in the taint semantics can therefore be put in one-to-one correspondence with operational ones.

$\inferrule*[before=\textsc{TA-Skip}]{\\ }{\langle\Delta,\texttt{skip},m,db\rangle\xrightarrow{\epsilon}\langle\Delta,% \epsilon,m,db\rangle}$

$\inferrule*[before=\textsc{TA-Assign}]{\langle e,m,db\rangle\downarrow vl\\ m^{\prime}=m[x\mapsto vl]\\ \Delta^{\prime}=\Delta[x\mapsto\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}% \Delta(x)]}{\langle\Delta,x:=e,m,db\rangle\xrightarrow{\epsilon}\langle\Delta^% {\prime},\epsilon,m^{\prime},db\rangle}$

$\inferrule*[before=\textsc{TA-QueryEval}]{vl={\llbracket{q}\rrbracket^{db}}\\ m^{\prime}=m[x\mapsto vl]\\ \Delta^{\prime}=\Delta[x\mapsto\Delta(pc)\cup q]}{\langle\Delta,x\leftarrow q,% m,db\rangle\xrightarrow{\epsilon}\langle\Delta^{\prime},\epsilon,m^{\prime},db\rangle}$

$\inferrule*[before=\textsc{TA-IfTrue}]{\langle e,m,db\rangle\downarrow n\\ n\not=0\\ \\ c^{\prime}_{1}=c_{1};\texttt{set }pc\texttt{ to }\Delta(pc)\\ \\ \Delta^{\prime}=\Delta[pc\mapsto\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}% \Delta(x)]}{\langle\Delta,\texttt{if}\ e\ \texttt{then}\ c_{1}\ \texttt{else}% \ c_{2},m,db\rangle\xrightarrow{\epsilon}\langle\Delta^{\prime},c^{\prime}_{1}% ,m,db\rangle}$ $\inferrule*[before=\textsc{TA-IfFalse}]{\langle e,m,db\rangle\downarrow n\\ n=0\\ \\ c^{\prime}_{2}=c_{2};\texttt{set }pc\texttt{ to }\Delta(pc)\\ \\ \Delta^{\prime}=\Delta[pc\mapsto\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}% \Delta(x)]}{\langle\Delta,\texttt{if}\ e\ \texttt{then}\ c_{1}\ \texttt{else}% \ c_{2},m,db\rangle\xrightarrow{\epsilon}\langle\Delta^{\prime},c^{\prime}_{2}% ,m,db\rangle}$

$\inferrule*[before=\textsc{TA-WhileTrue}]{\langle e,m,db\rangle\downarrow n\\ n\not=0\\ \\ c^{\prime}=c;\texttt{while}\ e\ \texttt{do}\ c;\texttt{set }pc\texttt{ to }% \Delta(pc)\\ \\ \Delta^{\prime}=\Delta[pc\mapsto\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}% \Delta(x)]}{\langle\Delta,\texttt{while}\ e\ \texttt{do}\ c,m,db\rangle% \xrightarrow{\epsilon}\langle\Delta^{\prime},c^{\prime},m,db\rangle}$ $\inferrule*[before=\textsc{TA-WhileFalse}]{\langle e,m,db\rangle\downarrow n\\ n=0\\ \\ c^{\prime}=\texttt{set }pc\texttt{ to }\Delta(pc)\\ \\ \Delta^{\prime}=\Delta[pc\mapsto\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}% \Delta(x)]}{\langle\Delta,\texttt{while}\ e\ \texttt{do}\ c,m,db\rangle% \xrightarrow{\epsilon}\langle\Delta^{\prime},\epsilon,m,db\rangle}$

$\inferrule*[before=\textsc{TA-Seq}]{\langle\Delta,c_{1},m,db\rangle% \xrightarrow{\alpha}\langle\Delta^{\prime},c_{1}^{\prime},m^{\prime},db^{% \prime}\rangle\\ }{\langle\Delta,c_{1};c_{2},m,db\rangle\xrightarrow{\alpha}\langle\Delta^{% \prime},c_{1}^{\prime};c_{2},m^{\prime},db^{\prime}\rangle}$ $\inferrule*[before=\textsc{TA-SeqEmpty}]{\\ }{\langle\Delta,\epsilon;c,m,db\rangle\xrightarrow{\epsilon}\langle\Delta,c,m,% db\rangle}$

$\inferrule*[before=\textsc{TA-Output}]{\langle e,m,db\rangle\downarrow vl\\ \beta=\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}\Delta(x)}{\langle\Delta,% \texttt{out}(e,u),m,db\rangle\xrightarrow{\langle vl,u,\beta\rangle}\langle% \Delta,\epsilon,m,db\rangle}$ $\inferrule*[before=\textsc{TA-SetPC}]{\Delta^{\prime}=\Delta[pc\mapsto\delta]}% {\langle\Delta,\texttt{set }pc\texttt{ to }\delta,m,db\rangle\xrightarrow{% \epsilon}\langle\Delta^{\prime},\epsilon,m,db\rangle}$

Figure 12: Taint analysis rules

The rules of the taint analysis presented in Fig. 12 are fairly straightforward. We use the mapping $\Delta$ to map each variable to its set of dependencies on variables and queries.

The rules for if rely on the auxiliary command $\texttt{set }pc\texttt{ to }\delta$ to restore the dependency set of $pc$ to its previous state ( $\Delta(pc)$ ) upon exiting the if branch. We sequentially compose this command with the body of if to ensure that it executes after the branch’s body has finished. The rules for while use $\texttt{set }pc\texttt{ to }\delta$ in a similar manner.

The rule TA-Output uses $fv(e)$ to extract all the variables of expression $e$ , and takes the union of the $\Delta$ s of those variables together with $\Delta(pc)$ to calculate $\beta$ , the set of dependencies on which the execution up to this output depended.
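The rules of Fig. 12 can be prototyped as a small interpreter. The following Python sketch is a simplification under stated assumptions: it is big-step rather than small-step, so the auxiliary command $\texttt{set }pc\texttt{ to }\delta$ is replaced by saving and restoring the $pc$ entry around branches and loops; it tracks only query names as dependencies; and the tuple-based program encoding, the helper `var`, and the queries `q_flag`, `q_a`, `q_b` are hypothetical.

```python
def deps(delta, variables):
    """Union of the dependency sets Delta(x) for the given variables."""
    result = frozenset()
    for x in variables:
        result |= delta.get(x, frozenset())
    return result

def run(cmd, delta, mem, db, trace):
    """Execute cmd, updating delta (taints), mem, and trace in place."""
    kind = cmd[0]
    if kind == "skip":                      # TA-Skip
        pass
    elif kind == "assign":                  # TA-Assign
        _, x, (fv, fn) = cmd
        taint = delta["pc"] | deps(delta, fv)
        mem[x] = fn(mem)
        delta[x] = taint
    elif kind == "query":                   # TA-QueryEval
        _, x, (qname, qfn) = cmd
        mem[x] = qfn(db)
        delta[x] = delta["pc"] | {qname}
    elif kind == "if":                      # TA-IfTrue / TA-IfFalse
        _, (fv, fn), c1, c2 = cmd
        saved = delta["pc"]
        delta["pc"] = saved | deps(delta, fv)
        run(c1 if fn(mem) != 0 else c2, delta, mem, db, trace)
        delta["pc"] = saved                 # stands in for "set pc to"
    elif kind == "while":                   # TA-WhileTrue / TA-WhileFalse
        _, (fv, fn), body = cmd
        saved = delta["pc"]
        while True:
            delta["pc"] = delta["pc"] | deps(delta, fv)
            if fn(mem) == 0:
                break
            run(body, delta, mem, db, trace)
        delta["pc"] = saved                 # restored once the loop exits
    elif kind == "seq":                     # TA-Seq / TA-SeqEmpty
        for c in cmd[1:]:
            run(c, delta, mem, db, trace)
    elif kind == "out":                     # TA-Output
        _, (fv, fn), user = cmd
        beta = delta["pc"] | deps(delta, fv)
        trace.append((fn(mem), user, beta))
    else:
        raise ValueError(f"unknown command: {kind}")

def var(name):
    """Expression reading one variable: (free variables, evaluator)."""
    return ((name,), lambda mem: mem[name])

# Hypothetical queries over a dictionary-shaped database.
q_flag = ("q_flag", lambda db: db["flag"])
q_a = ("q_a", lambda db: db["a"])
q_b = ("q_b", lambda db: db["b"])

# x <- q_flag; if x then y <- q_a else y <- q_b; out(y, u)
program = ("seq",
           ("query", "x", q_flag),
           ("if", var("x"), ("query", "y", q_a), ("query", "y", q_b)),
           ("out", var("y"), "u"))

def execute(db):
    delta, mem, trace = {"pc": frozenset()}, {}, []
    run(program, delta, mem, db, trace)
    return trace

print(execute({"flag": 1, "a": 10, "b": 20}))
print(execute({"flag": 0, "a": 10, "b": 20}))
```

In the first run the output is tainted by `q_flag` and `q_a`, in the second by `q_flag` and `q_b`: the implicit flow through the branch condition is recorded via the $pc$ entry even though $pc$ itself has been restored by the time of the output.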

We extend the definition of trace $\tau$ to a sequence of observations of the form $\langle vl,u,\beta\rangle$ , and use the notation $\tau\negthickspace\mathrel{\downharpoonleft\mkern-5.7mu\downharpoonleft}_{u}$ to denote the sequence of all $\beta$ s in $\tau$ that $u$ can observe. We use this notation to define function $Q$ as follows:

Definition 13.

Given a database state $db$ and user $u$ , such that $\langle c,m_{0},db\rangle\xRightarrow{\tau}\negthickspace_{u}$ , $Q(db)$ is defined as $\{\beta\mid\beta\in\tau\negthickspace\mathrel{\downharpoonleft\mkern-5.7mu\downharpoonleft}_{u}\}$ .

A proof of Lemma 3 can then proceed by a straightforward induction on the semantics.

In Def. 13 we formally define the function $Q$ . This function satisfies a closure property that informally states that if on two given databases the output depended on different sets of queries, then the choice of the set of dependencies itself must have been due to the outcome of a query which is among the dependencies in both databases and evaluates to a different result.

Lemma 17.

For all $db,db^{\prime}\in\Omega_{D}$ , if $Q(db)\neq Q(db^{\prime})$ , then there exists a particular query $q\in Q(db)\cap Q(db^{\prime})$ such that ${\llbracket{q}\rrbracket^{db}}\neq{\llbracket{q}\rrbracket^{db^{\prime}}}$ .

We say that database states $db$ and $db^{\prime}$ are equivalent with respect to a dependency set $S$ (written as $db\approx_{S}db^{\prime}$ ) iff ${\llbracket{y}\rrbracket^{db}}={\llbracket{y}\rrbracket^{db^{\prime}}}$ for all $y\in S$ where $y\in\mathcal{Q}$ .

Lemma 18.

For all states $db_{1}$ and $db_{2}$ and users $u$ , if $\langle c,m_{0},db_{1}\rangle\xRightarrow{t_{1}}\negthickspace_{u}$ , $\langle c,m_{0},db_{2}\rangle\xRightarrow{t_{2}}\negthickspace_{u}$ , $Q\triangleq Q(db_{1})=Q(db_{2})$ and $db_{1}\approx_{Q}db_{2}$ , then $t_{1}\negthickspace\downharpoonright_{u}=t_{2}\negthickspace\downharpoonright_% {u}$ .

We then define $k(\mathrm{QL}_{u},\mathrm{prg})$ as the equivalence relation

$\displaystyle\{(db,db^{\prime})\in\Omega_{D}^{2}\mid Q(db)=Q(db^{\prime})\wedge(db,db^{\prime})\in Q(db)_{\sim}\},$

that is, we partition each respective subset of databases $db$ that shares one set of queries $Q(db)$ into equivalence classes according to the knowledge relation induced by $Q(db)$ .
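The construction of $k(\mathrm{QL}_{u},\mathrm{prg})$ can be sketched in Python on a finite set of databases. The function `Q` below is a hand-written stand-in for the taint-derived function of Def. 13 (it mimics a program that branches on a flag), and the database contents and query names are hypothetical.

```python
from collections import defaultdict

# Hypothetical queries over dictionary-shaped databases.
queries = {
    "q_flag": lambda db: db["flag"],
    "q_a": lambda db: db["a"],
    "q_b": lambda db: db["b"],
}

def Q(db):
    """Stand-in for Def. 13: the queries that taint the output on db."""
    if queries["q_flag"](db) != 0:
        return frozenset({"q_flag", "q_a"})
    return frozenset({"q_flag", "q_b"})

def k_classes(databases):
    """Equivalence classes of k: same Q(db) and same results on Q(db)."""
    classes = defaultdict(list)
    for name, db in databases.items():
        qs = Q(db)
        results = tuple((q, queries[q](db)) for q in sorted(qs))
        classes[(qs, results)].append(name)
    return sorted(sorted(members) for members in classes.values())

databases = {
    "d1": {"flag": 1, "a": 10, "b": 20},
    "d2": {"flag": 1, "a": 10, "b": 99},   # differs from d1 only on q_b
    "d3": {"flag": 0, "a": 10, "b": 20},
    "d4": {"flag": 0, "a": 77, "b": 20},   # differs from d3 only on q_a
    "d5": {"flag": 1, "a": 11, "b": 20},   # differs from d1 on q_a
}

print(k_classes(databases))
```

Databases `d1` and `d2` fall into the same class because they differ only on a query that the output does not depend on, whereas `d5` is separated from `d1` by the result of `q_a`.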

Lemma 19.

$\llbracket\mathrm{prg}\rrbracket_{u}\sqsubseteq k(\mathrm{QL}_{u},\mathrm{prg}% )\sqsubseteq\llbracket\mathrm{QL}_{u}\rrbracket$ .

Proof.

$k(\mathrm{QL}_{u},\mathrm{prg})\sqsubseteq\llbracket\mathrm{QL}_{u}\rrbracket$ : We will in fact show that $k(\mathrm{QL}_{u},\mathrm{prg})\in\mathrm{mix}([{Q_{1}}_{\sim}],\ldots,[{Q_{n}}_{\sim}])$ , where $\mathrm{QL}_{u}=\{Q_{1},\ldots,Q_{n}\}$ . For that, it suffices to show that every equivalence class $x\in[k(\mathrm{QL}_{u},\mathrm{prg})]$ is also an equivalence class of one of the $Q_{i}$ . Let $db\in x$ be arbitrary. We claim that $x\in[Q(db)]$ , which suffices since by Lemma 3, $Q(db)$ is one of the $Q_{i}$ . To establish this, we just need to show that $Q(db)=Q(db^{\prime})$ for all $db^{\prime}$ such that $(db,db^{\prime})\in Q(db)_{\sim}$ , so that $(db,db^{\prime})\in k(\mathrm{QL}_{u},\mathrm{prg})$ as well. But this follows from Lemma 17: if some $db^{\prime}$ has $(db,db^{\prime})\in Q(db)_{\sim}$ , then ${\llbracket{q}\rrbracket^{db}}={\llbracket{q}\rrbracket^{db^{\prime}}}$ for all $q\in Q(db)$ , so no query in $Q(db)\cap Q(db^{\prime})$ evaluates differently on the two databases, and by the contrapositive of Lemma 17 we cannot have $Q(db)\neq Q(db^{\prime})$ .

$\llbracket\mathrm{prg}\rrbracket_{u}\sqsubseteq k(\mathrm{QL}_{u},\mathrm{prg})$ : Straightforward application of Lemma 18. ∎

Appendix F Query Analysis

F-A Symbolic Tuple Ordering

To show that the symbolic tuple ordering of Def. 9 induces a determinacy order, and to prove Lemma 20, we first need to define the evaluation of a symbolic tuple in a database state.

Symbolic tuple evaluation. The evaluation of a symbolic tuple $\langle T,\phi,\pi\rangle$ in the database state $db$ written as ${\llbracket{\langle T,\phi,\pi\rangle}\rrbracket^{db}}$ is a $\pi$ -projection on the set of $db$ ’s tuples defined on the join of tables in $T$ that satisfy the constraint $\phi$ . Formally:

Definition 14.

Given database state $db$ and symbolic tuple $\langle T,\phi,\pi\rangle$ , ${\llbracket{\langle T,\phi,\pi\rangle}\rrbracket^{db}}$ is defined as:

$\displaystyle\{tp\negthickspace\downharpoonright_{\pi}\mid tp\in\prod_{t\in T}{\llbracket{t}\rrbracket^{db}},tp\models\phi\}$

where $tp\negthickspace\downharpoonright_{\pi}$ is a tuple with its columns limited to those in $\pi$ , and $tp\models\phi$ means that tuple $tp$ satisfies formula $\phi$ .
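Def. 14 reads directly as a filter and projection over a Cartesian product. The following Python sketch evaluates a symbolic tuple under assumed encodings that are not taken from the paper: a database is a dictionary from table names to lists of row dictionaries, columns are qualified as `table.column`, $\phi$ is a Python predicate, and $\pi$ is a list of qualified column names.

```python
from itertools import product

def eval_symbolic_tuple(tables, phi, pi, db):
    """Evaluate <T, phi, pi> on db: project the rows of the product of the
    tables in T that satisfy phi onto the columns in pi."""
    result = set()
    for rows in product(*(db[t] for t in tables)):
        # Merge one row per table into a single tuple with qualified columns.
        tp = {}
        for table, row in zip(tables, rows):
            for col, val in row.items():
                tp[f"{table}.{col}"] = val
        if phi(tp):
            result.add(tuple((col, tp[col]) for col in pi))
    return result

# Hypothetical database with two tables.
db = {
    "emp": [{"name": "alice", "dept": 1}, {"name": "bob", "dept": 2}],
    "dept": [{"id": 1, "title": "R&D"}, {"id": 2, "title": "Ops"}],
}

answer = eval_symbolic_tuple(
    tables=["emp", "dept"],
    phi=lambda tp: tp["emp.dept"] == tp["dept.id"],
    pi=["emp.name", "dept.title"],
    db=db,
)
print(sorted(answer))
```

The result is a set, so duplicate projected tuples collapse, matching the set-builder notation of the definition.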

We proceed to prove Lemma 20.

Lemma 20.

Given two sets of queries $Q_{1}$ and $Q_{2}$ , if $\mathrm{sts}(Q_{1})\sqsubseteq_{\mathrm{st}}\mathrm{sts}(Q_{2})$ then $Q_{1}\preceq Q_{2}$ .

Proof.

Assume $\ell_{Q_{1}}=\mathrm{sts}(Q_{1})$ and $\ell_{Q_{2}}=\mathrm{sts}(Q_{2})$ . By Def. 9 we want to show that if for all symbolic tuples $\langle T,\phi,\pi\rangle\in\ell_{Q_{1}}$ , there is a set of well-formed symbolic tuples $S=\langle T_{1},\phi_{1},\pi_{1}\rangle,...,\langle T_{n},\phi_{n},\pi_{n}\rangle$ such that $S\subseteq\ell_{Q_{2}}$ , $T_{1},...,T_{n}$ are disjoint, $T\subseteq(T_{1}\cup...\cup T_{n})$ , $\phi\models(\phi_{1}\wedge...\wedge\phi_{n})$ , and $\mathrm{dep}(\phi)\cup\pi\subseteq(\pi_{1}\cup...\cup\pi_{n})$ , then $Q_{1}\preceq Q_{2}$ .

We assume an intermediate symbolic tuple $\mathrm{st}_{\mathrm{itr}}$ and define it as $\langle T_{1}\cup...\cup T_{n},\phi_{1}\wedge...\wedge\phi_{n},\pi_{1}\cup...% \cup\pi_{n}\rangle$ . $\mathrm{st}_{\mathrm{itr}}$ models the symbolic tuples created from the join of $\langle T_{1},\phi_{1},\pi_{1}\rangle,...,\langle T_{n},\phi_{n},\pi_{n}\rangle$ . Additionally, $T_{1},...,T_{n}$ are disjoint, which by the definition of symbolic tuples means that $\pi_{1},...,\pi_{n}$ and the dependencies of $\phi_{1},...,\phi_{n}$ are also disjoint, effectively making $\mathrm{st}_{\mathrm{itr}}$ the symbolic tuple of the Cartesian product of tuples $\langle T_{1},\phi_{1},\pi_{1}\rangle,...,\langle T_{n},\phi_{n},\pi_{n}\rangle$ .

We want to show that the symbolic tuples in $S$ can determine $\mathrm{st}_{\mathrm{itr}}$ :

	$\displaystyle\forall db_{1},db_{2}\in\Omega_{D}.$
	$\displaystyle{\llbracket{\mathrm{st}}\rrbracket^{db_{1}}}={\llbracket{\mathrm{% st}}\rrbracket^{db_{2}}}\ \forall\mathrm{st}\in S\rightarrow{\llbracket{% \mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}={\llbracket{\mathrm{st}_{% \mathrm{itr}}}\rrbracket^{db_{2}}}$		(1)

For a specific database state $db$ , ${\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db}}$ would give us all the tuples defined on $T_{1},...,T_{n}$ satisfying $\phi_{1}\wedge...\wedge\phi_{n}$ and projected on the columns in $\pi_{1}\cup...\cup\pi_{n}$ .

Assume there is a pair of databases $db_{1},db_{2}\in\Omega_{D}$ such that ${\llbracket{\mathrm{st}}\rrbracket^{db_{1}}}={\llbracket{\mathrm{st}}% \rrbracket^{db_{2}}}\ \forall\mathrm{st}\in S$ holds but ${\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}\not={\llbracket{% \mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{2}}}$ . By the assumption ${\llbracket{\mathrm{st}}\rrbracket^{db_{1}}}={\llbracket{\mathrm{st}}% \rrbracket^{db_{2}}}\ \forall\mathrm{st}\in S$ we know that for all $\mathrm{st}\in S$ , if tuple $tp$ is in ${\llbracket{\mathrm{st}}\rrbracket^{db_{1}}}$ it is also in ${\llbracket{\mathrm{st}}\rrbracket^{db_{2}}}$ , and vice versa.

For ${\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}\not={\llbracket{% \mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{2}}}$ to hold, we have to consider two cases:

1.

There is a tuple $tp\in{\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}$ such that $tp$ cannot be constructed from the tuples in the set $\{tp^{\prime}\in{\llbracket{\mathrm{st}}\rrbracket^{db_{1}}}\mid\mathrm{st}\in S\}$ .

All of the symbolic tuples $\mathrm{st}\in S$ are well-formed and $T_{1},...,T_{n}$ are disjoint, which makes $\mathrm{st}_{\mathrm{itr}}$ the symbolic tuple of the Cartesian product of $S$ . This means that a tuple $tp\in{\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}$ is defined on the product of tables $T_{1},...,T_{n}$ , satisfies $\phi_{1}\wedge...\wedge\phi_{n}$ , and is projected on $\pi_{1}\cup...\cup\pi_{n}$ . Hence each tuple $tp\in{\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}$ is constructed by merging tuples $tp_{1},...,tp_{n}$ where $tp_{i}\in{\llbracket{\langle T_{i},\phi_{i},\pi_{i}\rangle}\rrbracket^{db_{1}}}$ for $i=1,...,n$ . Thus, this case is not possible.

2.

There is a tuple $tp\in{\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{2}}}$ such that $tp$ cannot be constructed from the tuples in the set $\{tp^{\prime}\in{\llbracket{\mathrm{st}}\rrbracket^{db_{2}}}\mid\mathrm{st}\in S\}$ .

This case is analogous to the first.

Next, we need to show that $\mathrm{st}_{\mathrm{itr}}$ determines $\langle T,\phi,\pi\rangle$ :

	$\displaystyle\forall db_{1},db_{2}\in\Omega_{D}$
	$\displaystyle{\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}={% \llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{2}}}\rightarrow{% \llbracket{\langle T,\phi,\pi\rangle}\rrbracket^{db_{1}}}={\llbracket{\langle T% ,\phi,\pi\rangle}\rrbracket^{db_{2}}}$		(2)

By ${\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}={\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{2}}}$ we know that for every $tp_{1}\in{\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}$ there exists $tp_{2}\in{\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{2}}}$ with $tp_{1}=tp_{2}$ , and for every $tp_{2}\in{\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{2}}}$ there exists $tp_{1}\in{\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}$ with $tp_{2}=tp_{1}$ .

Intuitively, for a given database $db$ , ${\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db}}$ has more columns and tuples than ${\llbracket{\langle T,\phi,\pi\rangle}\rrbracket^{db}}$ . The symbolic tuple $\langle T,\phi,\pi\rangle$ throws away some columns by limiting the resulting tuples to the tables in $T$ , which is a subset of $T_{1}\cup...\cup T_{n}$ , and by projecting on $\pi$ , which is a subset of $\pi_{1}\cup...\cup\pi_{n}$ . It also eliminates some rows by applying $\phi$ to the result set, which is stronger than $\phi_{1}\wedge...\wedge\phi_{n}$ .

We need to show that applying these limitations maintains query determinacy. We consider these cases separately:

•

Columns: Projecting away some columns from the evaluation of $\mathrm{st}_{\mathrm{itr}}$ maintains query determinacy. We write $tp\negthickspace\downharpoonright_{\pi}$ for the projection of tuple $tp$ onto only the columns specified in $\pi$ , and $\mathrm{col}(T)$ for the columns of $T$ . We use the same notation for tuples and write $\mathrm{col}(tp)$ to denote the set of columns of tuple $tp$ . For a tuple $tp$ such that $tp\in{\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}$ and $tp\in{\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{2}}}$ , by projecting away some columns from $tp$ we end up with a new tuple $tp^{\prime}=tp\negthickspace\downharpoonright_{\pi}$ such that $\mathrm{col}(tp^{\prime})\subseteq\mathrm{col}(tp)$ . Since $tp$ is in both ${\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{1}}}$ and ${\llbracket{\mathrm{st}_{\mathrm{itr}}}\rrbracket^{db_{2}}}$ , and by the definition of the ordering $\pi\subseteq\pi_{1}\cup...\cup\pi_{n}$ , we can conclude that $tp^{\prime}$ will also be in both ${\llbracket{\langle T_{1}\cup...\cup T_{n},\phi_{1}\wedge...\wedge\phi_{n},\pi\rangle}\rrbracket^{db_{1}}}$ and ${\llbracket{\langle T_{1}\cup...\cup T_{n},\phi_{1}\wedge...\wedge\phi_{n},\pi\rangle}\rrbracket^{db_{2}}}$ . This follows easily from Def. 14.
•

Rows: Removing some rows from the result of the previous step maintains query determinacy. By the definition of the ordering we know that $\mathrm{dep}(\phi)\cup\pi\subseteq\pi_{1}\cup...\cup\pi_{n}$ and that $\langle T_{1},\phi_{1},\pi_{1}\rangle,...,\langle T_{n},\phi_{n},\pi_{n}\rangle$ are well-formed, which means that $\phi$ only applies to the columns that were retrieved by the intermediate tuple (projected to $\pi$ ). Since $\phi$ is a stronger condition than $\phi_{1}\wedge...\wedge\phi_{n}$ , for a tuple $tp$ such that $tp\in{\llbracket{\langle T_{1}\cup...\cup T_{n},\phi_{1}\wedge...\wedge\phi_{n},\pi\rangle}\rrbracket^{db_{1}}}$ and $tp\in{\llbracket{\langle T_{1}\cup...\cup T_{n},\phi_{1}\wedge...\wedge\phi_{n},\pi\rangle}\rrbracket^{db_{2}}}$ , if $tp$ satisfies $\phi$ then $tp$ is also in both ${\llbracket{\langle T_{1}\cup...\cup T_{n},\phi,\pi\rangle}\rrbracket^{db_{1}}}$ and ${\llbracket{\langle T_{1}\cup...\cup T_{n},\phi,\pi\rangle}\rrbracket^{db_{2}}}$ . Otherwise, if $tp$ is not in one of them, it is not in the other either.
•

Tables: Similar to the first case, for a tuple $tp$ such that $tp\in{\llbracket{\langle T_{1}\cup...\cup T_{n},\phi,\pi\rangle}\rrbracket^{db_{1}}}$ and $tp\in{\llbracket{\langle T_{1}\cup...\cup T_{n},\phi,\pi\rangle}\rrbracket^{db_{2}}}$ , by projecting away the columns of some of the tables from $tp$ we end up with a new tuple $tp^{\prime}=tp\negthickspace\downharpoonright_{\mathrm{col}(T)}$ . Since $tp$ is in both ${\llbracket{\langle T_{1}\cup...\cup T_{n},\phi_{1}\wedge...\wedge\phi_{n},\pi\rangle}\rrbracket^{db_{1}}}$ and ${\llbracket{\langle T_{1}\cup...\cup T_{n},\phi_{1}\wedge...\wedge\phi_{n},\pi\rangle}\rrbracket^{db_{2}}}$ , and by Def. 9 $T\subseteq T_{1}\cup...\cup T_{n}$ , we can conclude that $tp^{\prime}$ will also be in both ${\llbracket{\langle T,\phi,\pi\rangle}\rrbracket^{db_{1}}}$ and ${\llbracket{\langle T,\phi,\pi\rangle}\rrbracket^{db_{2}}}$ .

(1) and (2) would give us:

	$\displaystyle\forall db_{1},db_{2}\in\Omega_{D}.$
	$\displaystyle\left(\forall\mathrm{st}\in S.\ {\llbracket{\mathrm{st}}\rrbracket^{db_{1}}}={\llbracket{\mathrm{st}}\rrbracket^{db_{2}}}\right)\rightarrow{\llbracket{\langle T,\phi,\pi\rangle}\rrbracket^{db_{1}}}={\llbracket{\langle T,\phi,\pi\rangle}\rrbracket^{db_{2}}}$

which allows us to conclude that $\langle T_{1},\phi_{1},\pi_{1}\rangle,...,\langle T_{n},\phi_{n},\pi_{n}\rangle$ determine $\langle T,\phi,\pi\rangle$.

Repeating this process for all of the symbolic tuples in $\ell_{Q_{1}}$ would give us $Q_{2}\twoheadrightarrow Q_{1}$ which means $Q_{1}\preceq Q_{2}$ . ∎
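The Columns and Rows cases can be illustrated on a toy instance. The sketch below is not part of the formal development: the helpers `evaluate` and `derive`, the table `emp`, and its columns are hypothetical names, and only single-table symbolic tuples under set semantics are modelled. It shows that when two databases agree on a larger symbolic tuple, they also agree on a smaller one that projects away columns and strengthens the condition using only retrieved columns, because the smaller result can be recomputed from the larger result alone.

```python
# Toy illustration only: `evaluate`, `derive`, and the table/column names
# are hypothetical and are not taken from the paper.

def evaluate(db, table, phi, pi):
    """Set semantics for a single-table symbolic tuple <{table}, phi, pi>:
    keep the rows satisfying phi, then project them to the columns in pi."""
    return {
        frozenset((c, row[c]) for c in pi)
        for row in db[table]
        if phi(row)
    }

def derive(view, phi, pi):
    """Recompute a smaller query from the result of a larger one,
    without consulting the database again."""
    rows = [dict(t) for t in view]
    return {frozenset((c, r[c]) for c in pi) for r in rows if phi(r)}

# Two databases that differ only in data the larger query does not reveal:
# the salary column, and rows that fail the condition age >= 18.
db1 = {"emp": [
    {"name": "ann", "age": 35, "salary": 10},
    {"name": "bob", "age": 20, "salary": 20},
    {"name": "cy", "age": 10, "salary": 5},
]}
db2 = {"emp": [
    {"name": "ann", "age": 35, "salary": 99},
    {"name": "bob", "age": 20, "salary": 77},
    {"name": "dan", "age": 12, "salary": 1},
]}

# Larger symbolic tuple: condition phi1, columns pi1.
phi1 = lambda r: r["age"] >= 18
pi1 = {"name", "age"}

# Smaller symbolic tuple: phi is stronger than phi1, dep(phi) = {age} is
# contained in pi1, and pi is a subset of pi1.
phi = lambda r: r["age"] >= 30
pi = {"name"}

src1 = evaluate(db1, "emp", phi1, pi1)
src2 = evaluate(db2, "emp", phi1, pi1)
tgt1 = evaluate(db1, "emp", phi, pi)
tgt2 = evaluate(db2, "emp", phi, pi)

assert src1 == src2                     # the databases agree on the larger query
assert tgt1 == tgt2                     # hence they agree on the smaller one
assert derive(src1, phi, pi) == tgt1    # which is a function of the larger result
```

The Tables case is not modelled here, since it would require a cross-product semantics over several tables.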

F-B Symbolic Tuple and DQ Ordering

We present the proof of Lemma 4.

See Lemma 4.

Proof.

Assume $\sigma_{\mathrm{st}}(\{Q_{1},...,Q_{n}\})=\{\ell_{Q_{1}},...,\ell_{Q_{n}}\}$ and $\sigma_{\mathrm{st}}(\{P_{1},...,P_{m}\})=\{\ell_{P_{1}},...,\ell_{P_{m}}\}$. We have $\{\ell_{Q_{1}},...,\ell_{Q_{n}}\}\sqsubseteq_{*}\{\ell_{P_{1}},...,\ell_{P_{m}}\}$.

By the definition of $\sqsubseteq_{*}$ and Lemma 20, we know that for each $Q_{i}$ in $\{Q_{1},...,Q_{n}\}$ there is at least one $P_{j}$ in $\{P_{1},...,P_{m}\}$ such that $Q_{i}\preceq P_{j}$ .

Applying $\mathrm{tc}$ to $Q_{i}$ and $P_{j}$ gives us $\mathrm{tc}(Q_{i})\subseteq\mathrm{tc}(P_{j})$. Since $\mathrm{tc}(P_{j})$ is one of the sets in the union obtained by applying $\mathrm{tc}$ to every element of $\{P_{1},...,P_{m}\}$, the basic properties of $\cup$ give us $\mathrm{tc}(Q_{i})\subseteq\mathrm{tc}(P_{1})\cup...\cup\mathrm{tc}(P_{m})$ for all $i\in\{1,...,n\}$.

Since the tiling closure of each $Q_{i}$ is individually contained in $\mathrm{tc}(P_{1})\cup...\cup\mathrm{tc}(P_{m})$, their union is still contained in $\mathrm{tc}(P_{1})\cup...\cup\mathrm{tc}(P_{m})$, which gives us:

$\displaystyle\mathrm{tc}(Q_{1})\cup...\cup\mathrm{tc}(Q_{n})\subseteq\mathrm{tc}(P_{1})\cup...\cup\mathrm{tc}(P_{m})\qquad(1)$

We apply the tiling closure to both sides of (1) and rely on Lemma 10 to remove the nested uses of $\mathrm{tc}$ , which would give us:

$\displaystyle\mathrm{tc}(Q_{1}\cup...\cup Q_{n})\subseteq\mathrm{tc}(P_{1}\cup...\cup P_{m})$

which, by the definition of $\vee$ in the DQ, means $(Q_{1}\vee...\vee Q_{n})\sqsubseteq(P_{1}\vee...\vee P_{m})$. ∎
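The passage from (1) to the final inclusion can be unfolded into three steps. The sketch below rests on two assumptions about statements that are not restated in this section: that $\mathrm{tc}$ is monotone with respect to $\subseteq$, and that Lemma 10 yields $\mathrm{tc}(\mathrm{tc}(A)\cup\mathrm{tc}(B))=\mathrm{tc}(A\cup B)$, where $A$ and $B$ are placeholder names introduced only for this illustration.

```latex
\begin{align*}
\mathrm{tc}(Q_{1}\cup...\cup Q_{n})
  &= \mathrm{tc}\bigl(\mathrm{tc}(Q_{1})\cup...\cup\mathrm{tc}(Q_{n})\bigr)
  && \text{(Lemma 10, assumed form)} \\
  &\subseteq \mathrm{tc}\bigl(\mathrm{tc}(P_{1})\cup...\cup\mathrm{tc}(P_{m})\bigr)
  && \text{(monotonicity of $\mathrm{tc}$ applied to (1))} \\
  &= \mathrm{tc}(P_{1}\cup...\cup P_{m})
  && \text{(Lemma 10, assumed form)}
\end{align*}
```

Under these assumptions, only the middle step uses inclusion (1); the outer two steps merely add or remove nested applications of $\mathrm{tc}$.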