Disjunctive Policies for Database-Backed Programs

Amir M. Ahmadian KTH Royal Institute of Technology
   Matvey Soloviev KTH Royal Institute of Technology
   Musard Balliu KTH Royal Institute of Technology
Abstract

When specifying security policies for databases, it is often natural to formulate disjunctive dependencies, where a piece of information may depend on at most one of two dependencies $P_1$ or $P_2$, but not both. A formal semantic model of such disjunctive dependencies, the Quantale of Information, was recently introduced by Hunt and Sands as a generalization of the Lattice of Information. In this paper, we seek to contribute to the understanding of disjunctive dependencies in database-backed programs and introduce a practical framework to statically enforce disjunctive security policies. To that end, we introduce the Determinacy Quantale, a new query-based structure which captures the ordering of disjunctive information in databases. This structure can be understood as a query-based counterpart to the Quantale of Information. Based on this structure, we design a sound enforcement mechanism to check disjunctive policies for database-backed programs. This mechanism is based on a type-based analysis for a simple imperative language with database queries, which is precise enough to accommodate a variety of row- and column-level database policies flexibly while keeping track of disjunctions due to control flow. We validate our mechanism by implementing it in a tool, DiVerT, and demonstrate its feasibility on a number of use cases.

I Introduction

Database security and information flow security have largely evolved as two disparate areas [1, 2], while sharing closely-related foundations and mechanisms to enforce security. Modern applications commonly rely on shared database backends to provide rich functionality to a multitude of mutually distrusting users. In response to frontend demands, database query languages, with features such as triggers, stored procedures, and user-defined functions, have increasingly come to resemble full-fledged programming languages, thus calling into question the adequacy of the underlying access control models [3, 4]. A security policy describes the totality of expectations that we have of a computer system in the face of adversaries that seek to satisfy objectives that may differ from ours. In the context of database systems, whose purpose is to retain and provide information, the security policies of interest constrain who is allowed to learn what parts of that information. A class of such security policies which has proven particularly challenging to enforce with the methods of database security are disjunctive policies, which state that given two pieces of information, some entity may either learn one or the other, but not both.

A common example of disjunctive policies are databases which contain personally identifiable information, such as medical trial data. Biometric parameters of participants are important confounders that must be considered when drawing conclusions from the data, but at the same time releasing too many parameters of any one participant (such as their height, age and weight) might be sufficient to deanonymize them with high confidence [5]. Hence, a security policy for such a database may specify that the user may learn height and age, or height and weight, or age and weight, but not all three. Other examples of scenarios where disjunctive policies are useful include differential privacy [6] and secret sharing.

In this paper, we combine insights from database security and information flow research to develop a formal model for reasoning about disjunctive information in database-backed programs, and thus take a step towards reconciling the two fields. Our model makes it possible to reason about the semantic information dependencies in a program that performs queries, and compare them against a disjunctive policy. Building upon this, we propose a provably sound static enforcement mechanism that ensures that the policy is satisfied.

It is customary in information flow models to represent information as an equivalence relation on states, with the refinement order of equivalence relations corresponding to having more information. This representation can be used for both the actual information conveyed by a computational process and the bound imposed on it as part of a simple, non-disjunctive security policy. The possible equivalence relations on a given universe of states form a structure called the Lattice of Information (LoI) [7], in which security-relevant questions can be answered, such as whether a program reveals no more information than is allowed by the security policy, or what information is revealed by the combination of two programs. Similar questions have been addressed in the database community using an analogous object called the Disclosure Lattice [8]. We observe that this definition is actually insufficient to characterize information, which motivates us to introduce a more specific structure based on query determinacy, the Determinacy Lattice (DL). The formal relation between the Disclosure Lattice or our definition and LoI was hitherto unexplored, and more importantly neither of them can be used to represent disjunctions as seen in our motivating example.

Recently, Hunt and Sands [9] proposed a new information flow structure called the Quantale of Information (QoI), which seeks to address this shortcoming and establish a formal setting for representing, combining and comparing disjunctions of information. We build upon this work to introduce an analogous structure, the Determinacy Quantale (DQ), representing disjunctive dependencies in database-backed programs. As we show, this structure can be formally related to the QoI, and this relationship is analogous to that between the LoI and the DL. We then use the DQ to design a knowledge-based security condition that relates disjunctive dependencies in database-backed programs to disjunctive policies.

We are the first to address the problem of enforcing disjunctive policies. Prior works that develop language-based enforcement techniques in database-backed applications do not support disjunctive policies, while database-level dependencies are restricted to coarse approximations that incorrectly reject secure programs, such as our previous example [10, 11, 12, 13, 14].

Perhaps unsurprisingly, path sensitivity of a static analysis is key to capturing disjunctive dependencies. We show how standard flow-sensitive type-based dependency analysis [15] can be adapted to a compositional path-sensitive analysis and thus capture disjunctive dependencies in terms of database queries. To represent these dependencies in the DQ model, we introduce a sound approximation of the information disclosed by each database query which is precise enough to represent complex combinations of both row- and column-level dependencies. Finally, in the DQ, the combination of these analyses can be proven sound with respect to our security condition. We expect that the overall architecture of the resulting soundness proof, in which we relate a sequence of abstractions of the behaviour of a program to ordered elements of the DQ, can be generalized to many other enforcement mechanisms for our security condition.

To demonstrate the practicality of our approach, we implement this type-based dependency analysis and query approximation for database-backed programs and evaluate it on a test suite and some use cases which effectively illustrate the need for disjunctive dependencies and disjunctive policies.

Summary of contributions.

  • We introduce a formal model for reasoning about disjunctive dependencies and policies in databases. In the process, we show how to reconcile perspectives from the database security and information flow communities.

  • We introduce a database-specific model of knowledge, the Determinacy Lattice, and a disjunctive extension, called the Determinacy Quantale, and explore their relationship to established general-purpose semantic models.

  • Using our model, we define an extensional security condition for database-backed programs that accommodates disjunctive policies.

  • We propose a type-based program analysis to capture disjunctive dependencies in database-backed programs, combine them with a novel abstraction of queries, and prove them sound with respect to our security condition. This is presented as an instance of a generalizable architecture for such soundness proofs.

  • We implement a prototype tool that uses type-based dependency analysis and query approximation to verify query-based disjunctive policies for database-backed programs, and demonstrate its feasibility on a test suite and a number of use cases.

The rest of the paper is structured as follows. After reviewing preliminaries in Section II, we give our account of the DL and introduce the DQ in Section III-C. In Section IV-B, we formalize our model of database-backed programs and the security policies we impose on them, culminating in a formal security condition. We present enforcement mechanisms in Section V, and their implementation and evaluation in Section VI. In Section VII, we contextualize our contributions with a discussion of related work, and finally summarize conclusions in Section VIII.

II Background

II-A Lattice of Information

An equivalence relation ${\sim} \subseteq A \times A$ on a set $A$ is a binary relation that is reflexive, symmetric, and transitive. For example, the equivalence relation parity on the set $A = \{0,1,2,3\}$ is defined as $\{(x,y) \mid x,y \in A \wedge x \bmod 2 = y \bmod 2\}$. An equivalence relation partitions its underlying domain into disjoint equivalence classes. Given an equivalence relation $P$ on a set $A$ and $a \in A$, $[a]_P$ denotes the unique equivalence class induced by $P$ that $a$ belongs to. We write $[P]$ to denote the set of all equivalence classes induced by $P$. We call $[P]$ a partition of $A$ and hereafter we may also refer to each element, i.e. equivalence class, of the partition $[P]$ as a cell. For example, parity partitions $A$ into cells $\{0,2\}$ and $\{1,3\}$.

Equivalence relations over states are commonly used to represent an agent’s knowledge, by relating two states whenever the agent cannot distinguish between them. When an equivalence relation models knowledge, we also call the cells induced by it knowledge sets. These have a distinct intuitive interpretation when we consider functions $f$ that take in some state and return an agent’s view of it. We will write the equivalence relation induced by the output of $f$ as ${\sim_f} = \{(x,y) \mid f(x) = f(y)\}$. In that case, in a state $a$, the knowledge set $[a]_{\sim_f}$ represents the agent’s remaining uncertainty about the state, in the sense of all the states that the agent still considers possible, after observing the output of $f$. The agent knows anything that is true in all states in the knowledge set. In this paper, we use the terms knowledge and information interchangeably.
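
As an illustration of the two paragraphs above, the following Python sketch computes the partition induced by an observation function and looks up a knowledge set. The helper names `partition_by` and `cell_of` are our own choices for this sketch and are not part of the formal development.

```python
def partition_by(f, states):
    """Cells of the relation ~f: states are grouped by the value f returns."""
    cells = {}
    for s in states:
        cells.setdefault(f(s), set()).add(s)
    return {frozenset(c) for c in cells.values()}


def cell_of(a, partition):
    """The unique cell [a] of the partition that contains the state a."""
    return next(c for c in partition if a in c)


# The parity example on A = {0, 1, 2, 3}.
A = {0, 1, 2, 3}
parity = partition_by(lambda x: x % 2, A)

# After observing the parity of state 3, the agent still considers
# every state in this knowledge set possible.
knowledge_set = cell_of(3, parity)
print(sorted(knowledge_set))
```

Representing a relation by its set of cells rather than by a set of pairs keeps the later lattice operations short, since refinement and join can be phrased directly on cells.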

A complete lattice is a set equipped with a partial ordering (reflexive, antisymmetric, and transitive) relation, maximal and minimal elements $\top$ and $\bot$ for this relation and a join (least upper bound) for any subset of elements. The meet (greatest lower bound) of a subset can be defined as the join of the set of all lower bounds of that subset [16]. The Lattice of Information (LoI) [7] is a structure for representing the ordering of information with equivalence relations. Let $\mathcal{L}(A)$ be the set of all equivalence relations defined on a given domain $A$. The LoI ranks these equivalence relations based on the information they reveal about the underlying domain. Given two equivalence relations $P, Q \in \mathcal{L}(A)$, this ordering can be defined as follows:

$$P \sqsubseteq Q \leftrightarrow \forall a, a' \in A\ (a\ Q\ a' \Rightarrow a\ P\ a')$$

For any set $S \subseteq \mathcal{L}(A)$, the least upper bound of $S$ is the equivalence relation $R$ defined as:

$$\forall x, y \in A\ (x\ R\ y \leftrightarrow \forall P \in S.\ x\ P\ y).$$

Formally, $LoI(A) = \langle \mathcal{L}(A), \sqsubseteq, \bigsqcup \rangle$ denotes the LoI on domain $A$, with ordering relation $\sqsubseteq$ and join $\bigsqcup$. The top element $\top$ in the lattice is the most precise equivalence relation $\texttt{id}$ such that $\texttt{id} = \{(x,y) \mid x,y \in A \wedge x = y\}$, and the bottom element $\bot$ is the least precise equivalence relation $\texttt{all} = \{(x,y) \mid x,y \in A\}$.

The join of any two equivalence relations $P \sqcup Q$, being their least upper bound, is the least informative equivalence relation that is at least as informative as either of $P$ and $Q$ (i.e. is an upper bound on both), and thus represents the information that is conveyed from learning both $P$ and $Q$. We refer to this as the conjunction of the information in $P$ and $Q$.
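
The ordering and the join can be made concrete on small finite domains. The sketch below is only an illustration under our own encoding: a relation is represented by its set of cells, and the helper names `partition`, `leq` and `join` as well as the relation `low` are hypothetical choices made for this example.

```python
def partition(*cells):
    """Build a relation from its cells (each cell is an iterable of states)."""
    return {frozenset(c) for c in cells}


def leq(P, Q):
    """P ⊑ Q: states related by Q are related by P, i.e. every cell of Q
    lies inside some cell of P."""
    return all(any(q <= p for p in P) for q in Q)


def join(P, Q):
    """P ⊔ Q: two states are related iff both P and Q relate them, so the
    cells are the nonempty pairwise intersections of cells."""
    return {p & q for p in P for q in Q if p & q}


A = {0, 1, 2, 3}
all_rel = partition(A)                     # bottom: reveals nothing
id_rel = partition(*({a} for a in A))      # top: reveals the state
parity = partition({0, 2}, {1, 3})
low = partition({0, 1}, {2, 3})            # hypothetical: "is x below 2?"

print(leq(all_rel, parity), leq(parity, id_rel))  # both hold
print(leq(parity, low), leq(low, parity))         # incomparable
print(join(parity, low) == id_rel)                # learning both reveals x
```

On this domain, parity and `low` are incomparable, yet their conjunction already identifies the state exactly, which is the behaviour the join is meant to capture.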

II-B Quantale of Information

The LoI captures the conjunction of any two information sources $P$ and $Q$ as the join of their respective equivalence relations. However, it does not offer an operator that would yield a representation of their disjunction, that is, the information that can be obtained from having access to one of them, but not both. In fact, the disjunction cannot in general be represented as a single equivalence relation, and thus an element of the LoI, at all. To address this limitation, Hunt and Sands [9] propose a generalization of the LoI called the Quantale of Information (QoI). A quantale is a complete lattice with an additional binary “tensor” operator $\otimes$. In the QoI, the tensor is used to represent conjunction, while the lattice join represents disjunction.

The core idea behind the quantale structure is to interpret the disjunction $P_1 \vee \ldots \vee P_n$ of several knowledge relations as describing all knowledge relations $R$ in which the knowledge always comes from one of the $P_i$. More concretely, in any possible state $a \in A$, the agent’s knowledge $[a]_R$ should equal its knowledge in the same state in one of the disjuncts, $[a]_{P_i}$. Which disjunct it is may depend on the state, so the agent may have knowledge from $P_i$ in the state $a$ but knowledge from $P_j$ in some other state $a'$. Relations $R$ that satisfy this condition are called tilings, based on a picture of covering (since every state needs to be in some equivalence class) the space of possible states $A$ with knowledge sets drawn from any of the disjuncts. Following Hunt and Sands, we define the set of all tilings

$$\mathrm{mix}(\mathbb{P}) = \{R \in LoI(A) \mid x \in [R] \Rightarrow (\exists P \in \mathbb{P}.\ x \in [P])\},$$

where $\mathbb{P}$ is a set of equivalence relations.

We would like to think of a relation $R'$ as describing no more knowledge than a disjunction $\bigvee \mathbb{P}$ if it is bounded above by some $R \in \mathrm{mix}(\mathbb{P})$ in the LoI, and more generally define the quantale ordering $\mathbb{S} \sqsubseteq \mathbb{T}$ for $\mathbb{S}, \mathbb{T} \subseteq \mathcal{L}(A)$ as $\forall S \in \mathbb{S},\ \exists T \in \mathbb{T}.\ S \sqsubseteq T$. The resulting relation is not antisymmetric on general sets of relations or even $\mathrm{mix}$es of general sets, reflecting the circumstance that there may be multiple $\mathrm{mix}$es representing the same knowledge. As is standard in lattice theory [17], we use the downwards closure operator ${\Downarrow}$ to obtain canonical representations of the order cycles of $\sqsubseteq$ and hence construct a partial order.

$${\Downarrow}\mathbb{P} = \{Q \in LoI(A) \mid Q \sqsubseteq \mathbb{P}\}$$

The tiling closure of a set of equivalence relations $\mathbb{P}$,

$$\mathrm{tc}(\mathbb{P}) = {\Downarrow}\mathrm{mix}(\mathbb{P}),$$

then canonically represents the knowledge permitted by the disjunction $\bigvee \mathbb{P}$. The set $\mathrm{tc}(\mathbb{P})$ can still be interpreted as a list of possible equivalence relations, now including any equivalence relation that does not reveal more information than the disjunction.

We then take the elements of the QoI on a state set $A$ to be all tiling closures of subsets of $\mathcal{L}(A)$, with the ordering $\sqsubseteq$ being set inclusion. For the tensor $\mathbb{P} \otimes \mathbb{Q} = \mathrm{tc}(\{P \sqcup Q \mid P \in \mathbb{P}, Q \in \mathbb{Q}\})$, we rely on the join operator of the LoI $\sqcup$ to calculate the least upper bound of any possible pair of equivalence relations in $\mathbb{P}$ and $\mathbb{Q}$ and then canonicalise the result. Since the sets are interpreted disjunctively, the join $\bigvee_i \mathbb{P}_i$ can simply be defined as $\mathrm{tc}(\bigcup_i \mathbb{P}_i)$.

Example 1.

Program 1 operates on a secret integer $x$ between -2 and 3, outputting to user $u$ whether it is greater than zero, and either (if it isn’t) whether it is even, or (if it is) whether it equals 0 or 1 (by dividing by 2, rounding down and testing for 0). We expect the information released by the program ($\sim_{\mathrm{prg}}$ in Fig. 1) to be bounded by the disjunction of the knowledge relations capturing the two possible branches (resp. $Q$, $P$).

    if (x <= 0) then
      out(-1, u);
      out(x mod 2 == 0, u);
    else
      out(1, u);
      out(x div 2 == 0, u);
Program 1:

This could not be accurately expressed with LoI operations, since $Q$, $P$ and $\sim_{\mathrm{prg}}$ are all incomparable, but the join of $Q$ and $P$ (as the only available nontrivial way of combining them) is equal to $\top$ and so would equally bound a program that directly releases $x$. However, $\sim_{\mathrm{prg}}$ can be tiled with equivalence classes from $Q$ and $P$, and we in fact have $\mathrm{mix}(\{Q,P\}) = \{Q, P, R, \sim_{\mathrm{prg}}\}$. So in the QoI, $\mathrm{tc}(\{\sim_{\mathrm{prg}}\}) \sqsubseteq \mathrm{tc}(\{Q,P\})$, and hence ${\sim_{\mathrm{prg}}} \sqsubseteq Q \vee P$.

[Figure 1 panels: $\mathrm{all}$, $Q$, $P$, $\sim_{\mathrm{prg}}$, $R$, each drawn over the states -2, -1, 0, 1, 2, 3]
Figure 1: Some equivalence relations on $\{-2,-1,0,1,2,3\}$
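
The claims of Example 1 can be checked by brute force. The Python sketch below is an illustration only, and it rests on an assumption about relations that are defined by a figure: we take $Q$ (resp. $P$) to be the relation induced by observing the sign test together with the parity test (resp. the division test) in every state. Under this reading the stated facts about incomparability, the join and $\mathrm{mix}(\{Q,P\})$ all hold. The helper names `partition_by`, `leq` and `tilings` are hypothetical.

```python
def partition_by(f, states):
    """Cells of ~f as a hashable set of frozensets."""
    cells = {}
    for s in states:
        cells.setdefault(f(s), set()).add(s)
    return frozenset(frozenset(c) for c in cells.values())


def leq(P, Q):
    """LoI order: every cell of Q lies inside some cell of P."""
    return all(any(q <= p for p in P) for q in Q)


def tilings(relations, states):
    """mix: all partitions of `states` whose cells are each a cell of
    some relation in `relations`."""
    pool = {c for P in relations for c in P}

    def cover(remaining):
        if not remaining:
            yield frozenset()
            return
        pivot = min(remaining)  # the cell containing pivot must come from pool
        for c in pool:
            if pivot in c and c <= remaining:
                for rest in cover(remaining - c):
                    yield rest | {c}

    return set(cover(frozenset(states)))


A = range(-2, 4)
# Assumed reading of Q and P (see the lead-in); // rounds down, as in the text.
Q = partition_by(lambda x: (x <= 0, x % 2 == 0), A)
P = partition_by(lambda x: (x <= 0, x // 2 == 0), A)
# Observable behaviour of Program 1: the pair of values sent to u.
prg = partition_by(lambda x: (-1, x % 2 == 0) if x <= 0 else (1, x // 2 == 0), A)

mix = tilings({Q, P}, A)
QP_join = {p & q for p in P for q in Q if p & q}

print(len(mix), prg in mix)                # four tilings, prg among them
print(all(len(c) == 1 for c in QP_join))   # the LoI join of Q and P is top
```

The fourth tiling found by the search, besides $Q$, $P$ and $\sim_{\mathrm{prg}}$, plays the role of $R$: it uses the cells of $P$ on the non-positive states and the cells of $Q$ on the positive ones.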

III Information Ordering in Databases

Our goal is to introduce our semantic model for the information revealed by database queries, the Determinacy Lattice, and its extension to disjunctive dependencies, the Determinacy Quantale. To this end, we first review a standard formalism for reasoning about databases that we will employ.

III-A A Primer on Relational Database Models

We use the relational model to formally define databases [18]. In this model, we distinguish between the database schema $D$, which specifies the structure of the database, and the database state $db$, which specifies its actual content.

A database schema $D$ is a (nonempty) finite set of relation schemas $t$, written as $D = \{t_1, \ldots, t_n\}$. A relation schema (table) $t$ is defined as a set of attributes paired with a set of constraints, where an attribute is a name paired with a domain. The number of attributes in $t$ (written as $|t|$) is referred to as its arity. A tuple is a set of data representing a single record within a relation schema. Each tuple contains values for each attribute defined in the relation schema.

A database state $db$ is a snapshot of the database schema $D$ at a particular point in time. It represents the actual data stored in the database, consisting of a collection of tables and their respective tuples. We write $\llbracket t \rrbracket^{db}$ to represent the tuples of table $t$ under database state $db$.

We write $\mathrm{states}(D)$ to denote the set of all database states of $D$. A database configuration is $\langle D, \Gamma \rangle$ where $D$ is the database schema and $\Gamma$ is a set of integrity constraints. We denote $\Omega_D = \{db \mid db \in \mathrm{states}(D) \wedge {\vdash db : \Gamma}\}$ where $\vdash$ is an appropriate notion of constraint $\Gamma$ being satisfied. An integrity constraint is an assertion about a database that must be satisfied for a database state to be considered valid. Various classes of integrity constraints exist, for instance functional dependencies which capture primary-key constraints, and inclusion dependencies which are used in foreign-key constraints [18].

Relational calculus. We rely on the Domain Relational Calculus (DRC) for our query language. In the DRC, a (non-boolean) query $q$ over a database schema $D$ has the form $\{\overline{x} \mid \phi\}$, where $\overline{x}$ is a sequence of variables, $\phi$ is a first order formula over $D$, and the free variables of $\phi$ are those in $\overline{x}$. The evaluation of a query $q$, denoted by $\llbracket q \rrbracket^{db}$, is the set of tuples that satisfy the formula $\phi$ with respect to $db$. A boolean query is written as $\{\ \mid \phi\}$, and its evaluation $\llbracket q \rrbracket^{db}$ is defined to be the boolean value $\mathsf{true}$ if and only if some tuple in $db$ satisfies $\phi$. We use $\mathcal{Q}$ to indicate the universe of all possible queries.

The domain relational calculus employed here follows the standard convention, and we refer the reader to the relevant literature for a more comprehensive description of DRC [18].

emp: (name, role, salary)
mng: (division, manager)
Figure 2: Database schema for employees and managers

Example 2.

The database schema in Fig. 2 contains relations for employees $\mathrm{emp}$ and managers $\mathrm{mng}$. A query returning the set of tuples containing the division names and the salary of the managers of each division can be written as:

$$\{(d,s) \mid \exists n, r.\ \mathrm{emp}(n,r,s) \wedge \exists m.\ \mathrm{mng}(d,m) \wedge n = m\}.$$
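
To make the evaluation $\llbracket q \rrbracket^{db}$ of this query tangible, the following Python sketch evaluates it over a small database state. The state, including all names and salaries, is made up for illustration, and encoding tables as sets of tuples is our own choice for this sketch.

```python
# Hypothetical database state for the schema of Fig. 2: each table is a
# set of tuples whose positions follow the attribute order of the schema.
db = {
    "emp": {
        ("alice", "engineer", 100),
        ("bob", "manager", 120),
        ("carol", "manager", 130),
    },
    "mng": {
        ("sales", "bob"),
        ("research", "carol"),
    },
}


def manager_salaries(db):
    """Evaluate {(d, s) | exists n, r. emp(n, r, s) and exists m. mng(d, m) and n = m}.

    The existentially quantified variables n, r and m are bound by iterating
    over the tuples of emp and mng; only the free variables d and s are
    kept in the result.
    """
    return {
        (d, s)
        for (n, r, s) in db["emp"]
        for (d, m) in db["mng"]
        if n == m
    }


print(sorted(manager_salaries(db)))
```

The comprehension mirrors the formula: the two generators correspond to the two relation atoms, and the filter corresponds to the equality $n = m$.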

Views. In DRC, a database view is a relation defined by the result of a non-boolean query. Database views act as virtual tables and, as we will see, are useful when defining security policies. Formally, a view $v$ defined over database schema $D$ is a tuple $\langle id, q \rangle$, where $id$ is the view identifier and $q$ is the non-boolean query over schema $D$ defining the view. The query $q$ may refer to other views, but we assume that views do not have cyclic dependencies.

The materialization of a view $v$ in a database state $db$ is the evaluation of its defining query $q$ in that state, i.e., $\llbracket q \rrbracket^{db}$. We use $v.q$ to refer to the defining query of view $v$. We extend relational calculus in the standard way to work with views [3].

III-B Determinacy Lattice

Given query sets $Q, Q' \in \mathcal{P}(\mathcal{Q})$, query determinacy [19] captures whether results of the queries in $Q$ are always sufficient to determine the result of the queries in $Q'$.

Definition 1.

$Q$ determines $Q'$ (denoted by $Q \twoheadrightarrow Q'$) iff for all database states $db_1$, $db_2$, if $\llbracket q \rrbracket^{db_1} = \llbracket q \rrbracket^{db_2}$ for all $q \in Q$, then $\llbracket q' \rrbracket^{db_1} = \llbracket q' \rrbracket^{db_2}$ for all $q' \in Q'$.

Intuitively, $Q \twoheadrightarrow Q'$ means that pairs of databases for which all queries in $Q$ return the same result also give the same result under any query in $Q'$. This is in fact equivalent to the initial gloss that the results of queries in $Q'$ can be computed from the results of queries in $Q$, as we show in detail in Appendix A.

Query determinacy allows us to define an ordering on sets of queries based on the information they reveal. We call this ordering determinacy order, denote it by $\preceq$, and define it as $\forall Q, Q' \in \mathcal{P}(\mathcal{Q})$, $Q \preceq Q'$ iff $Q' \twoheadrightarrow Q$.

Example 3.

Consider queries $q_1 = \{(n, r) \mid \exists s.\ \mathrm{emp}(n, r, s)\}$ and $q_2 = \{(r) \mid \exists n, s.\ \mathrm{emp}(n, r, s)\}$ defined on the relations of Fig. 2. Query $q_1$ discloses the $\mathrm{name}$ and the $\mathrm{role}$ of the employees while $q_2$ only returns their $\mathrm{role}$. Intuitively, $q_1$ reveals more information than $q_2$, which means $q_2 \preceq q_1$.
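Example 3 can be checked by brute force on a small, finite set of database states. The following is a minimal sketch under stated assumptions: the names `determines`, `preceq`, `rows`, and `states` are hypothetical, the domain of names, roles, and salaries is invented for illustration, and the check only ranges over the enumerated states rather than all of $\Omega_D$.

```python
from itertools import combinations, product

# A database state is modeled as a frozenset of emp(name, role, salary) tuples.
def q1(db):
    """Projection of emp onto (name, role)."""
    return frozenset((n, r) for (n, r, s) in db)

def q2(db):
    """Projection of emp onto (role)."""
    return frozenset((r,) for (n, r, s) in db)

def determines(Q, Q_prime, states):
    """Q ->> Q' over `states`: whenever all queries in Q agree on two states,
    all queries in Q' agree on them as well."""
    for db1, db2 in combinations(states, 2):
        if all(q(db1) == q(db2) for q in Q):
            if any(q(db1) != q(db2) for q in Q_prime):
                return False
    return True

def preceq(Q, Q_prime, states):
    """Determinacy order: Q <= Q' iff Q' ->> Q."""
    return determines(Q_prime, Q, states)

# Hypothetical finite universe: all databases with at most two rows
# drawn from a tiny, made-up domain.
rows = list(product(["ann", "bob"], ["CEO", "Intern"], [10, 20]))
states = [frozenset(c) for k in range(3) for c in combinations(rows, k)]

print(preceq([q2], [q1], states))  # True: q1 determines q2
print(preceq([q1], [q2], states))  # False: roles alone do not determine names
```

The second check fails because two states that differ only in an employee's name agree on $q_2$ but not on $q_1$.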

This definition of determinacy order is a preorder (reflexive and transitive), but not necessarily a partial order, as it is not anti-symmetric. In other words, $q_1 \preceq q_2$ and $q_2 \preceq q_1$ does not necessarily mean that $q_1 = q_2$. As in Section II-A, this essentially means that query sets are not canonical representations of the information revealed by them. To rectify this, we form the closure ${\downarrow}$ under the determinacy order, so the determinacy order becomes set inclusion. Intuitively, ${\downarrow}Q$ will contain all the queries in $\mathcal{Q}$ whose answers can be inferred by the set of queries $Q$. Formally, ${\downarrow}Q$ is defined as:

$${\downarrow}Q = \{q \in \mathcal{Q} \mid \{q\} \preceq Q\}$$

Using the definitions of determinacy order and closure ${\downarrow}$, we can then define the Determinacy Lattice as follows:

Definition 2.

Given a universe of queries $\mathcal{Q}$, the Determinacy Lattice $DL(\mathcal{Q})$ is a complete lattice $\langle \mathcal{L}, \sqsubseteq, \bigsqcup, \bot, \top \rangle$ such that:

  • $\mathcal{L} = \{{\downarrow}Q \mid Q \subseteq \mathcal{Q}\}$

  • ${\downarrow}Q_1 \sqsubseteq {\downarrow}Q_2$ iff $Q_1 \preceq Q_2$

  • $\bigsqcup_i {\downarrow}Q_i = {\downarrow}\bigcup_i Q_i$

  • $\bot = {\downarrow}\varnothing$, $\top = {\downarrow}\mathcal{Q}$,

where $\preceq$ is the determinacy order on $\mathcal{Q}$.
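The closure and join of Definition 2 can be computed directly when both the query universe and the state space are finite. The sketch below assumes a toy setting that is not from the paper: states are pairs of bits, and the hypothetical universe `UNIVERSE` contains four named queries. The helpers `down` and `join` are illustrative names.

```python
from itertools import combinations

# Hypothetical setting: a state is a pair (a, b) of bits.
STATES = [(a, b) for a in (0, 1) for b in (0, 1)]
UNIVERSE = {
    "a": lambda db: db[0],
    "b": lambda db: db[1],
    "a_xor_b": lambda db: db[0] ^ db[1],
    "const": lambda db: 0,
}

def determines(Q, Q_prime):
    """Q ->> Q' for sets of query names, checked over all pairs of states."""
    for db1, db2 in combinations(STATES, 2):
        if all(UNIVERSE[q](db1) == UNIVERSE[q](db2) for q in Q):
            if any(UNIVERSE[q](db1) != UNIVERSE[q](db2) for q in Q_prime):
                return False
    return True

def down(Q):
    """Closure: every query of the universe whose answer is determined by Q."""
    return frozenset(q for q in UNIVERSE if determines(Q, {q}))

def join(*closed_sets):
    """Join of lattice elements: closure of the union."""
    return down(frozenset().union(*closed_sets))

print(sorted(down(set())))                      # ['const']
print(sorted(down({"a"})))                      # ['a', 'const']
print(sorted(join(down({"a"}), down({"b"}))))   # ['a', 'a_xor_b', 'b', 'const']
print(down({"a"}) <= down({"a", "b"}))          # True
```

After closing, the order is plain set inclusion, and the join of the closures of `a` and `b` picks up `a_xor_b` even though neither operand contains it.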

Disclosure order and information flow properties. Our definition of the Determinacy Lattice is similar to the definition of the Disclosure Lattice introduced by Bender et al. [8]. A Disclosure Lattice is a lattice built upon a disclosure order, which is a partial order on sets of queries satisfying additional conditions that are expected of an ordering according to the amount of information disclosed by each set of queries. Bender et al. [8] define the disclosure order as follows:

Definition 3.

Given a universe of queries $\mathcal{Q}$, a disclosure order $\preceq$ is a preorder on $\mathcal{P}(\mathcal{Q})$ that satisfies the following properties:

  1. For all $Q_1, Q_2 \in \mathcal{P}(\mathcal{Q})$, if $Q_1 \subseteq Q_2$ then $Q_1 \preceq Q_2$.

  2. If $\mathbb{P} \subseteq \mathcal{P}(\mathcal{Q})$ and $\forall P \in \mathbb{P},\ P \preceq Q$, then $\bigcup \mathbb{P} \preceq Q$.

The first property in this definition ensures that adding new elements to a set of queries only increases the amount of disclosed information and the second property allows us to derive a meaningful upper bound on the information disclosure.

The intended use of disclosure order was to order sets of queries based on the amount of information they reveal about the underlying database. However, we make the observation that this definition is not specific enough to characterize information disclosure in the information flow sense. For example, consider query containment [18], defined as:

Definition 4.

Given queries $q_1, q_2 \in \mathcal{Q}$, we say that $q_1$ is contained in $q_2$, denoted by $q_1 \subseteq q_2$, if for every database state $db \in \Omega_D$, we have $\llbracket q_1 \rrbracket^{db} \subseteq \llbracket q_2 \rrbracket^{db}$.

Query containment satisfies all of the requirements of a disclosure order (Def. 3), but it is not enough to guarantee security. To illustrate this, consider a database with a single table $t$ given in Fig. 3.

$vl$
$0$
$1$
$100 + s$
Figure 3: Table $t$

Table $t$ has a single column $vl$, and contains values $0$, $1$, and $100 + s$, where $s$ is a secret value that can be either $0$ or $1$. We thus consider two possible instances of this database, one where $t$ contains values $0$, $1$, and $100$ and another where it contains $0$, $1$, and $101$. Now, consider the following queries:

$$q_1 : \{(vl_1) \mid \exists vl_2.\ t_1(vl_1) \wedge t_2(vl_2) \wedge vl_1 < 100\}$$
$$q_2 : \{(vl_1) \mid \exists vl_2.\ t_1(vl_1) \wedge t_2(vl_2) \wedge vl_1 < 100 \wedge vl_1 = vl_2 - 100\}$$

where $t_1$ and $t_2$ are just logical copies of table $t$. It is common practice to make logical copies of a relation and use them in queries with self-joins [20]. The result of query $q_1$ is always $0$ and $1$. The result of query $q_2$ is $1$ if the secret $s$ is $1$ and $0$ if $s$ is $0$. As is evident, query containment holds for these queries: the result of $q_2$ is contained in the result of $q_1$. However, an observer seeing the result of query $q_2$ can learn the value of the secret $s$.

This example illustrates that query containment (a disclosure order) is not sufficient to guarantee the confidentiality of the secret $s$ in an information flow setting. To ensure information flow security, we require a stronger condition, such as the notion of query determinacy order (Def. 1) that we chose to rely on in this paper.
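The counterexample can be replayed concretely. The following sketch evaluates the two queries on both instances of table $t$, modeling the table as a Python set of integers; the function names `table`, `q1`, and `q2` are illustrative, and the two logical copies $t_1$ and $t_2$ are both represented by the same set.

```python
def table(s):
    """The two instances of t from Fig. 3, parameterized by the secret bit s."""
    return {0, 1, 100 + s}

def q1(t):
    """{ vl1 | exists vl2. t(vl1) and t(vl2) and vl1 < 100 }"""
    return {vl1 for vl1 in t for vl2 in t if vl1 < 100}

def q2(t):
    """{ vl1 | exists vl2. t(vl1) and t(vl2) and vl1 < 100 and vl1 = vl2 - 100 }"""
    return {vl1 for vl1 in t for vl2 in t if vl1 < 100 and vl1 == vl2 - 100}

for s in (0, 1):
    t = table(s)
    # Columns: secret, result of q1, result of q2, containment q2 <= q1
    print(s, sorted(q1(t)), sorted(q2(t)), q2(t) <= q1(t))
# 0 [0, 1] [0] True
# 1 [0, 1] [1] True
```

Containment holds on both instances, and the result of `q1` is identical on both, so `q1` does not determine `q2`: the answer to `q2` equals the secret, while the answer to `q1` is independent of it.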

Relation between the DL and the LoI. There exists a close relationship between the DL and the LoI. Specifically, a query $q$ defined over a database schema $D$ induces an equivalence relation $q_\sim$ on database states $db$. We can formally define this equivalence relation as:

$$q_\sim = \{(db_1, db_2) \mid db_1, db_2 \in \Omega_D \wedge \llbracket q \rrbracket^{db_1} = \llbracket q \rrbracket^{db_2}\}$$

We write $[q_\sim]$ to denote the set of all equivalence classes induced by $q$. Given an equivalence relation $q_\sim$ on set $\Omega_D$ and $db \in \Omega_D$, $[db]_{q_\sim}$ denotes the equivalence class induced by $q_\sim$ to which the database state $db$ belongs. We further lift this definition to sets of queries $Q = \{q_1, q_2, \ldots, q_n\}$:

$$Q_\sim = \{(db_1, db_2) \mid db_1, db_2 \in \Omega_D \wedge \bigwedge_{1 \leq i \leq n} \llbracket q_i \rrbracket^{db_1} = \llbracket q_i \rrbracket^{db_2}\}$$

This interpretation of database queries as equivalence relations provides a direct connection between the DL and the LoI, where the lattice elements correspond to $Q_\sim$, the ordering $\sqsubseteq$ to the determinacy order $\preceq$, and join and meet follow the definitions of the DL.

Lemma 1.

For all $\mathcal{Q}$, there is a complete lattice homomorphism from the Determinacy Lattice $DL(\mathcal{Q})$ to the Lattice of Information defined on $\{Q_\sim \mid Q \in DL(\mathcal{Q})\}$.

We prove this Lemma in Appendix B. To the extent that we believe $Q_\sim$ to accurately represent the information conveyed by the queries in $Q$, this lemma implies that joins and order comparisons can be performed in the DL without explicit reference to the LoI.
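The equivalence relation $Q_\sim$ can be made concrete as a partition of a finite state space, where two states fall into the same block iff every query in $Q$ returns the same answer on both. The sketch below assumes a toy state space of bit pairs and two hypothetical projection queries; `partition` and `refines` are illustrative names, not part of the paper.

```python
from collections import defaultdict

def partition(Q, states):
    """Equivalence classes of Q_~: group states by the tuple of all query answers."""
    classes = defaultdict(set)
    for db in states:
        classes[tuple(q(db) for q in Q)].add(db)
    return {frozenset(c) for c in classes.values()}

def refines(fine, coarse):
    """True iff every block of `fine` lies inside some block of `coarse`."""
    return all(any(block <= other for other in coarse) for block in fine)

# Hypothetical state space and queries.
STATES = [(a, b) for a in (0, 1) for b in (0, 1)]
proj_a = lambda db: db[0]
proj_b = lambda db: db[1]

p_a = partition([proj_a], STATES)
p_ab = partition([proj_a, proj_b], STATES)
print(len(p_a), len(p_ab))   # 2 4
print(refines(p_ab, p_a))    # True
print(refines(p_a, p_ab))    # False
```

In this representation, a query set that reveals more information yields a finer partition, which mirrors the ordering on the LoI side of the lemma.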

III-C Determinacy Quantale

We introduce a generalization of the Determinacy Lattice, called the Determinacy Quantale (DQ), to represent disjunctive dependencies. Our definition of the DQ is intended as a counterpart to the QoI [9], analogously to how the DL corresponds to the LoI. To achieve this, we define a query-set counterpart of the tiling closure operator to capture the disjunction of sets of queries. Since sets of queries correspond to LoI elements (equivalence relations), disjunctive QoI elements (sets of equivalence relations) will be represented as sets of sets of queries. Each set of queries in the outer set represents a possible combination of queries that does not reveal more information than is allowed by the disjunction.

Analogously to the QoI, the tiling closure of a set of sets of queries is defined by forming the downward closure under $\sqsubseteq$ (from the DL) of their mix. The query-set equivalent of the mix operator is defined on a set of sets of queries $\mathbb{Q} = \{Q_1, \ldots, Q_n\}$ such that $Q_i \in DL(\mathcal{Q})$ for $i = 1, \ldots, n$ as follows:

$$\mathrm{mix}(\mathbb{Q}) = \{P \in DL(\mathcal{Q}) \mid x \in [P_\sim] \Rightarrow (\exists Q \in \mathbb{Q}.\ x \in [Q_\sim])\}$$

where $[Q_\sim]$ denotes the equivalence classes of $Q$ as defined previously. We then define the tiling closure for a set $\mathbb{Q}$ of elements of the DL as $\mathrm{tc}(\mathbb{Q}) = {\Downarrow}\mathrm{mix}(\mathbb{Q})$.
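To build intuition for the mix operator, the following sketch works directly on partitions of a finite state space rather than on query sets, which is a simplifying assumption: it enumerates every partition whose blocks are each a block of one of the input partitions. The function `mix` and the partitions `P1` and `P2` are hypothetical, and the downward closure ${\Downarrow}$ is not computed here.

```python
def mix(partitions, states):
    """All partitions of `states` in which every block is a block of some input partition."""
    blocks = {b for p in partitions for b in p}
    results = set()

    def tile(uncovered, chosen):
        if not uncovered:
            results.add(frozenset(chosen))
            return
        x = min(uncovered)  # always cover the smallest uncovered state next
        for b in blocks:
            # b must contain x and must not overlap already covered states
            if x in b and b <= uncovered:
                tile(uncovered - b, chosen + [b])

    tile(frozenset(states), [])
    return results

# Two hypothetical partitions of the states {0, 1, 2, 3}.
P1 = frozenset({frozenset({0, 1}), frozenset({2}), frozenset({3})})
P2 = frozenset({frozenset({0}), frozenset({1}), frozenset({2, 3})})

result = mix([P1, P2], range(4))
for p in sorted(result, key=lambda p: sorted(map(sorted, p))):
    print(sorted(map(sorted, p)))
```

Besides `P1` and `P2` themselves, the result contains two further tilings: one that takes the coarse block from each input, and one that takes only the singleton blocks.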

We then formally define the Determinacy Quantale $DQ(\mathcal{Q})$ as follows.

Definition 5.

Given a universe of queries $\mathcal{Q}$, let $DL(\mathcal{Q})$ be the Determinacy Lattice defined on $\mathcal{Q}$. The Determinacy Quantale $DQ(\mathcal{Q})$ is the quantale $\langle \mathcal{I}, \sqsubseteq, \bigvee, \otimes, 1 \rangle$, with:

  • $\mathcal{I} = \{\mathrm{tc}(\mathbb{Q}) \mid \mathbb{Q} \subseteq DL(\mathcal{Q})\}$

  • $\bigvee_i \mathbb{P}_i = \mathrm{tc}(\bigcup_i \mathbb{P}_i)$

  • $\mathbb{P} \otimes \mathbb{Q} = \mathrm{tc}\Big(\bigcup_{P \in \mathbb{P}, Q \in \mathbb{Q}} (P \sqcup Q)\Big)$

  • $\sqsubseteq\ =\ \subseteq$

  • $\top = DL(\mathcal{Q})$, $\bot = \varnothing$, $1 = \varnothing$,

where $\mathbb{P}, \mathbb{Q} \subseteq DL(\mathcal{Q})$.

In Appendix C we show that Def. 5 satisfies the usual quantale axioms [9]. As with the DL and LoI, the DQ embeds into a QoI by a quantale homomorphism. This QoI is defined on sets of equivalence relations derived from sets of sets of queries by the following map:

Definition 6.

Given a set of sets of queries $\mathbb{Q}$,

$$\llbracket \mathbb{Q} \rrbracket = \{Q_\sim \mid Q \in \mathbb{Q}\}.$$

We can then formally state the relationship between the DQ and this quantale as follows.

Lemma 2.

For all $\mathcal{Q}$, there is a quantale homomorphism from the Determinacy Quantale $DQ(\mathcal{Q})$ to the Quantale of Information defined on $\{\llbracket \mathbb{Q} \rrbracket \mid \mathbb{Q} \subseteq DL(\mathcal{Q})\}$.

The proof of Lemma 2 is presented in Appendix D.

Example 4.

To illustrate the Determinacy Quantale in practice, consider Program 2, which issues either query $q1 = \{(r, vl) \mid \exists s, n.\ \mathrm{emp}(n, r, s) \wedge r = \mathrm{Intern} \wedge vl = s\}$ or $q2 = \{(r, vl) \mid \exists s, n.\ \mathrm{emp}(n, r, s) \wedge r = \mathrm{CEO} \wedge vl = n\}$ to the database. Query q1 returns the $\mathrm{role}$ and $\mathrm{salary}$ columns of the entry in table $\mathrm{emp}$ if the role of that entry is $\mathrm{Intern}$. Similarly, query q2 returns the $\mathrm{role}$ and $\mathrm{name}$ columns if the role of the entry in $\mathrm{emp}$ is $\mathrm{CEO}$.

if (y > 0) then
  x ← q1
else
  x ← q2
out(x, u);
Program 2:

Consider a policy defined on queries $v1 = \{(r, n) \mid \exists s.\ \mathrm{emp}(n, r, s)\}$ and $v2 = \{(r, s) \mid \exists n.\ \mathrm{emp}(n, r, s)\}$. The queries v1 and v2, which respectively project on the $\mathrm{name}$ and $\mathrm{role}$, and the $\mathrm{role}$ and $\mathrm{salary}$ columns of $\mathrm{emp}$, are used in defining the disjunctive security policy $v1 \vee v2$.

For this example, we assume a database that has only one row in the $\mathrm{emp}$ table, and we also limit the domain of possible roles to $\{\mathrm{CEO}, \mathrm{Intern}\}$. These limitations are necessary in order to have a finite representation of the potential query sets, and they enable us to effectively depict the sets produced by the $\mathrm{mix}$ and $\mathrm{tc}$ operators.

Program 2 depicts a disjunction that (ignoring variable y) depends either on $q1$ or $q2$ (i.e., $q1 \vee q2$), which on the DQ can be represented as a point $\mathrm{tc}({\downarrow}\{q1\}) \vee \mathrm{tc}({\downarrow}\{q2\})$. Similarly, the policy $v1 \vee v2$ can be represented on the DQ by $\mathrm{tc}({\downarrow}\{v1\}) \vee \mathrm{tc}({\downarrow}\{v2\})$.

Illustrating this point requires calculating the $\mathrm{mix}$ set of v1 and v2, which includes all sets of queries whose equivalence relation can be constructed from the equivalence classes of ${{\downarrow}\{v1\}}_\sim$ and ${{\downarrow}\{v2\}}_\sim$. Unfortunately, for any sufficiently rich query language, our definition of $\mathrm{mix}$ inevitably yields an infinite set, as infinitely many queries that are “morally equivalent” or even the same up to renaming variables represent the same knowledge set. To compactly represent such infinite sets, we will pick just one representative, and define

$$hc(\mathbb{Q}) = \{Q' \mid \exists Q \in \mathbb{Q}.\ Q_\sim = {Q'}_\sim\}$$

as a closure operator that adds all equivalent queries. Then $\mathrm{mix}\big(\{{\downarrow}\{v1\}, {\downarrow}\{v2\}\}\big)$ will be the set $hc(\{{\downarrow}\{v1\}, {\downarrow}\{v2\}, {\downarrow}\{p1\}, {\downarrow}\{p2\}\})$, where $p1 = \{(r, vl) \mid \big(\exists s, n.\ \mathrm{emp}(n, r, s) \wedge r = \mathrm{Intern} \wedge vl = s\big) \vee \big(\exists s, n.\ \mathrm{emp}(n, r, s) \wedge r = \mathrm{CEO} \wedge vl = n\big)\}$ and $p2 = \{(r, vl) \mid \big(\exists s, n.\ \mathrm{emp}(n, r, s) \wedge r = \mathrm{CEO} \wedge vl = s\big) \vee \big(\exists s, n.\ \mathrm{emp}(n, r, s) \wedge r = \mathrm{Intern} \wedge vl = n\big)\}$.

Therefore, we can depict the policy as the point ${\Downarrow}(hc(\{{\downarrow}\{v1\}, {\downarrow}\{v2\}, {\downarrow}\{p1\}, {\downarrow}\{p2\}\}))$ on the DQ. Similarly, the DQ point of Program 2 (i.e., $\mathrm{tc}({\downarrow}\{q1\}) \vee \mathrm{tc}({\downarrow}\{q2\})$) can be depicted by the point ${\Downarrow}hc(\{{\downarrow}\{p1\}\})$ on the DQ. We illustrate the part of the DQ which includes these points in Fig. 4. As is evident from the figure, Program 2 is in line with the policy.

[Diagram nodes: $\mathrm{tc}({\downarrow}\{v1\})$, $\mathrm{tc}({\downarrow}\{v2\})$, $\mathrm{tc}({\downarrow}\{q1\})$, $\mathrm{tc}({\downarrow}\{q2\})$, ${\Downarrow}hc(\{{\downarrow}\{p1\}\})$, ${\Downarrow}hc(\{{\downarrow}\{v1\}, {\downarrow}\{v2\}, {\downarrow}\{p1\}, {\downarrow}\{p2\}\})$]
Figure 4: A portion of the DQ for queries q1, q2, v1, v2

IV Security Framework

Drawing on the quantale model of dependencies for programs and databases, we develop an extensional condition that defines security for programs that interact with databases and support disjunctive security policies. We will later use the security condition to prove soundness of enforcement mechanisms in Section V. Specifically, we formalize the syntax and semantics of a simple imperative language with database queries. Programs read the input from the database via queries, while users receive the output through predefined output channels. We define (disjunctive) security policies as views over the database and interpret them end-to-end. We then use this model to define a knowledge-based security condition for our setting.

IV-A Language

Syntax. The syntax for the commands of our language, depicted in Fig. 5, primarily consists of standard commands such as assignment, conditionals, and loops. The command $\texttt{out}(e, u)$ outputs the result of evaluating expression $e$ to user $u \in \mathcal{U}$. The command $x \leftarrow q$ issues the query $q$ to the database and stores the result in variable $x$. For modeling the queries, we rely on conjunctive queries with comparison introduced in Section V-A.

Expressions $e$ can be variables $x \in \mathrm{Vars}$, values (integers) $n \in \mathrm{Val}$, binary operations $e_1 \oplus e_2$, single tuples $tp \in \mathrm{Val}$, and sets of tuples $\overline{tp} \in \mathrm{Val}$. For simplicity, we do not provide de-constructors for database tuples.

$$\begin{array}{ll} c := & \texttt{skip} \ \mid\ \texttt{if}\ e\ \texttt{then}\ c_1\ \texttt{else}\ c_2 \ \mid \\ & x \leftarrow q \ \mid\ x := e \ \mid\ c_1; c_2 \ \mid \\ & \texttt{while}\ e\ \texttt{do}\ c \ \mid\ \texttt{out}(e, u) \end{array}$$

Figure 5: Language syntax

Semantics. As discussed in Section III-C, a database state (or simply state) dbΩD𝑑𝑏subscriptΩ𝐷db\in\Omega_{D}italic_d italic_b ∈ roman_Ω start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT is defined with respect to a schema D𝐷Ditalic_D and a finite set of integrity constraints. A configuration c,m,db𝑐𝑚𝑑𝑏\langle c,m,db\rangle⟨ italic_c , italic_m , italic_d italic_b ⟩ consists of a command c𝑐citalic_c, a memory m=VarVal𝑚VarValm=\mathrm{Var}\rightarrow\mathrm{Val}italic_m = roman_Var → roman_Val map** variables to values, and a state db𝑑𝑏dbitalic_d italic_b.

The semantics of expressions is mostly standard and its rules are presented in Fig. 6. We use judgments of the form e,m,dbvl𝑒𝑚𝑑𝑏𝑣𝑙\langle e,m,db\rangle\downarrow vl⟨ italic_e , italic_m , italic_d italic_b ⟩ ↓ italic_v italic_l to denote that an expression e𝑒eitalic_e evaluates to value vl𝑣𝑙vlitalic_v italic_l in memory m𝑚mitalic_m and state db𝑑𝑏dbitalic_d italic_b. For simplicity, we refrain from defining binary operations on tuples, unless the underlying database query is boolean.

We use judgments of the form c,m,db𝛼c,m,db𝛼𝑐𝑚𝑑𝑏superscript𝑐superscript𝑚𝑑superscript𝑏\langle c,m,db\rangle\xrightarrow{\alpha}\langle c^{\prime},m^{\prime},db^{% \prime}\rangle⟨ italic_c , italic_m , italic_d italic_b ⟩ start_ARROW overitalic_α → end_ARROW ⟨ italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_m start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_d italic_b start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟩ to denote that a configuration c,m,db𝑐𝑚𝑑𝑏\langle c,m,db\rangle⟨ italic_c , italic_m , italic_d italic_b ⟩ in one step evaluates to memory msuperscript𝑚m^{\prime}italic_m start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and state db𝑑superscript𝑏db^{\prime}italic_d italic_b start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and (possibly) produces an observation αObs𝛼Obs\alpha\in\mathrm{Obs}italic_α ∈ roman_Obs; we write ϵitalic-ϵ\epsilonitalic_ϵ whenever a command produces no observation. We write m[xvl]𝑚delimited-[]maps-to𝑥𝑣𝑙m[x\mapsto vl]italic_m [ italic_x ↦ italic_v italic_l ] to denote a memory m𝑚mitalic_m with variable x𝑥xitalic_x assigned the value vl𝑣𝑙vlitalic_v italic_l.

Fig. 7 provides the semantic rules for commands. The query evaluation rule QueryEval is similar to assignment: it evaluates a query $q$ in state $db$ and stores the result in the variable $x$. We use the command $\texttt{out}(e, u)$ to produce an observation. Formally, an observation $\alpha \in \mathrm{Obs}$ is a tuple $\langle o, u \rangle$, where $u \in \mathcal{U}$ is the identifier of the user observing the output and $o$ is the result of evaluating expression $e$, which is either a simple value or the result set of a non-boolean query.

We write $\langle c, m, db \rangle \xRightarrow{\tau}_u \langle c', m', db' \rangle$ to denote that $\langle c, m, db \rangle$ takes one or more steps to reach configuration $\langle c', m', db' \rangle$ while producing the trace (sequence of observations) $\tau \in \mathrm{Obs}^{\ast}$. We omit the final configuration whenever it is irrelevant and write $\langle c, m, db \rangle \xRightarrow{\tau}_u$.

\begin{mathpar}
\inferrule*[before=\textsc{Int}]{ }{\langle n, m, db \rangle \downarrow n}
\and
\inferrule*[before=\textsc{Tuple}]{ }{\langle tp, m, db \rangle \downarrow tp}
\and
\inferrule*[before=\textsc{TupleSet}]{ }{\langle \overline{tp}, m, db \rangle \downarrow \overline{tp}}
\and
\inferrule*[before=\textsc{Var}]{vl = m(x)}{\langle x, m, db \rangle \downarrow vl}
\and
\inferrule*[before=\textsc{Op}]{\langle e_1, m, db \rangle \downarrow n_1 \\ \langle e_2, m, db \rangle \downarrow n_2 \\ n = n_1 \oplus n_2}{\langle e_1 \oplus e_2, m, db \rangle \downarrow n}
\end{mathpar}

Figure 6: Semantic rules for expressions

\begin{mathpar}
\inferrule*[before=\textsc{Skip}]{ }{\langle \texttt{skip}, m, db \rangle \xrightarrow{\epsilon} \langle \epsilon, m, db \rangle}
\and
\inferrule*[before=\textsc{Assign}]{\langle e, m, db \rangle \downarrow vl \\ m' = m[x \mapsto vl]}{\langle x := e, m, db \rangle \xrightarrow{\epsilon} \langle \epsilon, m', db \rangle}
\and
\inferrule*[before=\textsc{QueryEval}]{vl = \llbracket q \rrbracket^{db} \\ m' = m[x \mapsto vl]}{\langle x \leftarrow q, m, db \rangle \xrightarrow{\epsilon} \langle \epsilon, m', db \rangle}
\and
\inferrule*[before=\textsc{IfTrue}]{\langle e, m, db \rangle \downarrow n \\ n \neq 0}{\langle \texttt{if}\ e\ \texttt{then}\ c_1\ \texttt{else}\ c_2, m, db \rangle \xrightarrow{\epsilon} \langle c_1, m, db \rangle}
\and
\inferrule*[before=\textsc{IfFalse}]{\langle e, m, db \rangle \downarrow n \\ n = 0}{\langle \texttt{if}\ e\ \texttt{then}\ c_1\ \texttt{else}\ c_2, m, db \rangle \xrightarrow{\epsilon} \langle c_2, m, db \rangle}
\and
\inferrule*[before=\textsc{WhileTrue}]{\langle e, m, db \rangle \downarrow n \\ n \neq 0}{\langle \texttt{while}\ e\ \texttt{do}\ c, m, db \rangle \xrightarrow{\epsilon} \langle c; \texttt{while}\ e\ \texttt{do}\ c, m, db \rangle}
\and
\inferrule*[before=\textsc{WhileFalse}]{\langle e, m, db \rangle \downarrow n \\ n = 0}{\langle \texttt{while}\ e\ \texttt{do}\ c, m, db \rangle \xrightarrow{\epsilon} \langle \epsilon, m, db \rangle}
\and
\inferrule*[before=\textsc{Seq}]{\langle c_1, m, db \rangle \xrightarrow{\alpha} \langle c_1', m', db' \rangle}{\langle c_1; c_2, m, db \rangle \xrightarrow{\alpha} \langle c_1'; c_2, m', db' \rangle}
\and
\inferrule*[before=\textsc{SeqEmpty}]{ }{\langle \epsilon; c, m, db \rangle \xrightarrow{\epsilon} \langle c, m, db \rangle}
\and
\inferrule*[before=\textsc{Output}]{\langle e, m, db \rangle \downarrow vl}{\langle \texttt{out}(e, u), m, db \rangle \xrightarrow{\langle vl, u \rangle} \langle \epsilon, m, db \rangle}
\end{mathpar}

Figure 7: Semantic rules for commands
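The rules of Fig. 6 and Fig. 7 can be read as an interpreter. The following Python sketch is an illustration only and is not taken from DiVerT: the tuple-based encoding of commands, the helper names `seq`, `step`, and `run`, the `fuel` bound, and the modeling of a query as a Python callable on the database state are all assumptions. Tuple literals are omitted, so expressions are integers, variables, or binary operations.

```python
EMPTY = None  # stands for the terminated command (epsilon)

# Assumed encoding (not from the paper):
#   expressions: int | str (variable name) | ("op", fn, e1, e2)
#   commands:    ("skip",) | ("assign", x, e) | ("query", x, q) | ("out", e, u)
#                | ("if", e, c1, c2) | ("while", e, c) | ("seq", c1, c2)


def eval_expr(e, m, db):
    """Big-step evaluation <e, m, db> ↓ vl, following Fig. 6."""
    if isinstance(e, int):                      # Int
        return e
    if isinstance(e, str):                      # Var
        return m[e]
    if isinstance(e, tuple) and e[0] == "op":   # Op
        _, fn, e1, e2 = e
        return fn(eval_expr(e1, m, db), eval_expr(e2, m, db))
    raise ValueError(f"unsupported expression {e!r}")


def step(c, m, db):
    """One small step; returns (c', m', db', observation or None)."""
    tag = c[0]
    if tag == "skip":
        return EMPTY, m, db, None
    if tag == "assign":
        _, x, e = c
        return EMPTY, {**m, x: eval_expr(e, m, db)}, db, None
    if tag == "query":                          # QueryEval
        _, x, q = c
        return EMPTY, {**m, x: q(db)}, db, None
    if tag == "out":                            # Output
        _, e, u = c
        return EMPTY, m, db, (eval_expr(e, m, db), u)
    if tag == "if":
        _, e, c1, c2 = c
        return (c1 if eval_expr(e, m, db) != 0 else c2), m, db, None
    if tag == "while":
        _, e, body = c
        if eval_expr(e, m, db) != 0:            # WhileTrue
            return ("seq", body, c), m, db, None
        return EMPTY, m, db, None               # WhileFalse
    if tag == "seq":
        _, c1, c2 = c
        if c1 is EMPTY:                         # SeqEmpty
            return c2, m, db, None
        c1p, mp, dbp, obs = step(c1, m, db)     # Seq
        return ("seq", c1p, c2), mp, dbp, obs
    raise ValueError(f"unknown command {c!r}")


def run(c, m, db, fuel=10_000):
    """Iterate `step` and collect the trace; `fuel` bounds the step count."""
    trace = []
    while c is not EMPTY and fuel > 0:
        c, m, db, obs = step(c, m, db)
        if obs is not None:
            trace.append(obs)
        fuel -= 1
    return trace, m


def seq(*cs):
    """Right-nested sequential composition of one or more commands."""
    c = cs[-1]
    for prev in reversed(cs[:-1]):
        c = ("seq", prev, c)
    return c


# A two-branch program in the style of the paper's examples; the query
# results are placeholder strings.
q1 = lambda db: db["q1"]
q2 = lambda db: db["q2"]
eq = lambda a, b: int(a == b)
ne = lambda a, b: int(a != b)
program = seq(
    ("if", ("op", eq, "z", 0), ("query", "x", q1), ("query", "x", q2)),
    ("out", "x", "u"),
    ("if", ("op", ne, "z", 0), ("query", "x", q1), ("query", "x", q2)),
    ("out", "x", "u"),
)
trace, final_memory = run(program, {"x": 0, "z": 0},
                          {"q1": "result-of-q1", "q2": "result-of-q2"})
```

Starting from the all-zero memory, the first conditional takes the then branch and the second takes the else branch, so the trace contains the results of both queries.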

IV-B Security Model

We now introduce our knowledge-based security model for disjunctive security policies. For simplicity, we denote the initial program memory by $m_0$ and assume it is fixed and public to all users; hence the only way to input sensitive information is through database queries. Users make observations through output channels, hence their knowledge of the database is determined by what they can infer based on these observations. This model induces standard equivalence relations for database states and observation traces.

Database state equivalence. Two states $db$ and $db'$ are equivalent with respect to a set of tables and views $V$, written $db \approx_V db'$, iff all tables and views in $V$ have identical contents in $db$ and $db'$. Formally, $db \approx_V db'$ iff for every view $v \in V$, $\llbracket v.q \rrbracket^{db} = \llbracket v.q \rrbracket^{db'}$, and for every table $t \in V$, $\llbracket t \rrbracket^{db} = \llbracket t \rrbracket^{db'}$. A set of tables and views $V$ thus induces an equivalence relation, and for a state $db$, the equivalence class $[db]_V$ contains all states that are equivalent to $db$ with respect to $V$.
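The definition of $\approx_V$ can be sketched in Python as follows. This is a minimal illustration under assumptions that are not part of the paper: a state is a dictionary from relation names to sets of rows, tables and views are both modeled as callables on a state, the `emp` rows are hypothetical, and the equivalence class is computed only over a finite list of candidate states rather than over all of $\Omega_D$.

```python
def equivalent(db1, db2, V):
    """db1 ≈_V db2: every table or view in V has the same contents in both."""
    return all(view(db1) == view(db2) for view in V)


def equivalence_class(db, V, candidates):
    """The members of `candidates` that are V-equivalent to db."""
    return [other for other in candidates if equivalent(db, other, V)]


# Hypothetical states: emp rows are (name, salary) pairs.
db_a = {"emp": frozenset({("alice", 60), ("bob", 40)})}
db_b = {"emp": frozenset({("alice", 55), ("bob", 40)})}

emp_table = lambda db: db["emp"]                               # a table t
names_view = lambda db: frozenset(n for n, _ in db["emp"])     # a view v.q

same_names = equivalent(db_a, db_b, [names_view])
same_table = equivalent(db_a, db_b, [emp_table])
```

The two states differ only in a salary, so they are indistinguishable through the names-only view but not through the full table.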

Trace equivalence. We use trace projection to define trace equivalence. The projection of a trace $\tau$ for user $u$, written $\tau\downharpoonright_u$, is the sequence of all observations in $\tau$ that $u$ can observe. Traces $\tau_1$ and $\tau_2$ are equivalent with respect to user $u$, written $\tau_1 \approx_u \tau_2$, iff the projection of one of them to $u$ is a prefix of the projection of the other, i.e., $\tau_1\downharpoonright_u \preceq \tau_2\downharpoonright_u$ or $\tau_1\downharpoonright_u \succeq \tau_2\downharpoonright_u$.

Equivalence up to trace prefixes is a standard technicality needed to ignore leaks due to a program's progress or termination [21]; here we adapt a definition of trace equivalence that does not differentiate between program divergence and termination [14].
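Projection and prefix-based trace equivalence can be sketched in Python as follows. The sketch assumes that a trace is a list of `(value, user)` pairs and that user $u$ can observe exactly those observations addressed to $u$; both are modeling assumptions, and the function names are hypothetical.

```python
def project(trace, u):
    """The projection of `trace` for user u: values of observations sent to u."""
    return [value for (value, user) in trace if user == u]


def is_prefix(a, b):
    """True iff sequence a is a prefix of sequence b."""
    return len(a) <= len(b) and b[:len(a)] == a


def trace_equiv(t1, t2, u):
    """t1 ≈_u t2: one projection is a prefix of the other."""
    p1, p2 = project(t1, u), project(t2, u)
    return is_prefix(p1, p2) or is_prefix(p2, p1)


t_long = [(1, "u"), (7, "v"), (2, "u")]
t_short = [(1, "u")]
t_other = [(3, "u")]
```

Here `t_short` is equivalent to `t_long` for user `u`, since a shorter trace may simply belong to a run that has not yet progressed, whereas `t_other` is not.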

User knowledge. When executing a program $\mathrm{prg}$, we assume memory is always initially in the all-zero state $m_0$. Thus, we can view a program's execution for any user as a function from a database $db$ to user-observable output traces, $\tau_{\mathrm{prg},u}(db) = \tau\downharpoonright_u$ when $\langle \mathrm{prg}, m_0, db \rangle \xRightarrow{\tau}_u$. This function induces an equivalence relation on databases, $\llbracket \mathrm{prg} \rrbracket_u = {\sim_{\tau_{\mathrm{prg},u}}}$, which characterizes the knowledge of $db$ conveyed by the output of $\mathrm{prg}$ to $u$.
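To illustrate how a trace function induces a partition of databases, the following Python sketch groups a finite list of candidate states by the trace they produce. It is a simplification under stated assumptions: the function `knowledge_partition` is hypothetical, the "database" in the example is reduced to a single salary value, and states are grouped by equality of complete traces.

```python
from collections import defaultdict


def knowledge_partition(trace_fn, candidates):
    """Group candidate states by the projected trace that trace_fn yields.

    Two states land in the same block exactly when they produce the same
    trace, i.e. when the observer cannot tell them apart.
    """
    blocks = defaultdict(list)
    for db in candidates:
        blocks[tuple(trace_fn(db))].append(db)
    return list(blocks.values())


# Hypothetical program: outputs 1 to the user iff the salary exceeds 50.
def observed_trace(salary):
    return [1 if salary > 50 else 0]


partition = knowledge_partition(observed_trace, [40, 60, 70])
```

The observer learns whether the salary exceeds 50 but cannot distinguish 60 from 70, which is reflected in the blocks of the partition.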

Security policy. A security policy is a list of user policies $P_u$, one for each user $u \in \mathcal{U}$. User policies are defined as views and table identifiers over a database schema, and determine what a user $u$ is allowed to observe. Fig. 8 presents the syntax of disjunctive policies for our model. They are defined as a set of sets in order to represent a disjunction of conjunctions of simpler policies. A conjunction $\mathrm{con}$ is a set of view ($v$) and table ($t$) identifiers, and a disjunction $\mathrm{dis}$ is a set of conjunctions. For example, the policy $P_u$ for a user $u$ who is allowed to see either table $t_1$ and view $v_1$, or view $v_2$, but not both, is defined as $P_u = \{\{t_1, v_1\}, \{v_2\}\}$.

The overall policy of the system, written as $P$, is the list of user policies. Per Def. 6, the policy $P_u$ can be represented semantically as an element $\llbracket P_u \rrbracket$ of the Quantale of Information. Thus, we can formulate our security condition as the assertion that the knowledge of the database that the execution of the program $\mathrm{prg}$ conveys to $u$ is bounded above by the disjunctive knowledge allowed by the policy, $\llbracket P_u \rrbracket$.

Definition 7.

The program $\mathrm{prg}$ is secure for the user $u$ and policy $P_u$ if $\llbracket \mathrm{prg} \rrbracket_u \sqsubseteq \llbracket P_u \rrbracket$.

\[
\begin{array}{rcl}
\mathrm{con} & := & \{v\} \mid \{t\} \mid \mathrm{con}_1 \cup \mathrm{con}_2 \\
\mathrm{dis} & := & \{\mathrm{con}\} \mid \mathrm{dis}_1 \cup \mathrm{dis}_2 \\
P_u & := & \mathrm{dis}
\end{array}
\]

Figure 8: Syntax of user policy
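The set-of-sets shape of Fig. 8 maps directly onto nested immutable sets. The following Python sketch shows one possible encoding; the constructor names `con` and `dis`, the string identifiers, and the user names are assumptions made for illustration.

```python
def con(*ids):
    """A conjunction: a set of view and table identifiers."""
    return frozenset(ids)


def dis(*cons):
    """A disjunction: a set of conjunctions."""
    return frozenset(cons)


# P_u = {{t1, v1}, {v2}}, the example policy from the text.
P_u = dis(con("t1", "v1"), con("v2"))

# The union productions of the grammar correspond to set union.
built_con = con("t1") | con("v1")
built_dis = dis(built_con) | dis(con("v2"))

# The overall policy P: one user policy per user (hypothetical user names).
P = {"alice": P_u, "bob": dis(con("v2"))}
```

Using `frozenset` keeps conjunctions hashable, so that they can themselves be members of a disjunction.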

V Enforcement of Disjunctive Policies

Having formulated the security condition, we would like to prove that useful programs satisfy it. To this end, we introduce a sound static enforcement mechanism, which imposes some structural limitations on the policy and trades off some completeness for the sake of efficiency and ease of analysis.

Fig. 9 illustrates how our mechanism functions at a high level. We assume as input a program and a policy in the formats described in Fig. 5 and Fig. 8, respectively. The program is then subjected to a static dependency analysis (Section V-B), which computes an overapproximate set of possible paths of control flow through the program, along with the queries (dependencies) retrieved for each path, giving an element of the DQ, that is, a (disjunctive) set of (conjunctive) sets of queries. Per Fig. 8, the policy is also already given in this format.

We would like to verify that the program dependencies are bounded by the policy in the DQ, as by Lemma 2, this entails the security condition (Def. 7) that the disjunctive information that is revealed by the program is bounded above by the QoI interpretation of the policy. However, checking DQ ordering on general queries may be computationally costly. We therefore abstract (Section V-C) both the policy and the path dependencies into a more tractable format (symbolic tuples), which again overapproximates the information they can retrieve. To guarantee soundness, we require that the views in the policy are such that this abstraction is lossless for them. Finally, as the security check (Section V-D), we compute a tractable comparison on sets of sets of symbolic tuples that can be shown to imply DQ ordering.

[Diagram: Program → Dependency Analysis → Query Abstraction → Security Check; Policy → Query Abstraction → Security Check]
Figure 9: Enforcement steps

V-A Conjunctive Queries

While our theoretical definitions are based on the fully general domain relational calculus as a query language, to avoid complexity our enforcement mechanism works with a restricted subset called conjunctive queries with comparisons (CQCs). This language is a subset of relational calculus that only employs conjunction ($\wedge$) and existential quantification ($\exists$) and omits disjunction ($\vee$), negation ($\neg$), and universal quantification ($\forall$). CQCs can model the SELECT-FROM-WHERE portion of SQL in which the WHERE clause contains only AND and comparisons.

Our language for (non-boolean) CQCs $q$ over a database schema $D$ employs the standard notation [18, 20], and has the form $\mathit{heading} \leftarrow \mathit{body}$:

\[
\mathrm{ans}(\overline{y}) \leftarrow R_1(\overline{x}_1), \dots, R_n(\overline{x}_n), C_1, \dots, C_m
\]

where $R_1, \dots, R_n$ are relations in $D$, and $\overline{x}_1, \dots, \overline{x}_n$ are their variables. We use $\mathrm{Var}(q) = \overline{x}_1 \cup \dots \cup \overline{x}_n$ to denote the set of variables appearing in the body of the query $q$. $C_1, \dots, C_m$ are formulae of the form $x_i \oplus x_j$, where $\oplus$ is a comparison operator from $\{<, \leq, =, \neq, >, \geq\}$ and $x_i$ and $x_j$ are either variables in $\mathrm{Var}(q)$ or constants.

We require that $\overline{y} \subseteq \mathrm{Var}(q)$. Without loss of generality, we assume that there are no self-joins in the query. In case of queries with self-joins, we can make logical copies of the relations to accommodate them [20]. The body of a CQC $q$ comprises two parts, namely the relation identifiers $R_1, \dots, R_n$, referred to as $\mathrm{ids}(q)$, and the conditions $C_1, \dots, C_m$, denoted by $\mathrm{cnd}(q)$.

Similarly to Section III-A, the evaluation of $q$ on the database state $db$ (denoted by $\llbracket q \rrbracket^{db}$) is defined by taking all tuples in the Cartesian product of the relations $\mathrm{ids}(q)$ in $db$ that satisfy $\mathrm{cnd}(q)$, and projecting to the column set $\overline{y}$.

Example 5.

Consider the database schema in Fig. 2. The following query returns a set of tuples containing the names of divisions whose managers have a salary of more than $50$:

\[
\mathrm{ans}(d) \leftarrow \mathrm{emp}(n, r, s), \mathrm{mng}(d, m), n = m, s > 50
\]
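The evaluation of a CQC as "product, filter, project" can be sketched in Python and applied to the query of Example 5. The sketch is illustrative only: the function `eval_cqc`, the encoding of conditions as predicates over a variable binding, and the sample rows are assumptions, as is the reading of the columns as emp(name, role, salary) and mng(division, manager). It further assumes that each variable occurs in only one atom, so that joins are expressed through explicit equalities such as $n = m$.

```python
from itertools import product


def eval_cqc(head, atoms, conds, db):
    """Evaluate ans(head) <- atoms, conds on the state db.

    head:  tuple of variable names to project to
    atoms: list of (relation_name, tuple_of_variable_names)
    conds: list of predicates over a binding (dict from variable to value)
    db:    dict from relation name to a set of rows (tuples)
    """
    answers = set()
    # Cartesian product of the relations in the body.
    for rows in product(*(db[rel] for rel, _ in atoms)):
        binding = {}
        for (_, variables), row in zip(atoms, rows):
            binding.update(zip(variables, row))
        # Keep the combination only if all comparisons hold, then project.
        if all(cond(binding) for cond in conds):
            answers.add(tuple(binding[y] for y in head))
    return answers


# Hypothetical sample state.
db = {
    "emp": {("alice", "manager", 60), ("bob", "manager", 40), ("carol", "dev", 70)},
    "mng": {("sales", "alice"), ("it", "bob")},
}

# ans(d) <- emp(n, r, s), mng(d, m), n = m, s > 50
result = eval_cqc(
    head=("d",),
    atoms=[("emp", ("n", "r", "s")), ("mng", ("d", "m"))],
    conds=[lambda b: b["n"] == b["m"], lambda b: b["s"] > 50],
    db=db,
)
```

On this sample state only the sales division qualifies, since its manager is the only manager with a salary above 50.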

V-B Type-based Dependency Analysis

Our static dependency analysis builds on the generic type system of van Delft et al. [15] and extends it with support for disjunctive dependencies. We intuitively expect that a disjunctive dependency analysis must be path-sensitive, so as to distinguish between different executions and also keep track of the history of observations. Both of these requirements are often challenging for type-based analyses, which do not naturally align with the execution order. We will first illustrate these challenges with examples and then present our analysis.

if (y > 0) then
  x := w + z;
else
  x := x + 1;
out(x,u);
Program 3

 1  if (z == 0) then
 2    x ← q1;
 3  else
 4    x ← q2;
 5  out(x,u);
 6  if (z != 0) then
 7    x ← q1;
 8  else
 9    x ← q2;
10  out(x,u);
Program 4

Program 3 illustrates the need for path sensitivity. The analysis should distinguish between the then branch, where variable $x$ depends on the set $\{y, w, z\}$, and the else branch, where $x$ depends on $\{y, x\}$. Our reference analysis [15] would join these two sets at the end of the if statement, ultimately yielding the dependency set $\{x, y, w, z\}$. In our analysis, these sets are never joined, but instead combined to form a set of sets, namely $\{\{y, w, z\}, \{y, x\}\}$, where the outer set represents a disjunctive dependency and the inner sets represent conjunctive dependencies.

Program 4 illustrates the need to keep track of the observation history. It outputs $x$ at lines 5 and 10, and the dependency set of $x$ in both places is $\{\{q1, z\}, \{q2, z\}\}$. However, this program will always output both $q1$ and $q2$. Now, if a policy only allows user $u$ to see either query $q1$ or $q2$, an analysis that checks the outputs at lines 5 and 10 separately will incorrectly accept both. Hence, the analysis should account for all outputs to user $u$.

Fig. 10 depicts the rules of our disjunctive dependency analysis. We use judgments of the form $\vdash c : \Gamma$, where $\Gamma$ is an environment mapping variables $\mathrm{Var}$ to sets of sets of dependencies $\mathrm{Dep}$. The set of variables is $\mathrm{Var} = PV \cup \mathcal{U} \cup \{pc\}$, where $PV$ are program variables, $\mathcal{U}$ are users, and $pc$ is the program context. The dependencies $\mathrm{Dep}$ are $\mathrm{Dep} = \mathrm{Var} \cup \mathcal{Q}$, where $\mathrm{Var}$ are variables and $\mathcal{Q}$ are queries that can be issued to a database. We use $u \in \mathcal{U}$ to indicate the dependencies of all outputs to user $u$.

We start by introducing the operators and auxiliary functions employed within the rules, and then proceed to explain the rules themselves. The operator $\otimes$ is used to join two (or more) sets of sets, defined as:

\[
\Gamma_1(x_1) \otimes \dots \otimes \Gamma_n(x_n) = \{ S_1 \cup \dots \cup S_n \mid S_i \in \Gamma_i(x_i),\ i = 1, \dots, n \}
\]

For example, the join of $\Gamma_1(x) = \{\{x, y\}, \{z, y\}\}$ and $\Gamma_2(y) = \{\{w\}, \{x, z\}\}$ is:

\[
\Gamma_1(x) \otimes \Gamma_2(y) = \{\{x, y, w\}, \{x, y, z\}, \{z, y, w\}\}
\]

Intuitively, the result of the join operator is a set of sets capturing the product of the original sets of sets under the set union operation. We use this operator to calculate all the possible combinations of two environments.
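The join can be sketched in a few lines of Python. This is only an illustration: the encoding of a set of sets as a `frozenset` of `frozenset`s and the helper name `join` are assumptions made for the example, not part of the tool described in the paper.

```python
from itertools import product


def join(*families):
    """Join sets of sets: pick one inner set from each family and union them."""
    return frozenset(
        frozenset().union(*choice) for choice in product(*families)
    )


# The example from the text: Gamma_1(x) and Gamma_2(y).
g1_x = {frozenset({"x", "y"}), frozenset({"z", "y"})}
g2_y = {frozenset({"w"}), frozenset({"x", "z"})}

result = join(g1_x, g2_y)
for inner in sorted(sorted(s) for s in result):
    print(inner)
```

Because the result is itself a set, the two combinations that both produce $\{x,y,z\}$ collapse into a single element, which is why the example yields three inner sets rather than four.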

$\Gamma_2;\Gamma_1$ represents the sequential composition of two environments. Intuitively, $\Gamma_2;\Gamma_1$ is the same as $\Gamma_2$ but updated with all of the dependencies that have been previously established in $\Gamma_1$. Formally:

$$(\Gamma_2;\Gamma_1)(x) = \bigcup_{S_2 \in \Gamma_2(x)}\ \bigotimes_{y \in S_2}\ \Gamma_1(y)$$

For example, the sequential composition of the environments

$$\Gamma_1 = [\, x \mapsto \{\{x\},\{y\}\},\ y \mapsto \{\{y\}\},\ pc \mapsto \{\{y,pc\}\} \,]$$
$$\Gamma_2 = [\, x \mapsto \{\{pc,x\}\},\ y \mapsto \{\{pc,y\}\},\ pc \mapsto \{\{pc\}\} \,]$$

evaluates to

$$\Gamma_2;\Gamma_1 = [\, x \mapsto \{\{x,y,pc\},\{y,pc\}\},\ y \mapsto \{\{pc,y\}\},\ pc \mapsto \{\{y,pc\}\} \,]$$
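The composition above can be reproduced with a short Python sketch. The dictionary encoding of environments and the helper names (`fs`, `look`, `join`, `seq`) are assumptions for illustration. In particular, treating a dependency with no entry (for instance a query) as mapping to itself is a convention chosen here so that the code is total.

```python
from itertools import product


def fs(*xs):
    return frozenset(xs)


def look(env, d):
    # Assumption: a dependency without an entry maps to itself, i.e. {{d}}.
    return env.get(d, fs(fs(d)))


def join(families):
    return frozenset(frozenset().union(*c) for c in product(*families))


def seq(g2, g1):
    """(g2 ; g1)(x): for each S2 in g2(x), join g1(y) over all y in S2."""
    out = {}
    for x in set(g1) | set(g2):
        parts = [join([look(g1, y) for y in s2]) for s2 in look(g2, x)]
        out[x] = frozenset().union(*parts)
    return out


gamma1 = {"x": fs(fs("x"), fs("y")), "y": fs(fs("y")), "pc": fs(fs("y", "pc"))}
gamma2 = {"x": fs(fs("pc", "x")), "y": fs(fs("pc", "y")), "pc": fs(fs("pc"))}

composed = seq(gamma2, gamma1)
```

For `x`, the single set $\{pc, x\}$ in $\Gamma_2(x)$ is expanded into $\Gamma_1(pc) \otimes \Gamma_1(x)$, which is where the two alternatives in the result come from.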

Finally, the operator $\Cup$ calculates the union of two environments pointwise: $(\Gamma_1 \Cup \Gamma_2)(x) = \Gamma_1(x) \cup \Gamma_2(x)$ for all $x \in \mathrm{Var}$. This operator is used in conditionals to capture the disjunctive join of the two branches. For example, in line 5 in Program 3, $\Gamma_1(x) = \{\{y,w,z\}\}$ and $\Gamma_2(x) = \{\{y,x\}\}$, and the result of $(\Gamma_1 \Cup \Gamma_2)(x)$ would be $\{\{y,w,z\},\{y,x\}\}$.

For loops, we rely on the fixed-point of $\Gamma$, denoted by $\Gamma^*$, which we define as:

$$\Gamma^* = \bigcup_{n > 0} \Gamma^n$$

where $\Gamma^0 = \Gamma_{id}$ and $\Gamma^{n+1} = \Gamma^n;\Gamma$.

In these rules, $\Gamma_{id}$ is the identity environment, defined as $\forall x \in \mathrm{Var},\ \Gamma_{id}(x) = \{\{x\}\}$, and $fv(e)$ denotes the free variables of expression $e$.
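Since $\mathrm{Var}$ and $\mathrm{Dep}$ are finite for a given program, $\Gamma^*$ can be computed by accumulating the powers $\Gamma^1, \Gamma^2, \dots$ until the accumulated union stops growing. The sketch below does this in Python. The dictionary encoding, the helper names, and the convention that a missing entry behaves like the identity $\{\{x\}\}$ are assumptions made for the example.

```python
from itertools import product


def fs(*xs):
    return frozenset(xs)


def look(env, d):
    # Assumption: missing entries behave like the identity environment.
    return env.get(d, fs(fs(d)))


def join(families):
    return frozenset(frozenset().union(*c) for c in product(*families))


def seq(g2, g1):
    out = {}
    for x in set(g1) | set(g2):
        parts = [join([look(g1, y) for y in s2]) for s2 in look(g2, x)]
        out[x] = frozenset().union(*parts)
    return out


def env_union(g1, g2):
    # Pointwise union of two environments (the disjunctive join).
    return {x: look(g1, x) | look(g2, x) for x in set(g1) | set(g2)}


def star(gamma):
    """Union of gamma^n for n > 0, where gamma^(n+1) = gamma^n ; gamma."""
    power, acc = gamma, gamma  # gamma^1 = gamma_id ; gamma = gamma
    while True:
        power = seq(power, gamma)
        new = env_union(acc, power)
        if new == acc:  # no new alternatives: fixed point reached
            return acc
        acc = new


# One loop iteration copies y into x and z into y.
body = {"x": fs(fs("y")), "y": fs(fs("z"))}
fixed = star(body)
```

In this small example `x` depends on `y` after one iteration and on `z` after two or more, so the fixed point records both alternatives for `x`.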

\inferrule*[before=\textsc{T-Skip}]{ }{\vdash \texttt{skip} : \Gamma_{id}}

\inferrule*[before=\textsc{T-Assign}]{\Gamma = \Gamma_{id}[x \mapsto \{fv(e) \cup \{pc\}\}]}{\vdash x := e : \Gamma}

\inferrule*[before=\textsc{T-Output}]{\Gamma' = \Gamma_{id}[u \mapsto \{fv(e) \cup \{pc, u\}\}]}{\vdash \texttt{out}(e, u) : \Gamma'}

\inferrule*[before=\textsc{T-QueryEval}]{\Gamma = \Gamma_{id}[x \mapsto \{\{q, pc\}\}]}{\vdash x \leftarrow q : \Gamma}

\inferrule*[before=\textsc{T-If}]{\vdash c_i : \Gamma_i \\ \Gamma'_i = \Gamma_i;\Gamma_{id}[pc \mapsto \{fv(e) \cup \{pc\}\}]\ \ i = 1, 2 \\ \Gamma' = (\Gamma'_1 \Cup \Gamma'_2)[pc \mapsto \{\{pc\}\}]}{\vdash \texttt{if}\ e\ \texttt{then}\ c_1\ \texttt{else}\ c_2 : \Gamma'}

\inferrule*[before=\textsc{T-While}]{\vdash c : \Gamma_c \\ \Gamma_f = (\Gamma_c;\Gamma_{id}[pc \mapsto \{fv(e) \cup \{pc\}\}])^* \\ \Gamma' = \Gamma_f[pc \mapsto \{\{pc\}\}]}{\vdash \texttt{while}\ e\ \texttt{do}\ c : \Gamma'}

\inferrule*[before=\textsc{T-Seq}]{\vdash c_1 : \Gamma_1 \\ \vdash c_2 : \Gamma_2 \\ \Gamma' = \Gamma_2;\Gamma_1}{\vdash c_1; c_2 : \Gamma'}

Figure 10: Type-based dependency analysis rules

T-Assign updates the dependency set of the assigned variable $x$ to the set of the free variables of expression $e$ and $pc$; for every other variable, the resulting environment matches the identity environment. Rule T-QueryEval is similar to assignment, except that instead of $fv(e)$, it adds query $q$ to the dependency set.

T-If sequentially composes the dependency sets of each branch with the environment $\Gamma_{id}[pc \mapsto \{fv(e) \cup \{pc\}\}]$, thus adding variables of the branch condition to the dependency set of each branch. Finally, the resulting environments ($\Gamma'_1$ and $\Gamma'_2$) are joined disjunctively using the $\Cup$ operator.

T-While uses the fixed-point operator to calculate the dependency set of the loop. To do so, it first calculates the dependency set of the loop body, which is sequentially composed with $\Gamma_{id}[pc \mapsto \{fv(e) \cup \{pc\}\}]$ to account for the dependencies on the loop condition. Finally, the fixed-point operator computes the dependency set of the while loop.

T-Output builds a dependency set from $fv(e)$, $\{pc\}$ and $\{u\}$, where $fv(e)$ includes all the variables of the expression output to user $u$, $\{pc\}$ captures the implicit dependencies on the path conditions, and $\{u\}$ is the dependency set of user $u$ and captures the history of dependencies that user $u$ might have observed up to this point. Observe that by the definition of sequential composition, all the dependencies of the previous outputs will be added to $u$.

This analysis yields a final environment $\Gamma_{\mathrm{fin}}$. The result of the analysis is the value of this environment for the user identifier $u$, which includes both queries and program variables. Since program variables do not contain sensitive information, and we are primarily concerned with queries, we refine the result of $\Gamma_{\mathrm{fin}}(u)$ to only include queries. This refined outcome defines the ultimate result of our analysis, denoted as $\mathrm{QL}_u$:

$$\mathrm{QL}_u \triangleq \bigcup_{S \in \Gamma_{\mathrm{fin}}(u)} \{ S \cap \mathcal{Q} \}$$
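To make the rules of Fig. 10 and the definition of $\mathrm{QL}_u$ concrete, the following Python sketch implements them for a tiny tuple-based syntax tree. Everything about the encoding is an assumption made for illustration: commands are tagged tuples, an expression is represented only by its set of free variables, environments are sparse dictionaries in which a missing entry stands for the identity $\{\{x\}\}$, and the names `analyze`, `guard`, and `look` are hypothetical. It is not the implementation used in DiVerT.

```python
from itertools import product


def fs(*xs):
    return frozenset(xs)


def look(env, d):
    return env.get(d, fs(fs(d)))  # missing entry = identity {{d}}


def join(families):
    return frozenset(frozenset().union(*c) for c in product(*families))


def seq(g2, g1):
    out = {}
    for x in set(g1) | set(g2):
        parts = [join([look(g1, y) for y in s2]) for s2 in look(g2, x)]
        out[x] = frozenset().union(*parts)
    return out


def env_union(g1, g2):
    return {x: look(g1, x) | look(g2, x) for x in set(g1) | set(g2)}


def star(g):
    power, acc = g, g
    while True:
        power = seq(power, g)
        new = env_union(acc, power)
        if new == acc:
            return acc
        acc = new


def guard(fv_e):
    # Gamma_id[pc -> {fv(e) + {pc}}]
    return {"pc": fs(frozenset(fv_e) | {"pc"})}


def analyze(c):
    tag = c[0]
    if tag == "skip":                       # T-Skip
        return {}
    if tag == "assign":                     # ("assign", x, fv(e))
        return {c[1]: fs(frozenset(c[2]) | {"pc"})}
    if tag == "query":                      # ("query", x, q)
        return {c[1]: fs(fs(c[2], "pc"))}
    if tag == "out":                        # ("out", fv(e), u)
        return {c[2]: fs(frozenset(c[1]) | {"pc", c[2]})}
    if tag == "seq":                        # ("seq", c1, c2)
        return seq(analyze(c[2]), analyze(c[1]))
    if tag == "if":                         # ("if", fv(e), c1, c2)
        g = env_union(seq(analyze(c[2]), guard(c[1])),
                      seq(analyze(c[3]), guard(c[1])))
        g["pc"] = fs(fs("pc"))
        return g
    if tag == "while":                      # ("while", fv(e), body)
        g = dict(star(seq(analyze(c[2]), guard(c[1]))))
        g["pc"] = fs(fs("pc"))
        return g
    raise ValueError(tag)


# x <- q1; if x then y <- q2 else y <- q3; out(y, u)
prog = ("seq", ("query", "x", "q1"),
        ("seq", ("if", {"x"}, ("query", "y", "q2"), ("query", "y", "q3")),
         ("out", {"y"}, "u")))

queries = fs("q1", "q2", "q3")
gamma_fin = analyze(prog)
ql_u = {s & queries for s in look(gamma_fin, "u")}
```

For this program the two branches remain separate alternatives: the output to `u` depends on `q1` together with either `q2` or `q3`, but no single alternative contains both `q2` and `q3`.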

The soundness proof of our enforcement relies on the following fact: if $Q_{\mathrm{prg},u}(db)$ denotes the set of queries on which the $u$-outputs of $\mathrm{prg}$ depend when running on a database state $db$, then this set is guaranteed to be an element of the set $\mathrm{QL}_u$ produced by the dependency analysis. We show how to define $Q_{\mathrm{prg},u}(db)$ using a taint-tracking semantics presented in Appendix E. Formally, this gives rise to the following soundness condition for the dependency analysis.

Lemma 3.

For all $db \in \Omega_D$, $Q_{\mathrm{prg},u}(db) \in \mathrm{QL}_u(\mathrm{prg})$.

V-C Query Abstraction

Even for CQCs, comparing the information revealed by sets of queries is hard in general. To define a well-behaved and more tractable determinacy order on which to build our DQ, we introduce another overapproximating abstraction, which we will use to soundly label queries and policies.

We define a symbolic tuple as $\langle T, \phi, \pi \rangle$, where $T = \{t_1, t_2, \dots, t_n\}$ is a set of table identifiers, $\phi$ is a boolean combination of equality, inequality, and comparisons over the columns of the tables in $T$, and $\pi$ is a subset of the columns of the tables in $T$. In a symbolic tuple, $\pi$ denotes the query's projection on the columns of the tables in $T$, and $\phi$ defines the constraints over the rows.

Example 6.

The symbolic tuple of query $\mathrm{ans}(d) \leftarrow \mathrm{emp}(n,r,s), \mathrm{mng}(d,m), n = m, s > 50$ defined on the relations of Fig. 2 would be $\langle \{\mathrm{emp}, \mathrm{mng}\},\ s > 50 \wedge n = m,\ \{d\} \rangle$.

While calculating the exact set of symbolic tuples of a relational calculus query is intractable for many classes of queries, it is tractable for conjunctive queries with comparison (CQC). Given a conjunctive query $q = \mathrm{ans}(\overline{y}) \leftarrow R_1(\overline{x}_1), \dots, R_n(\overline{x}_n), C_1, \dots, C_m$, the function $\mathrm{sts}$ computes a symbolic tuple from $q$ as follows:

$$\mathrm{sts}(q) = \Big\langle \mathrm{ids}(q'),\ \Big( \bigwedge_{C \in \mathrm{cnd}(q')} C \Big),\ \overline{y} \Big\rangle$$

where $\mathrm{ids}(q')$ and $\mathrm{cnd}(q')$ defined in Section V-A return the relation identifiers and conditionals of $q'$, respectively. Here, $q'$ is the query obtained by recursively replacing views with their definitions. We lift this definition to sets of queries $Q$, and define $\mathrm{sts}(Q)$ as $\{\bigcup_{q \in Q} \mathrm{sts}(q)\}$.

Using $\mathrm{sts}$, we define the function $\sigma_{\mathrm{st}}$ for a set of sets of queries $\mathbb{Q}$ as follows:

$$\sigma_{\mathrm{st}}(\mathbb{Q}) = \{ \mathrm{sts}(Q) \mid Q \in \mathbb{Q} \}$$
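As a rough illustration of $\mathrm{sts}$ and $\sigma_{\mathrm{st}}$, the sketch below encodes a conjunctive query as a small Python record and computes its symbolic tuple. The `Query` class, its field names, and the use of plain strings for conditions are assumptions made for the example. View expansion is not modelled, so the query is taken to be already expanded ($q' = q$), and the lifted $\mathrm{sts}(Q)$ is read here as the set of symbolic tuples of the queries in $Q$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    head: tuple   # projected variables, e.g. ("d",)
    atoms: tuple  # relational atoms as (relation, argument tuple)
    conds: tuple  # comparison conditions, kept as strings here


def sts(q):
    """Symbolic tuple <T, phi, pi> of an already view-expanded CQC."""
    tables = frozenset(rel for rel, _args in q.atoms)
    phi = frozenset(q.conds)  # a conjunction, stored as a set of conjuncts
    return (tables, phi, frozenset(q.head))


def sts_set(queries):
    # Lifting to a set of queries: one symbolic tuple per query.
    return frozenset(sts(q) for q in queries)


def sigma_st(family):
    # One label (set of symbolic tuples) per inner set of queries.
    return frozenset(sts_set(qs) for qs in family)


# Example 6: ans(d) <- emp(n, r, s), mng(d, m), n = m, s > 50
q = Query(
    head=("d",),
    atoms=(("emp", ("n", "r", "s")), ("mng", ("d", "m"))),
    conds=("n = m", "s > 50"),
)
tables, phi, pi = sts(q)
labels = sigma_st([[q]])
```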

Policy Analysis. The function $\sigma_{\mathrm{st}}$ can also be used to map a disjunctive security policy to a set of labels. However, in order to ensure soundness and avoid approximation, we place some constraints on policies. (1) To make computing the set of symbolic tuples tractable, we only support policies with views in the CQC form. (2) We require that the symbolic tuples of views be well-formed, which we define as:

Definition 8.

The symbolic tuple $\langle T, \phi, \pi \rangle$ is said to be well-formed if it satisfies $\mathrm{dep}(\phi) \subseteq \pi$.

Here, $\phi = C_1 \wedge \dots \wedge C_n$ and $\mathrm{dep}(\phi) = \bigcup_{i \in \{1, \dots, n\}} fv(C_i)$ returns the column dependency set of $\phi$.

Well-formedness ensures that the symbolic tuples are precise, at the expense of limiting a view to only applying constraints on the columns which it projects on.

Furthermore, we treat the table identifiers used in policies as special views that return the whole table. For instance, a policy which allows access to table $\mathrm{emp}$ can be rewritten as view $\mathrm{ans}(n,r,s) \leftarrow \mathrm{emp}(n,r,s)$.

As discussed in Section IV, the disjunctive security policy of user $u$ (written as $P_u$) is a set of conjunctions $\mathrm{con}$, interpreted as a disjunction of conjunctions of table and view identifiers. For a policy $P_u$ that adheres to the constraints mentioned earlier, $\sigma_{\mathrm{st}}$ is defined as follows:

$$\sigma_{\mathrm{st}}(P_u) = \{ \mathrm{sts}(\mathrm{con}) \mid \mathrm{con} \in P_u \}$$

Labels. In our model, a security label $\ell$ is defined as a set of symbolic tuples, and we define the ordering relation of two labels, written as $\ell_1 \sqsubseteq_{\mathrm{st}} \ell_2$, as follows:

Definition 9.

$\ell_1 \sqsubseteq_{\mathrm{st}} \ell_2$ iff for all symbolic tuples $\langle T, \phi, \pi \rangle \in \ell_1$, there are well-formed symbolic tuples $\langle T_1, \phi_1, \pi_1 \rangle, \dots, \langle T_n, \phi_n, \pi_n \rangle$ in $\ell_2$ such that $T \subseteq (T_1 \cup \dots \cup T_n)$, $T_1, \dots, T_n$ are disjoint, $\phi \models (\phi_1 \wedge \dots \wedge \phi_n)$, and $\mathrm{dep}(\phi) \cup \pi \subseteq (\pi_1 \cup \dots \cup \pi_n)$.

To ensure soundness, we assume that all of the symbolic tuples on the right-hand side of $\sqsubseteq_{\mathrm{st}}$ are well-formed. This definition relies on entailment to check the ordering of $\phi$: we write $\phi_1 \models \phi_2$ to mean that any assignment that satisfies $\phi_1$ also satisfies $\phi_2$.

Example 7.

Consider the labels $\ell_1 = \{\langle \{\mathrm{emp}\},\ s = 10,\ \{r\} \rangle\}$ and $\ell_2 = \{\langle \{\mathrm{emp}, \mathrm{mng}\},\ s > 5,\ \{r,s,m\} \rangle\}$. We have $\ell_1 \sqsubseteq_{\mathrm{st}} \ell_2$ since $\{\mathrm{emp}\} \subseteq \{\mathrm{emp}, \mathrm{mng}\}$, $\{r\} \subseteq \{r,s,m\}$, $s = 10 \models s > 5$ and $\{s\} \cup \{r\} \subseteq \{r,s,m\}$.
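The checks of Def. 8 and Def. 9 can be sketched in Python as follows. The encoding is an assumption made for illustration: a condition is a triple `(column, op, value)` where `value` is an integer constant or another column name, and $\phi$ is a tuple of such triples read as a conjunction. The entailment test `entails` is a deliberately simple stand-in that reasons about integer bounds on single columns; it is conservative (it may answer `False` for entailments that do hold) and is not the decision procedure used by the paper's tool.

```python
import math
from itertools import combinations


def dep(phi):
    cols = set()
    for col, _op, val in phi:
        cols.add(col)
        if isinstance(val, str):
            cols.add(val)
    return cols


def well_formed(t):
    _tables, phi, pi = t
    return dep(phi) <= pi


def bounds(phi, col):
    lo, hi = -math.inf, math.inf
    for c, op, v in phi:
        if c != col or not isinstance(v, int):
            continue
        if op == "=":
            lo, hi = max(lo, v), min(hi, v)
        elif op == ">":
            lo = max(lo, v + 1)   # integer-valued columns assumed
        elif op == ">=":
            lo = max(lo, v)
        elif op == "<":
            hi = min(hi, v - 1)
        elif op == "<=":
            hi = min(hi, v)
    return lo, hi


def entails(phi, psi):
    for atom in psi:
        if atom in phi:
            continue
        col, op, v = atom
        if not isinstance(v, int):
            return False          # unknown column-to-column fact: give up
        lo, hi = bounds(phi, col)
        if lo > hi:
            continue              # phi is unsatisfiable, so it entails anything
        ok = {"=": lo == hi == v, ">": lo > v, ">=": lo >= v,
              "<": hi < v, "<=": hi <= v}.get(op, False)
        if not ok:
            return False
    return True


def leq_st(l1, l2):
    candidates = [t for t in l2 if well_formed(t)]

    def covered(t):
        tables, phi, pi = t
        for n in range(1, len(candidates) + 1):
            for group in combinations(candidates, n):
                all_t = frozenset().union(*(g[0] for g in group))
                if sum(len(g[0]) for g in group) != len(all_t):
                    continue      # table sets must be pairwise disjoint
                all_phi = [a for g in group for a in g[1]]
                all_pi = frozenset().union(*(g[2] for g in group))
                if (tables <= all_t and entails(phi, all_phi)
                        and dep(phi) | pi <= all_pi):
                    return True
        return False

    return all(covered(t) for t in l1)


# Example 7
l1 = [(frozenset({"emp"}), (("s", "=", 10),), frozenset({"r"}))]
l2 = [(frozenset({"emp", "mng"}), (("s", ">", 5),), frozenset({"r", "s", "m"}))]
```

With these definitions, `leq_st(l1, l2)` holds as in Example 7, while `leq_st(l2, l1)` does not: the only tuple in `l1` constrains column `s` without projecting on it, so it is not well-formed and cannot appear on the right-hand side.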

V-D Enforcement

The dependency analysis of Section V-B extracts the dependencies of program $\mathrm{prg}$'s outputs to user $u$ and produces $\mathrm{QL}_u$. Applying $\sigma_{\mathrm{st}}$ to $\mathrm{QL}_u$ yields a set of labels, each bounding the information revealed along some path; we call this set the $u$-knowledge of $\mathrm{prg}$ (denoted by $k(\mathrm{prg})_u$). We interpret this as a disjunction, as any execution follows along one particular path.

Similarly, applying $\sigma_{\mathrm{st}}$ to the disjunctive security policy of user $u$ (i.e., $P_u$) results in a set of labels. Each label faithfully captures one conjunction, and so the policy is also represented as a set of labels $ak(P_u)$, interpreted disjunctively.

By Lemma 2, to verify that the security condition is satisfied, it is sufficient to establish that $\mathrm{QL}_u \sqsubseteq P_u$ in the DQ. However, checking $\sqsubseteq$ in the DQ is not generally tractable. For the security check, we therefore instead perform a twofold approximation: we check ordering for the conjunctive inner sets using the approximate ordering $\sqsubseteq_{\mathrm{st}}$, and approximate the mix-based ordering on the disjunctive outer sets in a way that loses little relative to our analysis:

Definition 10.

We say that $k(\mathrm{prg})_u \sqsubseteq_* ak(P_u)$ iff

$$\forall \ell_k \in k(\mathrm{prg})_u,\ \exists \ell_{ak} \in ak(P_u).\ \ \ell_k \sqsubseteq_{\mathrm{st}} \ell_{ak}$$

where $\ell_{ak}$ and $\ell_k$ are labels, and $\sqsubseteq_{\mathrm{st}}$ is the symbolic tuple ordering of Def. 9. To ensure faithful labeling of policies, we assume all of the symbolic tuples in $\ell_{ak}$ are well-formed as defined in Def. 8. We can then formalize the relationship between $\sqsubseteq_*$ and $\sqsubseteq$ as follows.
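The shape of the check in Def. 10 is a "for all paths, there exists a permitted conjunction" test. The sketch below shows only that outer structure in Python. The function name `leq_star` is hypothetical, and the inner ordering is passed in as a parameter. For brevity the example plugs in plain subset inclusion on sets of query names as a stand-in for $\sqsubseteq_{\mathrm{st}}$, which is an assumption and much coarser than the ordering of Def. 9.

```python
def leq_star(knowledge, allowed, leq_st):
    """Every program label must be below some policy label."""
    return all(any(leq_st(lk, lak) for lak in allowed) for lk in knowledge)


def subset(a, b):
    # Stand-in for the symbolic tuple ordering, for illustration only.
    return a <= b


# A program whose output depends on q1 and either q2 or q3.
k_prg = [frozenset({"q1", "q2"}), frozenset({"q1", "q3"})]

# Policy (q1 and q2) or (q1 and q3): each path is covered by one disjunct.
either = [frozenset({"q1", "q2"}), frozenset({"q1", "q3"})]

# Policy (q1 and q2) only: the path that reads q3 is not covered.
only_q2 = [frozenset({"q1", "q2"})]

accepted = leq_star(k_prg, either, subset)
rejected = leq_star(k_prg, only_q2, subset)
```

Under these assumptions `accepted` is `True` and `rejected` is `False`, mirroring the idea that each execution path must fit within a single disjunct of the policy.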

Lemma 4.

If $\sigma_{\mathrm{st}}(\{Q_1, \dots, Q_n\}) \sqsubseteq_* \sigma_{\mathrm{st}}(\{P_1, \dots, P_m\})$, then in the DQ, $(Q_1 \vee \dots \vee Q_n) \sqsubseteq (P_1 \vee \dots \vee P_m)$.

We refer the readers to Appendix F-B for the proof of this Lemma.
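To make the shape of the label check stated before Lemma 4 concrete, the following minimal Python sketch tests that every program label is bounded by some policy label. The function name `satisfies_policy`, the `Label` type, and the toy ordering (plain set inclusion over view names) are assumptions made for this illustration only; in the paper the ordering is $\sqsubseteq_{\mathrm{st}}$ of Def. 9, which is not reproduced here.

```python
from typing import Callable, FrozenSet, Iterable

# Hypothetical stand-in for a label; the paper's labels contain symbolic tuples.
Label = FrozenSet[str]


def satisfies_policy(
    program_labels: Iterable[Label],
    policy_labels: Iterable[Label],
    leq: Callable[[Label, Label], bool],
) -> bool:
    """Return True iff every program label is below some policy label."""
    policy = list(policy_labels)  # materialise so it can be scanned repeatedly
    return all(any(leq(lk, lak) for lak in policy) for lk in program_labels)


def subset_leq(lk: Label, lak: Label) -> bool:
    """Toy ordering (assumption): a label is below another if it is a subset."""
    return lk <= lak


policy = [frozenset({"P1"}), frozenset({"P2"})]        # P1 or P2, but not both
one_per_path = [frozenset({"P1"}), frozenset({"P2"})]  # each path uses one view
both_on_a_path = [frozenset({"P1", "P2"})]             # one path uses both views

print(satisfies_policy(one_per_path, policy, subset_leq))
print(satisfies_policy(both_on_a_path, policy, subset_leq))
```

Passing the ordering as a parameter keeps the quantifier structure separate from the comparison itself, so a more precise ordering could be substituted without touching the outer check.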

V-E Soundness Proof

Fig. 11 outlines the overall architecture of our enforcement mechanism and the correctness assertion that we make of it.

[Figure 11 diagram: nodes $ak(P_{u})$, $k(\mathrm{prg})_{u}$, $P_{u}$, $\mathrm{QL}_{u}$, $\llbracket P_{u} \rrbracket$, $\llbracket \mathrm{QL}_{u} \rrbracket$, $\mathrm{prg}$, and $\llbracket \mathrm{prg} \rrbracket_{u}$; edges labelled D.A., $\sigma_{\mathrm{st}}$, $\sqsubseteq_{*}$, $\sqsubseteq$, and $\Rightarrow$; regions labelled QoI and DQ.]
Figure 11: Overall architecture of our proof

The rightmost column of Fig. 11 represents a chain of information order relations in the QoI, which we establish for each enforcement step. Following the chain from bottom to top, we obtain the security condition of Def. 7. At the same time, the “left boundary” of the figure, comprising the D.A., $\sigma_{\mathrm{st}}$ abstractions and $\sqsubseteq_{*}$ check, represents the computations that are actually performed to check a program.

Theorem 1.

If a program $\mathrm{prg}$ satisfies Def. 10, then it is secure in the sense of Def. 7.

Proof.

The statement follows from establishing the implications in the diagram of Fig. 11. The top left cell is given by Lemma 4; the top right cell by Lemma 2; and the bottom cell (dependency analysis) is proved in Appendix E. ∎

VI Implementation and Evaluation

In this section, we describe our prototype DiVerT [22], which implements the type-based dependency analysis of Section V-B and query abstraction of Section V-C to verify the security of database-backed programs. We then evaluate DiVerT’s effectiveness using functional tests and an assortment of real-world-inspired use cases.

VI-A Implementation

To evaluate the feasibility and security of our approach in practice, we implemented the type-based dependency analysis of Section V-B. For the sake of practicality, instead of CQC, DiVerT uses the SELECT-FROM-WHERE portion of SQL, which is analogous to CQC as described in Section V-A. Following the query analysis of Section V-C, these SQL queries are then converted into symbolic tuples. For the security check, the symbolic tuples representing the result of the program analysis must be compared to those representing the policy; to perform this comparison following Def. 9, we use the Z3 SMT solver [23]. Our implementation operates on programs in the language presented in Section IV-A, with the addition of two macros, @Table@ and @Policy@, for defining the table schemas and the security policy.

VI-B Test suite

To validate our implementation, we use a functional test suite consisting of 20 programs, designed to capture a broad variety of examples of disjunctive dependencies. This suite includes programs with row- and column-level policies of varying granularity levels, and those necessitating the use of SMT solvers for verification. Furthermore, the tests verify the behaviour of the dependency analysis by incorporating complex conditionals, loops, and implicit and explicit outputs. The tests can be found in the implementation repository [22].

VI-C Use cases

We evaluate DiVerT on four use cases inspired by real-world problems in which disjunctive policies naturally arise. The purpose of this evaluation is to validate the security analysis of DiVerT on realistic scenarios involving disjunctive policies, and ensure that its behaviour is consistent with the definitions of Section IV-B. Rather than analysing complete applications for each example, we therefore focus on smaller kernels that capture the core security-critical behaviour of the respective problem.

Privacy-preserving location service. Multilateration is a technique to determine the location of a user by measuring their distance to known reference points [24]. Two distances are sufficient to narrow a user’s location down to one of two points on a map, and three identify the location unambiguously. Consider a location service provider which tracks, for some number of users, not only their precise location but also their distances to certain points of interest (PoI) such as restaurants or shops. An advertiser wants to query this service to provide location-based ads. For example, if the user is close to a shop A𝐴Aitalic_A, and A𝐴Aitalic_A has a sale going on, the user may be enticed by this information.

Privacy and business considerations make it desirable to not reveal the precise location of the user to the advertisement company accessing the database, while still allowing for some location-based services in this vein. If the advertiser were to learn the distance of a single user to two or more PoIs at a specific time, the user’s location could be inferred. However, we may still want to release the user’s distance to any one PoI which they are currently closest to. This can be interpreted as a disjunctive policy, in which the information revealed for each user is bounded by the disjunction of that user’s distances to some single PoI.

The database schema consists of a single table Distance(id, poi, dis, loc), which stores the ID of each user, the name of the PoI, their distance, and the user’s precise location. We implement a small example with two PoIs $\{\text{`restaurant'}, \text{`mall'}\}$ and two users $\{1, 2\}$. Let the view $v_{i,j}$ for each user $i$ and PoI $j$ be defined as the query SELECT id, poi FROM Distance WHERE id = $i$ AND poi = $j$. The disjunctive policy then covers every combination of user and PoI as a possibility: $\{\{v_{1,\text{`restaurant'}}, v_{2,\text{`restaurant'}}\}, \{v_{1,\text{`restaurant'}}, v_{2,\text{`mall'}}\}, \{v_{1,\text{`mall'}}, v_{2,\text{`restaurant'}}\}, \{v_{1,\text{`mall'}}, v_{2,\text{`mall'}}\}\}$.

We test two programs against this policy. In one, the advertiser uses internal parameters identifying a target user and interest, and issues a single query requesting that user’s distance from the relevant point of interest. In the other, the advertiser still targets a particular user, but queries all of that user’s distances. As expected, DiVerT accepts the former program, but rejects the latter.
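The policy and the two test programs of this use case can be replayed in a few lines of Python. This is only a sketch under simplifying assumptions: each view $v_{i,j}$ is represented by the pair `(i, j)`, each program is summarised by hypothetical per-path sets of views, and plain set inclusion stands in for the comparison that DiVerT actually performs on symbolic tuples.

```python
from itertools import product

USERS = (1, 2)
POIS = ("restaurant", "mall")

# One disjunct per way of choosing a single PoI for every user.
POLICY = [
    frozenset(zip(USERS, choice)) for choice in product(POIS, repeat=len(USERS))
]


def accepted(path_dependencies):
    """Accept iff the views used on every path fit inside one policy disjunct."""
    return all(
        any(dep <= disjunct for disjunct in POLICY) for dep in path_dependencies
    )


# Program 1 (assumed summary): one query for a single (user, PoI) pair per path.
single_query = [frozenset({(u, p)}) for u in USERS for p in POIS]

# Program 2 (assumed summary): all distances of the targeted user on one path.
all_distances = [frozenset((u, p) for p in POIS) for u in USERS]

print(len(POLICY))
print(accepted(single_query))
print(accepted(all_distances))
```

Generating the disjuncts with `product` mirrors the statement that the policy covers every combination of one PoI per user, and it scales to more users or PoIs without enumerating the disjuncts by hand.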

Privacy-preserving data publishing. Expanding upon the motivating example in the introduction, we consider the case of programs querying a database with personally identifiable information (i.e., quasi-identifiers). As discussed before, revealing too many quasi-identifiers may make it possible to identify an individual. We consider the example of a medical database [5] with a table Patients(zip, gen, dis) storing the ZIP code of residence, gender and disease of patients. An agent querying the database should not learn more than two of these at a time. For simplicity’s sake, we only consider queries that retrieve the same data from each patient. Defining $v_{1} =$ SELECT dis, gen FROM Patients, $v_{2} =$ SELECT zip, gen FROM Patients, and $v_{3} =$ SELECT zip, dis FROM Patients, the disjunctive policy can then be written as $\{\{v_{1}\}, \{v_{2}\}, \{v_{3}\}\}$.

Once again, we validate two programs against this policy. Branching on an internal parameter, the client will issue one query to select data for either male or female patients. In the first program, all queries take the form of SELECT dis FROM Patients WHERE gen = ‘F’, whereas in the second one, one of the queries additionally filters on the ZIP code: SELECT dis FROM Patients WHERE gen = ‘F’ AND zip = 10001. Again, only the latter program is rejected by DiVerT. This reveals a potential subtlety, as data dependency and hence release of information may arise not only from what columns are selected, but also from conditions restricting the set of rows.
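The subtlety about row conditions can be observed directly on small databases. The sketch below uses Python’s built-in `sqlite3` module to exhibit, for each of the three policy views, a pair of databases that agree on the view but disagree on the ZIP-filtered query, so that no single view determines it. The helper names and the sample rows are assumptions chosen for illustration, and the sketch is unrelated to how DiVerT performs its check.

```python
import sqlite3


def run(rows, sql):
    """Evaluate `sql` on a fresh in-memory Patients table holding `rows`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Patients(zip INTEGER, gen TEXT, dis TEXT)")
    con.executemany("INSERT INTO Patients VALUES (?, ?, ?)", rows)
    result = sorted(con.execute(sql).fetchall())
    con.close()
    return result


V1 = "SELECT dis, gen FROM Patients"
V2 = "SELECT zip, gen FROM Patients"
V3 = "SELECT zip, dis FROM Patients"
Q_PLAIN = "SELECT dis FROM Patients WHERE gen = 'F'"
Q_ZIP = "SELECT dis FROM Patients WHERE gen = 'F' AND zip = 10001"


def is_counterexample(view, query, db_a, db_b):
    """True if the databases agree on `view` but differ on `query`."""
    same_view = run(db_a, view) == run(db_b, view)
    return same_view and run(db_a, query) != run(db_b, query)


base = [(10001, "F", "flu")]
print(is_counterexample(V1, Q_ZIP, base, [(20002, "F", "flu")]))   # ZIP changed
print(is_counterexample(V2, Q_ZIP, base, [(10001, "F", "cold")]))  # disease changed
print(is_counterexample(V3, Q_ZIP, base, [(10001, "M", "flu")]))   # gender changed
print(is_counterexample(V1, Q_PLAIN, base, [(20002, "F", "flu")]))
```

A pair of databases that fails to be a counterexample, as in the last call, does not by itself prove determinacy; it only shows that this particular pair does not separate the view from the query.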

Secret sharing. We implement a $(t, n)$ secret sharing scheme that splits a secret value $s$ into $n$ shares $s_{1}, s_{2}, \ldots, s_{n}$. These shares are then distributed among $n$ parties $p_{1}, p_{2}, \ldots, p_{n}$, each receiving a unique share. A secure secret sharing scheme requires that the secret $s$ can only be reconstructed if $t$ or more participants combine their shares. If the number of combined shares is less than $t$, no information about the secret should be revealed. This requirement naturally translates to a disjunctive policy $s_{1} \vee s_{2} \vee \ldots \vee s_{n}$, stipulating that participants can each only learn one share.

We assume that the shares $s_{1}, s_{2}, \ldots, s_{n}$ are created by a secure secret sharing scheme and are then stored in a database. The database schema consists of the table Shares(shareID, shareVal), which stores the ID of each share and its corresponding value.

The policy only allows a user to read one of the shares (i.e., only one row of the table). We define the view $v_{i}$ for each share as SELECT shareVal, shareID FROM Shares WHERE shareID = $i$, where $i = 1, \ldots, n$. The corresponding disjunctive policy is $\{\{v_{1}\}, \{v_{2}\}, \ldots, \{v_{n}\}\}$.

We implement a program that executes a subroutine for each user, issuing a database query to retrieve the user’s share. For example, the query for a user to retrieve share number 5 is SELECT shareVal FROM Shares WHERE shareID = 5, which DiVerT correctly accepts. If the same user issues another query to retrieve share number 6, the policy is violated and hence the program is rejected. This scenario shows that DiVerT enforces row-level policies precisely.
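One way to see why the query for share 5 stays within the policy is that its result can be recomputed from the result of the view for share 5 alone, by projecting away the shareID column. The following `sqlite3` sketch illustrates this rewriting; the share values, the number of shares, and the helper names `view`, `program_query`, and `rewrite` are made up for the example and do not come from the paper or from DiVerT.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Shares(shareID INTEGER PRIMARY KEY, shareVal INTEGER)")
# Arbitrary placeholder values; a real scheme would generate the shares.
con.executemany(
    "INSERT INTO Shares VALUES (?, ?)", [(i, 1000 + 7 * i) for i in range(1, 9)]
)


def view(i):
    """Result of the policy view for share i."""
    sql = "SELECT shareVal, shareID FROM Shares WHERE shareID = ?"
    return con.execute(sql, (i,)).fetchall()


def program_query(i):
    """Result of the query issued by the user program for share i."""
    sql = "SELECT shareVal FROM Shares WHERE shareID = ?"
    return con.execute(sql, (i,)).fetchall()


def rewrite(view_rows):
    """Recompute the program query from the view result alone (a projection)."""
    return [(share_val,) for (share_val, _share_id) in view_rows]


print(rewrite(view(5)) == program_query(5))
print(all(share_id == 5 for (_val, share_id) in view(5)))
```

The second check reflects that the view for share 5 contains no row of any other share, so a program that also needs share 6 cannot be covered by the same singleton disjunct.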

Online shop. This use case models an online shop in which a user with a gift card can only use it to “buy” items that match the value of the gift card. We consider a shop that only provides digital items, which are stored in a database. The database schema consists of the table Items(id, name, data), which stores the ID, name, and data of each digital item. We define a view $v_{n}$ for each item as SELECT data, name FROM Items WHERE name = $n$, where $n$ is the item’s name.

Assume a database that has the items Movie, CinemaTicket, Audiobook, Ebook, and GymMem. A policy should only allow the user to access a set of items whose value adds up to the value of the gift card. For instance, a disjunctive policy may look like: $\{\{v_{\mathrm{Movie}}, v_{\mathrm{CinemaTicket}}\}, \{v_{\mathrm{Audiobook}}, v_{\mathrm{Ebook}}\}, \{v_{\mathrm{GymMem}}\}, \{v_{\mathrm{CinemaTicket}}, v_{\mathrm{Ebook}}\}\}$.

We model a user program that issues queries to select items, e.g., SELECT data FROM Items WHERE name = ‘Movie’.

DiVerT accepts this query because view $v_{\mathrm{Movie}}$ allows the user to access Movie. We then create two scenarios. In the first, the user issues another query asking for Audiobook, which DiVerT rejects, since no disjunct of the policy contains both $v_{\mathrm{Movie}}$ and $v_{\mathrm{Audiobook}}$. In the second, the user asks for CinemaTicket, which the policy allows together with Movie, and hence DiVerT accepts it.
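The two scenarios can be replayed with a small Python sketch that keeps track of which policy disjuncts are still compatible with the items requested so far. Note the assumptions: views are identified by item names, requests are processed one at a time at run time, and the function `replay` is hypothetical. DiVerT itself checks programs statically, so this only mirrors the expected verdicts, not the tool’s mechanism.

```python
POLICY = [
    frozenset({"Movie", "CinemaTicket"}),
    frozenset({"Audiobook", "Ebook"}),
    frozenset({"GymMem"}),
    frozenset({"CinemaTicket", "Ebook"}),
]


def replay(requests):
    """Return one accept (True) or reject (False) decision per request."""
    candidates = list(POLICY)  # disjuncts compatible with all accepted requests
    decisions = []
    for item in requests:
        remaining = [disjunct for disjunct in candidates if item in disjunct]
        if remaining:
            candidates = remaining  # narrow down to disjuncts covering the item
            decisions.append(True)
        else:
            decisions.append(False)  # no single disjunct covers everything
    return decisions


print(replay(["Movie", "Audiobook"]))
print(replay(["Movie", "CinemaTicket"]))
```

Narrowing the candidate list is equivalent to asking, after each request, whether some single disjunct contains every item accepted so far.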

VII Related Work

This section puts our contributions in the context of related works in the areas of information flow security and database security, discussing security models of dependencies and tractable enforcement mechanisms. To our knowledge, we are the first to explore enforcement mechanisms for disjunctive policies, as well as to reconcile semantic models of (disjunctive) dependencies across the areas of information flow control and database access control.

Security models. Semantic models of dependencies have a long history since the introduction of the Lattice of Information (LoI) by Landauer and Redmond [7]. These models define a lattice structure to represent information as equivalence relations ordered by refinement and serve as a cornerstone to justify the soundness of the various dependency analyses at the heart of enforcement mechanisms for security. For example, the universal lattice by Hunt and Sands [25] models dependencies between program variables such that the lattice elements are sets of variables ordered by set containment, and uses it to justify soundness against baseline security conditions, e.g., noninterference [26].

Within the database community, Bender et al. [8, 4] define the notion of Disclosure Lattice to represent the information disclosed by sets of database queries. The Disclosure Lattice has been further developed by Guarnieri et al. [14] to enforce conjunctive information-flow policies for database-backed programs. We point out that not all disclosure orders are suitable to represent information disclosure in the context of information flow control: by studying its relation to LoI, we show that query determinacy and the stronger notion of equivalent query rewriting [19] provide sound abstractions, while query containment does not.

Our work builds on recent work by Hunt and Sands [9], which provides a semantic model for disjunctive dependencies, under the notion of the Quantale of Information. We study quantale structures in the context of databases, providing support for disjunctive policies in database-backed programs. While these policies are rooted in the area of access control, cf. ethical wall policies [27], the work of Hunt and Sands [9] is the first to provide an extensional characterization as information-flow policies. Drawing on our new notion of Determinacy Quantale, we develop a security condition to capture the security of database-backed programs in the presence of disjunctive database policies.

Enforcement mechanisms. The problem of enforcing disjunctive policies for programs and/or databases is, to our knowledge, unexplored. We study how a standard type-based program analysis [15], equipped with the notion of path sensitivity, can be adapted to statically capture disjunctive program dependencies.

At the core of our analysis is a new abstraction of database queries which enables flexible enforcement of disjunctive policies by means of SMT solvers, as witnessed by our use cases. An immediate benefit of our Determinacy Quantale is that we can prove soundness of the enforcement with respect to a solid semantic baseline for disjunctive dependencies.

There exists a wide array of works enforcing conjunctive policies for database-backed programs. Guarnieri et al. [14] propose dynamic monitoring to enforce database policies. Their abstractions are limited to boolean queries and rely on the Disclosure Lattice of Bender et al. [8, 4], which may cause soundness issues when assuming query containment as the underlying lattice order.

Language-integrated queries are supported by a range of works such as SIF [10], SeLinks [11], JsLinq [12], UrFlow [28], DAISY [14], Jacqueline [29], and LWeb [13] for row- and column-level conjunctive policies. These works apply PL-based enforcement techniques such as type systems, dependent types, refinement types, and symbolic execution to database-backed programs [30, 31, 13, 14], but lack support for expressing and enforcing disjunctive policies.

Li and Zhang [32] explore path-sensitive program analysis to improve precision of information flow analysis, yet they do not consider disjunctive policies. QAPLA [33] is a database access control middleware supporting complex security policies, such as linking and aggregation policies, with focus only on access control.

VIII Conclusions

We presented a case for the significance of disjunctive dependency analysis to the security of database-backed programs. After reviewing recent theoretical developments in representing disjunctive information, we introduced two structures, the Determinacy Lattice and the Determinacy Quantale, as database-oriented counterparts to theoretical structures representing simple and disjunctive knowledge respectively.

Using these structures, we formulated a security condition which expresses that a database-backed program satisfies a given disjunctive policy. In order to enforce this security condition, we developed a type-based static analysis to compute a bound on the disjunctive dependencies of database-backed programs in a model language. By a series of approximations, this bound itself can be tractably compared to the representation of a static policy.

These steps constitute an enforcement mechanism for disjunctive policies, which we proved sound with respect to our security condition. To showcase this enforcement mechanism, we implemented it in our prototype tool, DiVerT. In order to validate this prototype and the overall framework, we verified the tool on a set of functional tests covering a variety of language features and disjunctive information patterns, as well as several use cases representing real-world scenarios in which we want to enforce disjunctive policies.

IX Acknowledgements

We are grateful to David Sands and Roberto Guanciale for fruitful discussions, and would also like to thank the anonymous reviewers for their insightful comments and feedback.

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, the Swedish Research Council (VR), and the Swedish Foundation for Strategic Research (SSF).

References

  • [1] E. Bertino and R. Sandhu, “Database security-concepts, approaches, and challenges,” IEEE Transactions on Dependable and Secure Computing, vol. 2, no. 1, pp. 2–19, 2005.
  • [2] A. Sabelfeld and A. C. Myers, “Language-based information-flow security,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 1, pp. 5–19, 2003.
  • [3] M. Guarnieri, S. Marinovic, and D. Basin, “Strong and provably secure database access control,” in IEEE European Symposium on Security and Privacy (EuroS&P).   IEEE, 2016, pp. 163–178.
  • [4] G. Bender, L. Kot, and J. Gehrke, “Explainable security for relational databases,” in ACM SIGMOD International Conference on Management of data.   ACM, 2014, pp. 1411–1422.
  • [5] L. Sweeney, “k-anonymity: A model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-based Systems, vol. 10, no. 05, pp. 557–570, 2002.
  • [6] C. Dwork, “Differential privacy,” in Automata, Languages and Programming: 33rd International Colloquium.   Springer, 2006, pp. 1–12.
  • [7] J. Landauer and T. Redmond, “A lattice of information,” in Proceedings Computer Security Foundations Workshop.   IEEE, 1993, pp. 65–70.
  • [8] G. M. Bender, L. Kot, J. Gehrke, and C. Koch, “Fine-grained disclosure control for app ecosystems,” in ACM SIGMOD International Conference on Management of Data.   ACM, 2013, pp. 869–880.
  • [9] S. Hunt and D. Sands, “A quantale of information,” in IEEE 34th Computer Security Foundations Symposium (CSF).   IEEE, 2021, pp. 1–15.
  • [10] S. Chong, K. Vikram, A. C. Myers et al., “Sif: Enforcing confidentiality and integrity in web applications,” in 16th USENIX Security Symposium (USENIX Security).   USENIX Association, 2007, pp. 1–16.
  • [11] B. J. Corcoran, N. Swamy, and M. Hicks, “Cross-tier, label-based security enforcement for web applications,” in ACM SIGMOD International Conference on Management of data.   ACM, 2009, pp. 269–282.
  • [12] M. Balliu, B. Liebe, D. Schoepe, and A. Sabelfeld, “Jslinq: Building secure applications across tiers,” in 6th ACM Conference on Data and Application Security and Privacy.   ACM, 2016, pp. 307–318.
  • [13] J. Parker, N. Vazou, and M. Hicks, “Lweb: Information flow security for multi-tier web applications,” Proceedings of the ACM on Programming Languages, vol. 3, no. POPL, pp. 1–30, 2019.
  • [14] M. Guarnieri, M. Balliu, D. Schoepe, D. Basin, and A. Sabelfeld, “Information-flow control for database-backed applications,” in IEEE European Symposium on Security and Privacy (EuroS&P).   IEEE, 2019, pp. 79–94.
  • [15] B. v. Delft, S. Hunt, and D. Sands, “Very static enforcement of dynamic policies,” in International Conference on Principles of Security and Trust.   Springer, 2015, pp. 32–52.
  • [16] I. Kaplansky, Set Theory and Metric Spaces.   AMS Chelsea Publishing, 2001.
  • [17] B. A. Davey and H. A. Priestley, Introduction to lattices and order.   Cambridge University Press, 2002.
  • [18] S. Abiteboul, R. Hull, and V. Vianu, Foundations of databases.   Addison-Wesley, 1995.
  • [19] A. Nash, L. Segoufin, and V. Vianu, “Views and queries: Determinacy and rewriting,” ACM Transactions on Database Systems (TODS), vol. 35, no. 3, pp. 1–41, 2010.
  • [20] Q. Wang and K. Yi, “Conjunctive queries with comparisons,” in International Conference on Management of Data.   ACM, 2022, pp. 108–121.
  • [21] A. Askarov, S. Hunt, A. Sabelfeld, and D. Sands, “Termination-insensitive noninterference leaks more than just a bit,” in 13th European Symposium on Research in Computer Security.   Springer, 2008, pp. 333–348.
  • [22] A. M. Ahmadian, M. Soloviev, and M. Balliu, “Divert,” 2023, software release. [Online]. Available: https://github.com/KTH-LangSec/DiVerT
  • [23] L. De Moura and N. Bjørner, “Z3: An efficient smt solver,” in International conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS).   Springer, 2008, pp. 337–340.
  • [24] W. Murphy and W. Hereman, “Determination of a position in three dimensions using trilateration and approximate distances,” Department of Mathematical and Computer Sciences, Colorado School of Mines, Golden, Colorado, MCS-95, vol. 7, p. 19, 1995.
  • [25] S. Hunt and D. Sands, “On flow-sensitive security types,” in 33rd SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL).   ACM, 2006, pp. 79–90.
  • [26] J. A. Goguen and J. Meseguer, “Security policies and security models,” in IEEE Symposium on Security and Privacy (S&P).   IEEE, 1982, pp. 11–20.
  • [27] D. F. Brewer and M. J. Nash, “The chinese wall security policy.” in IEEE Symposium on Security and Privacy (S&P).   IEEE, 1989, pp. 206–214.
  • [28] A. Chlipala, “Static checking of dynamically-varying security policies in database-backed applications,” in 9th USENIX Symposium on Operating Systems Design and Implementation.   USENIX Association, 2010, pp. 105–118.
  • [29] J. Yang, T. Hance, T. H. Austin, A. Solar-Lezama, C. Flanagan, and S. Chong, “Precise, dynamic information flow for database-backed applications,” ACM SIGPLAN Notices, vol. 51, no. 6, pp. 631–647, 2016.
  • [30] N. Swamy, B. J. Corcoran, and M. Hicks, “Fable: A language for enforcing user-defined security policies,” in IEEE Symposium on Security and Privacy (S&P).   IEEE, 2008, pp. 369–383.
  • [31] L. Lourenço and L. Caires, “Dependent information flow types,” in 42nd SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL).   ACM, 2015, pp. 317–328.
  • [32] P. Li and D. Zhang, “Towards a flow- and path-sensitive information flow analysis,” in 30th IEEE Computer Security Foundations Symposium (CSF).   IEEE, 2017, pp. 53–67.
  • [33] A. Mehta, E. Elnikety, K. Harvey, D. Garg, and P. Druschel, “Qapla: Policy compliance for database-backed systems,” in 26th USENIX Security Symposium (USENIX Security).   USENIX Association, 2017, pp. 1463–1479.

Appendix A Interpretations of Query Determinacy

We prove the following technical lemma to show that the two intuitive interpretations of the definition of query determinacy are equivalent.

Lemma 5.

If $A$ is recursively enumerable and $f : A \rightarrow B$ and $g : A \rightarrow C$ are computable, then the following are equivalent:

  • (i)

    For all $a, a' \in A$, if $f(a) = f(a')$, then $g(a) = g(a')$.

  • (ii)

    There exists a computable $h : B \rightarrow C$ such that for all $a \in A$, $g(a) = h(f(a))$.

Proof.

(ii)$\Rightarrow$(i): Suppose $b = f(a) = f(a')$, and $h$ is as in (ii). Then $g(a) = h(f(a)) = h(b)$, and $g(a') = h(f(a')) = h(b)$, hence $g(a) = g(a')$.

(i)$\Rightarrow$(ii): Let $\hat{f} : B \rightharpoonup A$ be the partial function that enumerates $A$ and for a given $b \in B$ returns the first $a \in A$ it finds such that $f(a) = b$. This is computable, per the algorithmic description provided. This does not necessarily satisfy $\hat{f}(f(a)) = a$, but we do have $f(\hat{f}(f(a))) = f(a)$ by definition (since the enumeration of $A$ will either encounter $a$ or another $a'$ such that $f(a') = f(a)$ eventually). Hence $g(\hat{f}(f(a))) = g(a)$ by (i). So defining $h$ by $h(b) = g(\hat{f}(b))$, we find that $h(f(a)) = g(a)$ as required. ∎

Instantiating Lemma 5 with $A$ as the set of possible databases, $f$ as the function $r_{Q}(db) = \{\llbracket q \rrbracket^{db} \mid q \in Q\}$ that computes the results of the queries in $Q$ on $db$, and $g$ as the same for $Q'$, we find that $Q$ determining $Q'$ indeed means that the (results of) queries in $Q$ are always sufficient to determine (compute) the result of the queries in $Q'$.
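The construction used in the direction (i)$\Rightarrow$(ii) of Lemma 5 can be written out as a short Python sketch. The function name `make_h` and the arithmetic instance below (with $A$ the natural numbers, $f(a) = a \bmod 6$, and $g(a) = a \bmod 3$) are assumptions chosen only to have a concrete case in which $f$ determines $g$; they are not taken from the paper.

```python
from itertools import count
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def make_h(
    enumerate_a: Callable[[], Iterable[A]],
    f: Callable[[A], B],
    g: Callable[[A], C],
) -> Callable[[B], C]:
    """Build h with h(f(a)) == g(a), assuming f(a) == f(a') implies g(a) == g(a')."""

    def f_hat(b: B) -> A:
        # Search the enumeration for the first preimage of b under f.
        # On an infinite enumeration this loops forever if b has no preimage,
        # which is why f_hat is only a partial function.
        for a in enumerate_a():
            if f(a) == b:
                return a
        raise ValueError("b has no preimage in a finite enumeration")

    def h(b: B) -> C:
        return g(f_hat(b))

    return h


def naturals() -> Iterable[int]:
    return count(0)


def f(a: int) -> int:
    return a % 6


def g(a: int) -> int:
    return a % 3


h = make_h(naturals, f, g)
print(all(h(f(a)) == g(a) for a in range(100)))
```

In this instance the search for a preimage of $f(7) = 1$ returns $1$ rather than $7$, which matches the remark in the proof that $\hat{f}(f(a))$ need not equal $a$ while $h(f(a)) = g(a)$ still holds.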

Appendix B Relation Between DL and LoI

We first prove some auxiliary lemmas, and then proceed to prove Lemma 1.

Lemma 6.

For sets of queries $Q_{1}, Q_{2} \in DL(\mathcal{Q})$, the ordering ${\downarrow}Q_{1} \sqsubseteq {\downarrow}Q_{2}$ on the DL implies ${Q_{1}}_{\sim} \sqsubseteq {Q_{2}}_{\sim}$ on the LoI defined on $\{Q_{\sim} \mid Q \in DL(\mathcal{Q})\}$:

$${\downarrow}Q_{1} \sqsubseteq {\downarrow}Q_{2} \rightarrow {Q_{1}}_{\sim} \sqsubseteq {Q_{2}}_{\sim}$$

Proof.

By the definition of the ordering relation of the LoI (Section II), ${Q_1}_{\sim}\sqsubseteq{Q_2}_{\sim}$ gives us:

\[
{Q_1}_{\sim}\sqsubseteq{Q_2}_{\sim}\rightarrow\forall db,db'\in\Omega_D\ \ (db\ {Q_2}_{\sim}\ db'\Rightarrow db\ {Q_1}_{\sim}\ db') \tag{1}
\]

By the definition of equivalence relations for query sets ($Q_{\sim}$), for all $db,db'\in\Omega_D$ we have:

\[
\begin{aligned}
&(db\ {Q_2}_{\sim}\ db'\Rightarrow db\ {Q_1}_{\sim}\ db')\rightarrow\\
&\Big((\llbracket q_2\rrbracket^{db}=\llbracket q_2\rrbracket^{db'}\ \ \forall q_2\in Q_2)\Rightarrow(\llbracket q_1\rrbracket^{db}=\llbracket q_1\rrbracket^{db'}\ \ \forall q_1\in Q_1)\Big)
\end{aligned}
\tag{2}
\]

(1) and (2) give us:

\[
\begin{aligned}
&{Q_1}_{\sim}\sqsubseteq{Q_2}_{\sim}\rightarrow\forall db,db'\in\Omega_D\\
&\Big((\llbracket q_2\rrbracket^{db}=\llbracket q_2\rrbracket^{db'}\ \ \forall q_2\in Q_2)\Rightarrow(\llbracket q_1\rrbracket^{db}=\llbracket q_1\rrbracket^{db'}\ \ \forall q_1\in Q_1)\Big)
\end{aligned}
\tag{3}
\]

On the other hand, by the definition of the Determinacy Lattice 2, we have ${\downarrow}Q_1\sqsubseteq{\downarrow}Q_2\leftrightarrow Q_1\preceq Q_2$. From the definition of determinacy ordering, $Q_1\preceq Q_2$ means $Q_2\twoheadrightarrow Q_1$. By the definition of query determinacy (Def. 1) we know that $Q_2\twoheadrightarrow Q_1$ if:

\[
\begin{aligned}
&\forall db,db'\in\Omega_D\\
&\Big((\llbracket q_2\rrbracket^{db}=\llbracket q_2\rrbracket^{db'}\ \ \forall q_2\in Q_2)\Rightarrow(\llbracket q_1\rrbracket^{db}=\llbracket q_1\rrbracket^{db'}\ \ \forall q_1\in Q_1)\Big)
\end{aligned}
\tag{4}
\]

It is evident from (3) and (4) that ${\downarrow}Q_1\sqsubseteq{\downarrow}Q_2\rightarrow{Q_1}_{\sim}\sqsubseteq{Q_2}_{\sim}$ holds. ∎
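The correspondence used in this proof can be checked by brute force on a small example. The sketch below is an illustration under assumed names, not part of the paper's development: it represents $Q_{\sim}$ by its equivalence classes and compares "$Q_2$ determines $Q_1$" with "the classes of ${Q_2}_{\sim}$ refine those of ${Q_1}_{\sim}$". The toy universe and query pool are hypothetical.

```python
from itertools import chain, combinations

def partition(Q, databases):
    """Equivalence classes of Q_~: databases grouped by their results on Q."""
    blocks = {}
    for db in databases:
        blocks.setdefault(tuple(q(db) for q in Q), set()).add(db)
    return {frozenset(block) for block in blocks.values()}

def refines(fine, coarse):
    """Every class of `fine` lies inside some class of `coarse`."""
    return all(any(block <= other for other in coarse) for block in fine)

def determines(Q2, Q1, databases):
    """Q2 determines Q1, straight from the pairwise definition."""
    return all(
        all(q(d1) == q(d2) for q in Q1)
        for d1 in databases for d2 in databases
        if all(q(d1) == q(d2) for q in Q2)
    )

def subsets(xs):
    """All sub-lists of xs, including the empty one."""
    return [list(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

# Hypothetical toy universe and query pool: (salary, bonus) databases.
DATABASES = [(s, b) for s in range(3) for b in range(3)]
POOL = [lambda db: db[0], lambda db: db[1], lambda db: db[0] + db[1]]

# Determinacy and refinement coincide for every pair of query sets in the pool.
agree = all(
    determines(Q2, Q1, DATABASES)
    == refines(partition(Q2, DATABASES), partition(Q1, DATABASES))
    for Q1 in subsets(POOL) for Q2 in subsets(POOL)
)
```

Representing relations by their classes keeps the check small; a pair-set representation would work equally well.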

Relying on Def. 6 to establish the set of equivalence relations derived from a set of sets of queries, we propose the following lemma:

Lemma 7.

For any set of sets of queries $\mathbb{Q}\subseteq DL(\mathcal{Q})$, the join of $\mathbb{Q}$ on the DL implies the join of $\llbracket\mathbb{Q}\rrbracket$ on the LoI defined on $\{Q_{\sim}\mid Q\in DL(\mathcal{Q})\}$:

\[
\bigsqcup\mathbb{Q}\rightarrow\bigsqcup\llbracket\mathbb{Q}\rrbracket
\]
Proof.

Assume there is a set of queries $R\in DL(\mathcal{Q})$ such that $R=\bigsqcup\mathbb{Q}$.

By the definition of the Determinacy Lattice (Section III-B), we have $\bigsqcup\mathbb{Q}={\downarrow}\bigcup\mathbb{Q}$, which gives us $R={\downarrow}\bigcup\mathbb{Q}$. By the definitions of ${\downarrow}$ and query determinacy (Def. 1), it is straightforward to see $(\bigcup\mathbb{Q})\twoheadrightarrow{\downarrow}\bigcup\mathbb{Q}$ and ${\downarrow}\bigcup\mathbb{Q}\twoheadrightarrow(\bigcup\mathbb{Q})$. Replacing ${\downarrow}\bigcup\mathbb{Q}$ with $R$, by the definition of query determinacy (Def. 1) we have $R\twoheadrightarrow(\bigcup\mathbb{Q})$:

\[
\begin{aligned}
&\forall db,db'\in\Omega_D\\
&\Big(\forall r\in R.\ \llbracket r\rrbracket^{db}=\llbracket r\rrbracket^{db'}\rightarrow\forall p\in\bigcup\mathbb{Q}.\ \llbracket p\rrbracket^{db}=\llbracket p\rrbracket^{db'}\Big)
\end{aligned}
\tag{1}
\]

and $(\bigcup\mathbb{Q})\twoheadrightarrow R$:

\[
\begin{aligned}
&\forall db,db'\in\Omega_D\\
&\Big(\forall p\in\bigcup\mathbb{Q}.\ \llbracket p\rrbracket^{db}=\llbracket p\rrbracket^{db'}\rightarrow\forall r\in R.\ \llbracket r\rrbracket^{db}=\llbracket r\rrbracket^{db'}\Big)
\end{aligned}
\tag{2}
\]

(1) and (2) would give us:

\[
\begin{aligned}
&\forall db,db'\in\Omega_D\\
&(\forall r\in R.\ \llbracket r\rrbracket^{db}=\llbracket r\rrbracket^{db'}\leftrightarrow\forall p\in\bigcup\mathbb{Q}.\ \llbracket p\rrbracket^{db}=\llbracket p\rrbracket^{db'})
\end{aligned}
\tag{3}
\]

Assume $\bigsqcup\llbracket\mathbb{Q}\rrbracket$ is an equivalence relation $R'$. By the definition of the join of the LoI (Section II):

\[
\bigsqcup\llbracket\mathbb{Q}\rrbracket=\forall db,db'\in\Omega_D\ (db\ {R'}_{\sim}\ db'\leftrightarrow\forall Q\in\mathbb{Q}.\ db\ Q_{\sim}\ db')
\]

and by the definition of equivalence relations for query sets, for all $db,db'\in\Omega_D$ we have:

\[
\begin{aligned}
\bigsqcup\llbracket\mathbb{Q}\rrbracket&=(\forall r\in R'.\ \llbracket r\rrbracket^{db}=\llbracket r\rrbracket^{db'}\leftrightarrow\forall Q\in\mathbb{Q}.\ \forall q\in Q.\ \llbracket q\rrbracket^{db}=\llbracket q\rrbracket^{db'})\\
&=(\forall r\in R'.\ \llbracket r\rrbracket^{db}=\llbracket r\rrbracket^{db'}\leftrightarrow\forall p\in\bigcup\mathbb{Q}.\ \llbracket p\rrbracket^{db}=\llbracket p\rrbracket^{db'})
\end{aligned}
\tag{4}
\]

(3) and (4) allow us to conclude $R=R'$, hence $\bigsqcup\mathbb{Q}\rightarrow\bigsqcup\llbracket\mathbb{Q}\rrbracket$. ∎
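The key step of this proof, that agreeing on every $Q\in\mathbb{Q}$ is the same as agreeing on $\bigcup\mathbb{Q}$, can be illustrated on a finite universe. The sketch below is an assumption-laden toy, not the paper's construction: equivalence relations are explicit sets of pairs, the LoI join is taken to be their intersection, and the closure ${\downarrow}$ is not modelled. The databases and queries are hypothetical.

```python
from itertools import product

def kernel(Q, databases):
    """Q_~ as a set of pairs: (db, db') is included when every q in Q agrees on them."""
    return frozenset(
        (d1, d2) for d1, d2 in product(databases, repeat=2)
        if all(q(d1) == q(d2) for q in Q)
    )

def loi_join(relations, databases):
    """Join in the LoI: relate two databases only if every relation relates them."""
    joined = frozenset(product(databases, repeat=2))
    for rel in relations:
        joined &= rel
    return joined

# Hypothetical toy universe: a database is a (salary, bonus) pair.
DATABASES = [(s, b) for s in range(3) for b in range(3)]

def salary(db):
    return db[0]

def bonus(db):
    return db[1]

def total(db):
    return db[0] + db[1]

family = [[salary], [bonus], [total]]      # plays the role of a set of query sets
union = [q for Q in family for q in Q]     # the union of the family

R = kernel(union, DATABASES)                                        # relation of the DL-side join
R_prime = loi_join([kernel(Q, DATABASES) for Q in family], DATABASES)  # LoI join
```

Here `R` and `R_prime` play the roles of $R$ and $R'$ in the proof; the two constructions produce the same set of pairs.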

See Lemma 1.

Proof.

To prove this homomorphism, we need to show that the Determinacy Lattice's ordering and join, as well as its top and bottom elements, imply their LoI counterparts. Lemmas 6 and 7 provide the proofs for ordering and join. The proof for the top and bottom elements:

  • ${\downarrow}\mathcal{Q}\rightarrow({\downarrow}\mathcal{Q})_{\sim}$

  • ${\downarrow}\varnothing\rightarrow({\downarrow}\varnothing)_{\sim}$

follows trivially from the definitions of ${\downarrow}$ and ${\cdot}_{\sim}$. ∎

Appendix C Determinacy Quantale Axioms

We follow the approach of [9] to prove that our definition of the Determinacy Quantale is indeed a quantale. We begin by defining what a quantale is.

Definition 11.

A quantale is a structure $\langle\mathcal{L},\sqsubseteq,\vee,\otimes,1\rangle$ such that:

  1. $\langle\mathcal{L},\sqsubseteq,\vee\rangle$ is a complete join-semilattice

  2. $\langle\mathcal{L},\otimes,1\rangle$ is a monoid, that is, $\otimes$ is associative and $\forall x\in\mathcal{L},\ x\otimes 1=x=1\otimes x$

  3. $\otimes$ distributes over $\vee$.

A quantale is called commutative when its $\otimes$ operator is commutative [9].
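On a finite carrier, the conditions of this definition can be checked exhaustively. The following sketch does so for a hypothetical toy structure that is unrelated to the Determinacy Quantale itself: subsets of $\{0,1,2\}$ ordered by inclusion, with union as join, element-wise `max` as $\otimes$, and $\{0\}$ as unit. Only binary joins are checked, which on a finite carrier covers all non-empty joins.

```python
from itertools import chain, combinations, product

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

def is_quantale(L, join, tensor, unit):
    """Brute-force check of the three conditions on a finite carrier L."""
    L = list(L)
    members = set(L)
    pairs = list(product(L, repeat=2))
    triples = list(product(L, repeat=3))
    # 1. Join-semilattice with a bottom element (finite, hence complete).
    semilattice = (
        all(join(x, y) in members and join(x, y) == join(y, x) for x, y in pairs)
        and all(join(x, x) == x for x in L)
        and all(join(x, join(y, z)) == join(join(x, y), z) for x, y, z in triples)
        and any(all(join(b, x) == x for x in L) for b in L)
    )
    # 2. Monoid: tensor is associative and `unit` is a two-sided unit.
    monoid = (
        unit in members
        and all(tensor(x, y) in members for x, y in pairs)
        and all(tensor(x, tensor(y, z)) == tensor(tensor(x, y), z) for x, y, z in triples)
        and all(tensor(x, unit) == x == tensor(unit, x) for x in L)
    )
    # 3. Tensor distributes over (binary) joins on both sides.
    distributive = all(
        tensor(x, join(y, z)) == join(tensor(x, y), tensor(x, z))
        and tensor(join(y, z), x) == join(tensor(y, x), tensor(z, x))
        for x, y, z in triples
    )
    return semilattice and monoid and distributive

def is_commutative(L, tensor):
    return all(tensor(x, y) == tensor(y, x) for x, y in product(list(L), repeat=2))

# Hypothetical toy instance: subsets of {0, 1, 2}.
L = powerset((0, 1, 2))
UNIT = frozenset({0})

def join(x, y):
    return x | y

def tensor(x, y):
    return frozenset(max(a, b) for a in x for b in y)
```

Calling `is_quantale(L, join, tensor, UNIT)` and `is_commutative(L, tensor)` exercises the definition on this toy instance; swapping in a wrong unit such as the empty set makes the monoid condition fail.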

Next, we prove some lemmas that are later used in the proof of Theorem 2.

Lemma 8.

Both $\mathrm{mix}$ and $\mathrm{tc}$ are closure operators.

Proof.

A closure operator is a function $f:\mathcal{P}(A)\rightarrow\mathcal{P}(A)$ from the power set of domain $A$ to itself that satisfies the following properties for all sets $X,Y\subseteq A$:

  • $f$ is extensive: $X\subseteq f(X)$

  • $f$ is increasing: $X\subseteq Y\Rightarrow f(X)\subseteq f(Y)$

  • $f$ is idempotent: $f(f(X))=f(X)$

It is straightforward to show that both $\mathrm{mix}$ and $\mathrm{tc}$ satisfy these conditions. ∎
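The three properties can be tested mechanically on a small domain. The sketch below does not model $\mathrm{mix}$ or $\mathrm{tc}$, whose definitions lie outside this appendix; it uses two hypothetical stand-ins on $A=\{0,1,2,3\}$: a downward closure, which satisfies all three properties, and a one-step successor map, which is extensive and increasing but not idempotent.

```python
from itertools import chain, combinations

A = range(4)  # hypothetical small domain

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

def is_closure_operator(f, domain):
    """Check extensivity, monotonicity and idempotence on every subset of `domain`."""
    subsets = powerset(domain)
    extensive = all(X <= f(X) for X in subsets)
    increasing = all(f(X) <= f(Y) for X in subsets for Y in subsets if X <= Y)
    idempotent = all(f(f(X)) == f(X) for X in subsets)
    return extensive and increasing and idempotent

def down(X):
    """Downward closure under <=: add everything below some element of X."""
    return frozenset(y for y in A if any(y <= x for x in X))

def successor_step(X):
    """Add the successor of each element once; applying it twice adds more."""
    return frozenset(X) | frozenset(x + 1 for x in X if x + 1 in A)
```

With these definitions, `is_closure_operator(down, A)` holds, while `successor_step` fails the idempotence check (for example on `{0}`).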

Definition 12.

For a closure operator ${\downarrow}$ defined on the domain $A$, and a function $F:A\rightarrow A$, we say that $F$ weakly commutes with ${\downarrow}$ if $F({\downarrow}(X))\subseteq{\downarrow}(F(X))$ for all $X\subseteq A$.

Lemma 9.

Let ${\downarrow}:A\rightarrow A$ be a closure operator and let $X,Y\subseteq A$. Suppose that $F:A\rightarrow A$ weakly commutes with ${\downarrow}$ and that $G:A\times A\rightarrow A$ weakly commutes with ${\downarrow}$ in each argument. Then:

  1. ${\downarrow}(F({\downarrow}(X)))={\downarrow}(F(X))$

  2. ${\downarrow}(G({\downarrow}(X)\times{\downarrow}(Y)))={\downarrow}(G(X\times Y))$

Proof.

Routine, following the properties of closure operators. ∎
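For the first item, the routine argument can be spelled out as follows. This is a sketch under one assumption that the statement leaves implicit: $F$ is applied to a set element-wise, $F(X)=\{F(x)\mid x\in X\}$, so that $X\subseteq Y$ implies $F(X)\subseteq F(Y)$.

```latex
% Assumption: F acts element-wise on sets, hence F is monotone w.r.t. \subseteq.
\begin{align*}
F({\downarrow}(X)) \subseteq {\downarrow}(F(X))
  &\;\Rightarrow\; {\downarrow}(F({\downarrow}(X))) \subseteq {\downarrow}({\downarrow}(F(X))) = {\downarrow}(F(X))
  && \text{(weak commutation; ${\downarrow}$ increasing and idempotent)} \\
X \subseteq {\downarrow}(X)
  &\;\Rightarrow\; F(X) \subseteq F({\downarrow}(X))
  \;\Rightarrow\; {\downarrow}(F(X)) \subseteq {\downarrow}(F({\downarrow}(X)))
  && \text{(${\downarrow}$ extensive; $F$ monotone; ${\downarrow}$ increasing)}
\end{align*}
```

The two inclusions together give the equality in item 1; item 2 follows by the same pattern, applied to one argument of $G$ at a time.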

Lemma 10.

Let $\mathbb{P},\mathbb{Q}\subseteq DL(\mathcal{Q})$. The union operator $\cup$ weakly commutes with $\mathrm{tc}$:

\[
\mathrm{tc}(\mathrm{tc}(\mathbb{P})\cup\mathrm{tc}(\mathbb{Q}))=\mathrm{tc}(\mathbb{P}\cup\mathbb{Q})
\]

Proof.

It suffices to show $\mathbb{R}\in\mathrm{tc}(\mathrm{tc}(\mathbb{P})\cup\mathrm{tc}(\mathbb{Q}))$ iff $\mathbb{R}\in\mathrm{tc}(\mathbb{P}\cup\mathbb{Q})$, which follows easily from the definitions of $\cup$ and $\mathrm{tc}$. ∎

Lemma 11.

The join operator of DL weakly commutes with $\mathrm{tc}$ in each argument.

Proof.

Let $P,Q\in DL(\mathcal{Q})$, and let $\mathbb{S}\subseteq DL(\mathcal{Q})$. If $Q_{\sim}$ is tiled by $\llbracket\mathbb{S}\rrbracket$ then $P_{\sim}\sqcup Q_{\sim}$ is tiled by $\{P_{\sim}\sqcup R\mid R\in\llbracket\mathbb{S}\rrbracket\}$. This follows easily from the definition of the equivalence relation induced by a query (i.e., $Q_{\sim}$), $\mathrm{mix}$, Lemma 7 and the fact that $[P_{\sim}\sqcup Q_{\sim}]=\{A\cap B\mid A\in[P_{\sim}],B\in[Q_{\sim}]\}\setminus\{\varnothing\}$. ∎

Lemma 12.

Given two sets of sets of queries $\mathbb{Q},\mathbb{P}\subseteq DL(\mathcal{Q})$ it holds that:

\[
\mathrm{tc}(\mathbb{Q})\otimes\mathrm{tc}(\mathbb{P})=\mathbb{Q}\otimes\mathbb{P}
\]

Proof.

By Lemma 11 we know that the join operator of DL weakly commutes with $\mathrm{tc}$ in each argument. We apply this lemma to the definition of the $\otimes$ operator:

\[
\begin{aligned}
\mathrm{tc}(\mathbb{Q})\otimes\mathrm{tc}(\mathbb{P})&=\mathrm{tc}(\bigcup_{Q\in\mathrm{tc}(\mathbb{Q}),P\in\mathrm{tc}(\mathbb{P})}(Q\sqcup P))\\
&=\mathrm{tc}(\bigcup_{Q\in\mathbb{Q},P\in\mathbb{P}}(Q\sqcup P))\\
&=\mathbb{Q}\otimes\mathbb{P}
\end{aligned}
\]
∎
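The pattern behind this lemma, dropping inner closures under an outer one, can be observed on a toy model. The sketch below is only an analogy under loudly stated assumptions: `cl` is a hypothetical subset-closure on families of query sets and is not the tiling closure $\mathrm{tc}$, and the join of two query sets is modelled as plain set union.

```python
from itertools import chain, combinations

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

QUERY_SETS = powerset("ab")        # stand-ins for elements of DL(Q)
FAMILIES = powerset(QUERY_SETS)    # stand-ins for sets of query sets

def cl(family):
    """Toy closure (NOT tc): close a family of query sets under taking subsets."""
    return frozenset(s for s in QUERY_SETS if any(s <= q for q in family))

def pairwise_join(P, Q):
    """All joins p ⊔ q, with the join modelled as set union."""
    return frozenset(p | q for p in P for q in Q)

def tensor(P, Q):
    """Toy tensor: closure of all pairwise joins."""
    return cl(pairwise_join(P, Q))

# The pairwise join weakly commutes with cl in each argument ...
weakly_commutes = all(
    pairwise_join(cl(P), Q) <= cl(pairwise_join(P, Q))
    and pairwise_join(P, cl(Q)) <= cl(pairwise_join(P, Q))
    for P in FAMILIES for Q in FAMILIES
)

# ... so closing the arguments first does not change the tensor.
inner_closures_vanish = all(
    tensor(cl(P), cl(Q)) == tensor(P, Q)
    for P in FAMILIES for Q in FAMILIES
)
```

The second check is the toy counterpart of $\mathrm{tc}(\mathbb{Q})\otimes\mathrm{tc}(\mathbb{P})=\mathbb{Q}\otimes\mathbb{P}$; it relies only on `cl` being a closure operator and on the weak commutation verified by the first check.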

Lemma 13.

Given two sets of sets of queries $\mathbb{Q},\mathbb{P}\subseteq DL(\mathcal{Q})$ it holds that:

\[
\mathbb{Q}\vee\mathbb{P}=\mathbb{P}\vee\mathbb{Q}
\]

Proof.

Follows directly from the definition of $\vee$ in the DL and the commutativity of the union operator $\cup$.

\[
\begin{aligned}
\mathbb{Q}\vee\mathbb{P}&={\downarrow}(\mathbb{Q}\cup\mathbb{P})\\
&={\downarrow}(\mathbb{P}\cup\mathbb{Q})\\
&=\mathbb{P}\vee\mathbb{Q}
\end{aligned}
\]
∎

Now, we show that DQ in Def. 5 is a quantale.

Theorem 2.

The Determinacy Quantale is a commutative quantale.

Proof.

We have to show that our definition of Determinacy Quantale respects the quantale axioms of Def. 11.

  1. 1.

    Showing ,,square-image-of-or-equals\langle\mathcal{I},\sqsubseteq,\vee\rangle⟨ caligraphic_I , ⊑ , ∨ ⟩ is a complete join-semilattice is straightforward following Lemma 8 and the fact that tctc\mathrm{tc}roman_tc is a closure operator.

  2. 2.

    We should show that tensor-product\otimes is associative and 1111 is a unit:

    1. a.

      For the associativity of tensor-product\otimes we need to show that ()=()tensor-producttensor-producttensor-producttensor-product\mathbb{P}\otimes(\mathbb{Q}\otimes\mathbb{R})=(\mathbb{P}\otimes\mathbb{Q})% \otimes\mathbb{R}blackboard_P ⊗ ( blackboard_Q ⊗ blackboard_R ) = ( blackboard_P ⊗ blackboard_Q ) ⊗ blackboard_R. Here we rely on Lemmas 11 and 12 to eliminate the nested uses of tctc\mathrm{tc}roman_tc and the basic properties of \cup operator to show that both sides of ()=()tensor-producttensor-producttensor-producttensor-product\mathbb{P}\otimes(\mathbb{Q}\otimes\mathbb{R})=(\mathbb{P}\otimes\mathbb{Q})% \otimes\mathbb{R}blackboard_P ⊗ ( blackboard_Q ⊗ blackboard_R ) = ( blackboard_P ⊗ blackboard_Q ) ⊗ blackboard_R can be reduced to identical expressions.
      Left side:

      ()=tensor-producttensor-productabsent\displaystyle\mathbb{P}\otimes(\mathbb{Q}\otimes\mathbb{R})=blackboard_P ⊗ ( blackboard_Q ⊗ blackboard_R ) =
      tc(Q,R(QR))=tensor-producttcsubscriptformulae-sequence𝑄𝑅square-union𝑄𝑅absent\displaystyle\mathbb{P}\otimes\mathrm{tc}(\bigcup_{Q\in\mathbb{Q},R\in\mathbb{% R}}(Q\sqcup R))=blackboard_P ⊗ roman_tc ( ⋃ start_POSTSUBSCRIPT italic_Q ∈ blackboard_Q , italic_R ∈ blackboard_R end_POSTSUBSCRIPT ( italic_Q ⊔ italic_R ) ) =
      (Q,R(QR))=tensor-productsubscriptformulae-sequence𝑄𝑅square-union𝑄𝑅absent\displaystyle\mathbb{P}\otimes(\bigcup_{Q\in\mathbb{Q},R\in\mathbb{R}}(Q\sqcup R% ))=blackboard_P ⊗ ( ⋃ start_POSTSUBSCRIPT italic_Q ∈ blackboard_Q , italic_R ∈ blackboard_R end_POSTSUBSCRIPT ( italic_Q ⊔ italic_R ) ) =
      tc(P,T(Q,R(QR))(PT))=tcsubscriptformulae-sequence𝑃𝑇subscriptformulae-sequence𝑄𝑅square-union𝑄𝑅square-union𝑃𝑇absent\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},T\in(\bigcup_{Q\in\mathbb{Q},% R\in\mathbb{R}}(Q\sqcup R))}(P\sqcup T))=roman_tc ( ⋃ start_POSTSUBSCRIPT italic_P ∈ blackboard_P , italic_T ∈ ( ⋃ start_POSTSUBSCRIPT italic_Q ∈ blackboard_Q , italic_R ∈ blackboard_R end_POSTSUBSCRIPT ( italic_Q ⊔ italic_R ) ) end_POSTSUBSCRIPT ( italic_P ⊔ italic_T ) ) =
      tc(P,Q,R)(PQR))\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q},R\in\mathbb{R}% )}(P\sqcup Q\sqcup R))roman_tc ( ⋃ start_POSTSUBSCRIPT italic_P ∈ blackboard_P , italic_Q ∈ blackboard_Q , italic_R ∈ blackboard_R ) end_POSTSUBSCRIPT ( italic_P ⊔ italic_Q ⊔ italic_R ) ) (1)

      Right Side:

      ()=tensor-producttensor-productabsent\displaystyle(\mathbb{P}\otimes\mathbb{Q})\otimes\mathbb{R}=( blackboard_P ⊗ blackboard_Q ) ⊗ blackboard_R =
      tc(P,Q(PQ))=tensor-producttcsubscriptformulae-sequence𝑃𝑄square-union𝑃𝑄absent\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}(P\sqcup Q))% \otimes\mathbb{R}=roman_tc ( ⋃ start_POSTSUBSCRIPT italic_P ∈ blackboard_P , italic_Q ∈ blackboard_Q end_POSTSUBSCRIPT ( italic_P ⊔ italic_Q ) ) ⊗ blackboard_R =
      (P,Q(PQ))=tensor-productsubscriptformulae-sequence𝑃𝑄square-union𝑃𝑄absent\displaystyle(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}(P\sqcup Q))\otimes% \mathbb{R}=( ⋃ start_POSTSUBSCRIPT italic_P ∈ blackboard_P , italic_Q ∈ blackboard_Q end_POSTSUBSCRIPT ( italic_P ⊔ italic_Q ) ) ⊗ blackboard_R =
      tc(T(P,Q(PQ)),R(TR))=tcsubscriptformulae-sequence𝑇subscriptformulae-sequence𝑃𝑄square-union𝑃𝑄𝑅square-union𝑇𝑅absent\displaystyle\mathrm{tc}(\bigcup_{T\in(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}% (P\sqcup Q)),R\in\mathbb{R}}(T\sqcup R))=roman_tc ( ⋃ start_POSTSUBSCRIPT italic_T ∈ ( ⋃ start_POSTSUBSCRIPT italic_P ∈ blackboard_P , italic_Q ∈ blackboard_Q end_POSTSUBSCRIPT ( italic_P ⊔ italic_Q ) ) , italic_R ∈ blackboard_R end_POSTSUBSCRIPT ( italic_T ⊔ italic_R ) ) =
      tc(P,Q,R)(PQR))\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q},R\in\mathbb{R}% )}(P\sqcup Q\sqcup R))roman_tc ( ⋃ start_POSTSUBSCRIPT italic_P ∈ blackboard_P , italic_Q ∈ blackboard_Q , italic_R ∈ blackboard_R ) end_POSTSUBSCRIPT ( italic_P ⊔ italic_Q ⊔ italic_R ) ) (2)

      By (1) and (2) we can conclude that ()=()tensor-producttensor-producttensor-producttensor-product\mathbb{P}\otimes(\mathbb{Q}\otimes\mathbb{R})=(\mathbb{P}\otimes\mathbb{Q})% \otimes\mathbb{R}blackboard_P ⊗ ( blackboard_Q ⊗ blackboard_R ) = ( blackboard_P ⊗ blackboard_Q ) ⊗ blackboard_R.

    2. b.

      To show that 1=11=\varnothing1 = ∅ is a unit for tensor-product\otimes we need to show that x,x1=x=1xformulae-sequencefor-all𝑥tensor-product𝑥1𝑥tensor-product1𝑥\forall x\in\mathcal{I},x\otimes 1=x=1\otimes x∀ italic_x ∈ caligraphic_I , italic_x ⊗ 1 = italic_x = 1 ⊗ italic_x. Using \varnothing as the unit, and applying the definition of tensor-product\otimes will give us:

      =tc(Q(Q))=tc()=tensor-producttcsubscript𝑄𝑄tc\displaystyle\mathbb{Q}\otimes\varnothing=\mathrm{tc}(\bigcup_{Q\in\mathbb{Q}}% (Q))=\mathrm{tc}(\mathbb{Q})=\mathbb{Q}blackboard_Q ⊗ ∅ = roman_tc ( ⋃ start_POSTSUBSCRIPT italic_Q ∈ blackboard_Q end_POSTSUBSCRIPT ( italic_Q ) ) = roman_tc ( blackboard_Q ) = blackboard_Q

      which following the associativity of tensor-product\otimes gives us x,x=x=xformulae-sequencefor-all𝑥tensor-product𝑥𝑥tensor-product𝑥\forall x\in\mathcal{I},x\otimes\varnothing=x=\varnothing\otimes x∀ italic_x ∈ caligraphic_I , italic_x ⊗ ∅ = italic_x = ∅ ⊗ italic_x.

  3. 3.

    To establish distributivity we need to show that ()=()()tensor-producttensor-producttensor-product\mathbb{P}\otimes(\mathbb{Q}\vee\mathbb{R})=(\mathbb{P}\otimes\mathbb{Q})\vee(% \mathbb{P}\otimes\mathbb{R})blackboard_P ⊗ ( blackboard_Q ∨ blackboard_R ) = ( blackboard_P ⊗ blackboard_Q ) ∨ ( blackboard_P ⊗ blackboard_R ). We again rely on Lemmas 11 and 12 and basic properties of \cup to show:

    ()=tensor-productabsent\displaystyle\mathbb{P}\otimes(\mathbb{Q}\vee\mathbb{R})=blackboard_P ⊗ ( blackboard_Q ∨ blackboard_R ) =
    tc()=tensor-producttcabsent\displaystyle\mathbb{P}\otimes\mathrm{tc}(\mathbb{Q}\cup\mathbb{R})=blackboard_P ⊗ roman_tc ( blackboard_Q ∪ blackboard_R ) =
    ()=tensor-productabsent\displaystyle\mathbb{P}\otimes(\mathbb{Q}\cup\mathbb{R})=blackboard_P ⊗ ( blackboard_Q ∪ blackboard_R ) =
    tc(P,T()(PT))=tcsubscriptformulae-sequence𝑃𝑇square-union𝑃𝑇absent\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},T\in(\mathbb{Q}\cup\mathbb{R}% )}(P\sqcup T))=roman_tc ( ⋃ start_POSTSUBSCRIPT italic_P ∈ blackboard_P , italic_T ∈ ( blackboard_Q ∪ blackboard_R ) end_POSTSUBSCRIPT ( italic_P ⊔ italic_T ) ) =
    tc(P,Q(PQ)P,R(PR))=tcsubscriptformulae-sequence𝑃𝑄square-union𝑃𝑄subscriptformulae-sequence𝑃𝑅square-union𝑃𝑅absent\displaystyle\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}(P\sqcup Q)% \cup\bigcup_{P\in\mathbb{P},R\in\mathbb{R}}(P\sqcup R))=roman_tc ( ⋃ start_POSTSUBSCRIPT italic_P ∈ blackboard_P , italic_Q ∈ blackboard_Q end_POSTSUBSCRIPT ( italic_P ⊔ italic_Q ) ∪ ⋃ start_POSTSUBSCRIPT italic_P ∈ blackboard_P , italic_R ∈ blackboard_R end_POSTSUBSCRIPT ( italic_P ⊔ italic_R ) ) =
    tc(()())=tctensor-producttensor-productabsent\displaystyle\mathrm{tc}((\mathbb{P}\otimes\mathbb{Q})\cup(\mathbb{P}\otimes% \mathbb{R}))=roman_tc ( ( blackboard_P ⊗ blackboard_Q ) ∪ ( blackboard_P ⊗ blackboard_R ) ) =
    ()()tensor-producttensor-product\displaystyle(\mathbb{P}\otimes\mathbb{Q})\vee(\mathbb{P}\otimes\mathbb{R})( blackboard_P ⊗ blackboard_Q ) ∨ ( blackboard_P ⊗ blackboard_R )
  4. 4.

    Commutativity of tensor-product\otimes is inherited directly from Lemma 13 and the commutativity of square-union\sqcup in DL.

    $\mathbb{P}\otimes\mathbb{Q} =$
    $\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}(P\sqcup Q)) =$
    $\mathrm{tc}(\bigcup_{P\in\mathbb{P},Q\in\mathbb{Q}}(Q\sqcup P)) =$
    $\mathbb{Q}\otimes\mathbb{P}$
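
The two derivations above can be checked on small finite instances. The following is a minimal sketch under loud assumptions: elements of DL are modeled as `frozenset`s of query names, $\sqcup$ is modeled as set union, $\vee$ as union of sets of sets, and the closure $\mathrm{tc}$ is omitted entirely. The function name `tensor` and the sample values are hypothetical and are not taken from DiVerT.

```python
from itertools import product


def tensor(P, Q):
    """Pairwise join of every element of P with every element of Q.

    Assumption: the join of two DL elements is set union, and the
    closure operator tc from the paper is not applied.
    """
    return frozenset(p | q for p, q in product(P, Q))


# Sets of sets of queries; the query names are illustrative only.
P = frozenset({frozenset({"q1"}), frozenset({"q2"})})
Q = frozenset({frozenset({"q3"})})
R = frozenset({frozenset({"q1", "q4"})})

# Commutativity: swapping the arguments swaps each pairwise join.
commutes = tensor(P, Q) == tensor(Q, P)

# Distributivity over union, mirroring the split of the big union
# over T in (Q u R) into one union over Q and one over R.
distributes = tensor(P, Q | R) == tensor(P, Q) | tensor(P, R)

print(commutes, distributes)
```

Because the closure is left out, this only illustrates the set-level steps of the argument; the role of Lemma 13 in moving $\mathrm{tc}$ in and out of the tensor is not exercised.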

Appendix D Relation Between DQ and QoI

We first provide some auxiliary lemmas, and then proceed to prove Lemma 2.

Lemma 14.

Given sets of sets of queries $\mathbb{Q},\mathbb{P}\subseteq DL(\mathcal{Q})$, $\mathrm{tc}(\mathbb{Q})\subseteq\mathrm{tc}(\mathbb{P})$ on the DQ implies $\llbracket\mathrm{tc}(\mathbb{Q})\rrbracket\subseteq\llbracket\mathrm{tc}(\mathbb{P})\rrbracket$ on the QoI defined on $\{\llbracket\mathbb{Q}\rrbracket\mid\mathbb{Q}\subseteq DL(\mathcal{Q})\}$.

Proof.

Trivial from Def. 6. ∎

Lemma 15.

$\bigvee_{i}\mathbb{P}_{i}$ on the DQ implies $\bigvee_{i}\llbracket\mathbb{P}_{i}\rrbracket$ on the QoI defined on $\{\llbracket\mathbb{Q}\rrbracket\mid\mathbb{Q}\subseteq DL(\mathcal{Q})\}$.

Proof.

Trivial from Def. 6. ∎

Lemma 16.

Given sets of sets of queries $\mathbb{Q},\mathbb{P}\subseteq DL(\mathcal{Q})$, $\mathrm{tc}(\mathbb{Q})\otimes\mathrm{tc}(\mathbb{P})$ on the DQ implies $\llbracket\mathrm{tc}(\mathbb{Q})\rrbracket\otimes\llbracket\mathrm{tc}(\mathbb{P})\rrbracket$ on the QoI defined on $\{\llbracket\mathbb{Q}\rrbracket\mid\mathbb{Q}\subseteq DL(\mathcal{Q})\}$.

Proof.

Follows trivially from Def. 6 and Lemma 7. ∎

See Lemma 2.

Proof.

To prove this homomorphism, we need to show that the Determinacy Quantale’s ordering, join and tensor, as well as the top and bottom elements imply their QoI counterparts. Lemmas 14, 15, and 16 provide the proofs of ordering, join and tensor, respectively. The proof of the top element:

  • $DL(\mathcal{Q})\rightarrow LoI(\llbracket\mathcal{Q}\rrbracket)$

follows from Def. 6 and Lemma 1, and the proof of the bottom element:

  • $\varnothing\rightarrow\varnothing$

is trivial. ∎

Appendix E Correctness of Dependency Analysis

To show that the diagram in Fig. 11 commutes, we aim to show commutativity for each of its cells. In this section, we establish this for its bottommost cell. To that end, we need to establish that the QoI point $\llbracket\mathrm{QL}_{u}\rrbracket$ that corresponds to the query list $\mathrm{QL}_{u}=\{Q_{1},\ldots,Q_{n}\}$ extracted from a program $\mathrm{prg}$ by the dependency analysis is an upper bound on the knowledge relation $\llbracket\mathrm{prg}_{u}\rrbracket$ induced by $\mathrm{prg}$.

The basic outline of the argument rests on identifying a particular single equivalence relation $k(\mathrm{QL},\mathrm{prg})\in\mathrm{mix}([{Q_{1}}_{\sim}],\ldots,[{Q_{n}}_{\sim}])$, which satisfies $\llbracket\mathrm{prg}\rrbracket\sqsubseteq k(\mathrm{QL},\mathrm{prg})$. Intuitively, this relation captures how much information the program could leak at most if it output the full result of every query that its output depends on. As long as the analysis is sound, this is an instantiation of the disjunction represented by QL, with each disjunct selected precisely for those starting configurations where the program’s output turns out to depend on the queries enumerated in that disjunct.

For a fixed program $\mathrm{prg}$ and user $u$, we assume the existence of a function $Q=Q_{\mathrm{prg},u}$ from databases $db\in\Omega_{D}$ to sets of queries, which returns the set of those queries performed when executing $\mathrm{prg}$ on database $db$ whose result taints some output to the user $u$. We formally define the function $Q$ by relying on a taint analysis.

Taint analysis. The semantics of the taint analysis enriches the normal operational semantics of the language: it has a transition whenever the operational semantics does, and it acts identically on those components of a configuration that exist in the operational one. Runs of the taint semantics can therefore be put in one-to-one correspondence with operational runs.

$$\inferrule*[before=\textsc{TA-Skip}]{ }{\langle\Delta,\texttt{skip},m,db\rangle\xrightarrow{\epsilon}\langle\Delta,\epsilon,m,db\rangle}$$

$$\inferrule*[before=\textsc{TA-Assign}]{\langle e,m,db\rangle\downarrow vl\\ m^{\prime}=m[x\mapsto vl]\\ \Delta^{\prime}=\Delta[x\mapsto\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}\Delta(x)]}{\langle\Delta,x:=e,m,db\rangle\xrightarrow{\epsilon}\langle\Delta^{\prime},\epsilon,m^{\prime},db\rangle}$$

$$\inferrule*[before=\textsc{TA-QueryEval}]{vl=\llbracket q\rrbracket^{db}\\ m^{\prime}=m[x\mapsto vl]\\ \Delta^{\prime}=\Delta[x\mapsto\Delta(pc)\cup q]}{\langle\Delta,x\leftarrow q,m,db\rangle\xrightarrow{\epsilon}\langle\Delta^{\prime},\epsilon,m^{\prime},db\rangle}$$

$$\inferrule*[before=\textsc{TA-IfTrue}]{\langle e,m,db\rangle\downarrow n\\ n\not=0\\ c^{\prime}_{1}=c_{1};\texttt{set }pc\texttt{ to }\Delta(pc)\\ \Delta^{\prime}=\Delta[pc\mapsto\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}\Delta(x)]}{\langle\Delta,\texttt{if}\ e\ \texttt{then}\ c_{1}\ \texttt{else}\ c_{2},m,db\rangle\xrightarrow{\epsilon}\langle\Delta^{\prime},c^{\prime}_{1},m,db\rangle}$$

$$\inferrule*[before=\textsc{TA-IfFalse}]{\langle e,m,db\rangle\downarrow n\\ n=0\\ c^{\prime}_{2}=c_{2};\texttt{set }pc\texttt{ to }\Delta(pc)\\ \Delta^{\prime}=\Delta[pc\mapsto\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}\Delta(x)]}{\langle\Delta,\texttt{if}\ e\ \texttt{then}\ c_{1}\ \texttt{else}\ c_{2},m,db\rangle\xrightarrow{\epsilon}\langle\Delta^{\prime},c^{\prime}_{2},m,db\rangle}$$

$$\inferrule*[before=\textsc{TA-WhileTrue}]{\langle e,m,db\rangle\downarrow n\\ n\not=0\\ c^{\prime}=c;\texttt{while}\ e\ \texttt{do}\ c;\texttt{set }pc\texttt{ to }\Delta(pc)\\ \Delta^{\prime}=\Delta[pc\mapsto\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}\Delta(x)]}{\langle\Delta,\texttt{while}\ e\ \texttt{do}\ c,m,db\rangle\xrightarrow{\epsilon}\langle\Delta^{\prime},c^{\prime},m,db\rangle}$$

$$\inferrule*[before=\textsc{TA-WhileFalse}]{\langle e,m,db\rangle\downarrow n\\ n=0\\ c^{\prime}=\texttt{set }pc\texttt{ to }\Delta(pc)\\ \Delta^{\prime}=\Delta[pc\mapsto\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}\Delta(x)]}{\langle\Delta,\texttt{while}\ e\ \texttt{do}\ c,m,db\rangle\xrightarrow{\epsilon}\langle\Delta^{\prime},\epsilon,m,db\rangle}$$

$$\inferrule*[before=\textsc{TA-Seq}]{\langle\Delta,c_{1},m,db\rangle\xrightarrow{\alpha}\langle\Delta^{\prime},c_{1}^{\prime},m^{\prime},db^{\prime}\rangle}{\langle\Delta,c_{1};c_{2},m,db\rangle\xrightarrow{\alpha}\langle\Delta^{\prime},c_{1}^{\prime};c_{2},m^{\prime},db^{\prime}\rangle}$$

$$\inferrule*[before=\textsc{TA-SeqEmpty}]{ }{\langle\Delta,\epsilon;c,m,db\rangle\xrightarrow{\epsilon}\langle\Delta,c,m,db\rangle}$$

$$\inferrule*[before=\textsc{TA-Output}]{\langle e,m,db\rangle\downarrow vl\\ \beta=\Delta(pc)\cup\textstyle\bigcup_{x\in fv(e)}\Delta(x)}{\langle\Delta,\texttt{out}(e,u),m,db\rangle\xrightarrow{\langle vl,u,\beta\rangle}\langle\Delta,\epsilon,m,db\rangle}$$

$$\inferrule*[before=\textsc{TA-SetPC}]{\Delta^{\prime}=\Delta[pc\mapsto\delta]}{\langle\Delta,\texttt{set }pc\texttt{ to }\delta,m,db\rangle\xrightarrow{\epsilon}\langle\Delta^{\prime},\epsilon,m,db\rangle}$$

Figure 12: Taint analysis rules

The rules of the taint analysis presented in Fig. 12 are fairly straightforward. We use a mapping $\Delta$ to map each variable to a set of dependencies of variables and queries.

The rules for if rely on the auxiliary command $\texttt{set }pc\texttt{ to }\delta$ to restore the dependency set of $pc$ to its previous state ($\Delta(pc)$) upon exiting the if branch. We sequentially compose this command with the body of if to ensure its execution after leaving the if branch’s body. The rules for while use $\texttt{set }pc\texttt{ to }\delta$ in a similar manner.

The rule TA-Output uses $fv(e)$ to extract all the variables of expression $e$, and relies on the union of the $\Delta$s of those variables, together with $\Delta(pc)$, to calculate $\beta$, the set of dependencies that the execution up to this output depended on.

We extend the definition of trace $\tau$ to a sequence of observations of the form $\langle vl,u,\beta\rangle$, and use the notation $\tau\negthickspace\mathrel{\downharpoonleft\mkern-5.7mu\downharpoonleft}_{u}$ to denote the sequence of all $\beta$s in $\tau$ that $u$ can observe. We use this notation to define function $Q$ as follows:

Definition 13.

Given a database state $db$ and user $u$, such that $\langle c,m_{0},db\rangle\xRightarrow{\tau}\negthickspace_{u}$, $Q(db)$ is defined as $\{\beta\mid\beta\in\tau\negthickspace\mathrel{\downharpoonleft\mkern-5.7mu\downharpoonleft}_{u}\}$.
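
The taint rules of Fig. 12 and the definition of $Q$ can be sketched as a small interpreter. This is a big-step approximation under stated assumptions, not the semantics used in the paper or in DiVerT: commands are encoded as hypothetical tagged tuples, an expression carries its free variables explicitly, a database is a dictionary from query names to already evaluated results, and the `set pc to` command is modeled by saving and restoring `delta["pc"]` around branches and loops (for loops, this follows the prose reading that while uses the command like if does).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Expr:
    fv: Tuple[str, ...]                  # free variables fv(e)
    fn: Callable[[Dict[str, int]], int]  # evaluation over the memory m


def deps_of(e, delta):
    """Delta(pc) joined with Delta(x) for every free variable x of e."""
    out = set(delta["pc"])
    for x in e.fv:
        out |= delta.get(x, frozenset())
    return frozenset(out)


def run(c, m, db, delta, trace):
    """Execute command c, updating memory m and taint map delta in place."""
    tag = c[0]
    if tag == "assign":                  # TA-Assign
        _, x, e = c
        d = deps_of(e, delta)
        m[x] = e.fn(m)
        delta[x] = d
    elif tag == "query":                 # TA-QueryEval
        _, x, q = c
        m[x] = db[q]
        delta[x] = delta["pc"] | {q}
    elif tag == "if":                    # TA-IfTrue / TA-IfFalse
        _, e, c1, c2 = c
        saved = delta["pc"]
        delta["pc"] = deps_of(e, delta)
        run(c1 if e.fn(m) != 0 else c2, m, db, delta, trace)
        delta["pc"] = saved              # effect of "set pc to"
    elif tag == "while":                 # TA-WhileTrue / TA-WhileFalse
        _, e, body = c
        saved = delta["pc"]
        while True:
            delta["pc"] = deps_of(e, delta)
            if e.fn(m) == 0:
                break
            run(body, m, db, delta, trace)
        delta["pc"] = saved              # assumption: restored on loop exit
    elif tag == "seq":                   # TA-Seq / TA-SeqEmpty
        for sub in c[1:]:
            run(sub, m, db, delta, trace)
    elif tag == "out":                   # TA-Output
        _, e, u = c
        trace.append((e.fn(m), u, deps_of(e, delta)))
    # tag == "skip": nothing to do (TA-Skip)


def Q(prg, db, u):
    """Set of all beta annotations on outputs that user u observes."""
    trace = []
    run(prg, {}, db, {"pc": frozenset()}, trace)
    return {beta for (_, user, beta) in trace if user == u}


# x <- q1; if x then y <- q2 else y <- q3; out(y, u)
x = Expr(("x",), lambda m: m["x"])
y = Expr(("y",), lambda m: m["y"])
prg = ("seq",
       ("query", "x", "q1"),
       ("if", x, ("query", "y", "q2"), ("query", "y", "q3")),
       ("out", y, "u"))

print(Q(prg, {"q1": 1, "q2": 5, "q3": 7}, "u"))
print(Q(prg, {"q1": 0, "q2": 5, "q3": 7}, "u"))
```

On the two sample databases the single output depends on `{q1, q2}` and on `{q1, q3}` respectively: the guard query `q1` taints the output in both runs, and it is also the query whose result differs between them.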

A proof of Lemma 3 can then proceed by a straightforward induction on the semantics.

In Def. 13 we formally define the function Q𝑄Qitalic_Q. This function satisfies a closure property that informally states that if on two given databases the output depended on different sets of queries, then the choice of the set of dependencies itself must have been due to the outcome of a query which is among the dependencies in both databases and evaluates to a different result.

Lemma 17.

For all $db,db^{\prime}\in\Omega_{D}$, if $Q(db)\neq Q(db^{\prime})$, then there exists a particular query $q\in Q(db)\cap Q(db^{\prime})$ such that $\llbracket q\rrbracket^{db}\neq\llbracket q\rrbracket^{db^{\prime}}$.

We say that database states $db$ and $db^{\prime}$ are equivalent with respect to a dependency set $S$ (written as $db\approx_{S}db^{\prime}$) iff $\llbracket y\rrbracket^{db}=\llbracket y\rrbracket^{db^{\prime}}$ for all $y\in S$ where $y\in\mathcal{Q}$.

Lemma 18.

For all states $db_{1}$ and $db_{2}$ and users $u$, if $\langle c,m_{0},db_{1}\rangle\xRightarrow{t_{1}}\negthickspace_{u}$, $\langle c,m_{0},db_{2}\rangle\xRightarrow{t_{2}}\negthickspace_{u}$, $Q\triangleq Q(db_{1})=Q(db_{2})$ and $db_{1}\approx_{Q}db_{2}$, then $t_{1}\negthickspace\downharpoonright_{u}=t_{2}\negthickspace\downharpoonright_{u}$.

We then define $k(\mathrm{QL}_{u},\mathrm{prg})$ as the equivalence relation

$$\{(db,db^{\prime})\in\Omega_{D}^{2}\mid Q(db)=Q(db^{\prime})\wedge(db,db^{\prime})\in Q(db)_{\sim}\},$$

that is, we partition each respective subset of databases $db$ that shares one set of queries $Q(db)$ into equivalence classes according to the knowledge relation induced by $Q(db)$.
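
The two-level partition just described can be computed directly on a finite set of databases. The sketch below is illustrative only: the function `Q` is a hand-written, hypothetical dependency function for the program `x <- q1; if x then y <- q2 else y <- q3; out(y, u)`, query results are restricted to 0 and 1, and a database is assumed to be a dictionary from query names to results.

```python
from itertools import product

QUERIES = ("q1", "q2", "q3")


def Q(db):
    """Hypothetical dependency set of the sample program on database db."""
    if db["q1"] != 0:
        return frozenset({"q1", "q2"})
    return frozenset({"q1", "q3"})


def k_classes(dbs):
    """Equivalence classes of k: same Q(db), and same results on Q(db)."""
    classes = {}
    for db in dbs:
        qs = Q(db)
        # First component: which queries the output depends on.
        # Second component: what those queries evaluate to.
        key = (qs, tuple(sorted((q, db[q]) for q in qs)))
        classes.setdefault(key, []).append(db)
    return list(classes.values())


# All databases in which every query evaluates to 0 or 1.
dbs = [dict(zip(QUERIES, vals)) for vals in product((0, 1), repeat=3)]
classes = k_classes(dbs)

for cls in classes:
    print(sorted(Q(cls[0])), cls)
```

Each resulting class fixes the values of exactly the queries the output depends on and leaves the remaining query free, so databases in one class differ only in a query that the program never reveals on that path.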

Lemma 19.

$\llbracket\mathrm{prg}\rrbracket_{u}\sqsubseteq k(\mathrm{QL}_{u},\mathrm{prg})\sqsubseteq\llbracket\mathrm{QL}_{u}\rrbracket$.

Proof.

$k(\mathrm{QL}_{u},\mathrm{prg})\sqsubseteq\llbracket\mathrm{QL}_{u}\rrbracket$: We will in fact show that $k(\mathrm{QL}_{u},\mathrm{prg})\in\mathrm{mix}([{Q_{1}}_{\sim}],\ldots,[{Q_{n}}_{\sim}])$, where $\mathrm{QL}_{u}=\{Q_{1},\ldots,Q_{n}\}$. For that, it suffices to show that every equivalence class $x\in[k(\mathrm{QL}_{u},\mathrm{prg})]$ is also an equivalence class of one of the $Q_{i}$. Let $db\in x$ be arbitrary. We claim that $x\in[Q(db)]$, which suffices since by Lemma 3, $Q(db)$ is one of the $Q_{i}$. To establish this, we just need to show that $Q(db)=Q(db^{\prime})$ for all $db^{\prime}$ such that $(db,db^{\prime})\in Q(db)_{\sim}$, so that $(db,db^{\prime})\in k(\mathrm{QL}_{u},\mathrm{prg})$ as well. But this follows from Lemma 17: if some $db^{\prime}$ has $(db,db^{\prime})\in Q(db)_{\sim}$, then $\llbracket q\rrbracket^{db}=\llbracket q\rrbracket^{db^{\prime}}$ for all $q\in Q(db)$, so no query in $Q(db)\cap Q(db^{\prime})$ evaluates differently on the two databases, and by the contrapositive of Lemma 17 we cannot have $Q(db)\neq Q(db^{\prime})$.

$\llbracket\mathrm{prg}\rrbracket_{u}\sqsubseteq k(\mathrm{QL}_{u},\mathrm{prg})$: Straightforward application of Lemma 18. ∎

Appendix F Query Analysis

F-A Symbolic Tuple Ordering

To show that the symbolic tuple ordering of Def. 9 induces a determinacy order, and to prove Lemma 20, we first need to define the evaluation of a symbolic tuple in a database state.

Symbolic tuple evaluation. The evaluation of a symbolic tuple $\langle T,\phi,\pi\rangle$ in the database state $db$, written as $\llbracket\langle T,\phi,\pi\rangle\rrbracket^{db}$, is a $\pi$-projection on the set of $db$’s tuples defined on the join of tables in $T$ that satisfy the constraint $\phi$. Formally:

Definition 14.

Given database state $db$ and symbolic tuple $\langle T,\phi,\pi\rangle$, $\llbracket\langle T,\phi,\pi\rangle\rrbracket^{db}$ is defined as:

$$\{tp\negthickspace\downharpoonright_{\pi}\mid tp\in\prod_{t\in T}\llbracket t\rrbracket^{db},\ tp\models\phi\}$$

where $tp\negthickspace\downharpoonright_{\pi}$ is a tuple with its columns limited to those in $\pi$, and $tp\models\phi$ means that tuple $tp$ satisfies formula $\phi$.
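
Definition 14 can be read operationally as product, filter, project. The following sketch makes that reading concrete under assumptions of our own: a database is a dictionary from table names to lists of rows, a row is a dictionary from column names to values, columns are qualified as `table.column` to avoid clashes, and the constraint $\phi$ is passed as a Python predicate. The function name `eval_symbolic_tuple` and the sample tables are hypothetical.

```python
from itertools import product


def eval_symbolic_tuple(tables, phi, pi, db):
    """Evaluate <T, phi, pi> on db: product of tables, filter by phi, project to pi."""
    result = set()
    # tp ranges over the Cartesian product of the tables in T.
    for rows in product(*(db[t] for t in tables)):
        tp = {}
        for t, row in zip(tables, rows):
            # Qualify each column with its table name.
            tp.update({f"{t}.{col}": val for col, val in row.items()})
        if phi(tp):                                          # tp |= phi
            result.add(tuple(tp[col] for col in pi))         # restrict tp to pi
    return result


db = {
    "emp": [{"id": 1, "dept": 10}, {"id": 2, "dept": 20}],
    "dept": [{"id": 10, "name": "R&D"}, {"id": 20, "name": "Ops"}],
}


def phi(tp):
    """Join condition plus a row-level selection."""
    return tp["emp.dept"] == tp["dept.id"] and tp["dept.name"] == "R&D"


rows = eval_symbolic_tuple(("emp", "dept"), phi, ("emp.id", "dept.name"), db)
print(rows)
```

Returning a set matches the set-builder form of the definition, so duplicate projected tuples collapse into one.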

We proceed to prove Lemma 20.

Lemma 20.

Given two sets of queries $Q_{1}$ and $Q_{2}$, if $\mathrm{sts}(Q_{1})\sqsubseteq_{\mathrm{st}}\mathrm{sts}(Q_{2})$ then $Q_{1}\preceq Q_{2}$.

Proof.

Assume $\ell_{Q_1} = \mathrm{sts}(Q_1)$ and $\ell_{Q_2} = \mathrm{sts}(Q_2)$. By Def. 9 we want to show that if for all symbolic tuples $\langle T,\phi,\pi\rangle \in \ell_{Q_1}$, there is a set of well-formed symbolic tuples $S = \langle T_1,\phi_1,\pi_1\rangle, \ldots, \langle T_n,\phi_n,\pi_n\rangle$ such that $S \subseteq \ell_{Q_2}$, $T_1,\ldots,T_n$ are disjoint, $T \subseteq (T_1 \cup \ldots \cup T_n)$, $\phi \models (\phi_1 \wedge \ldots \wedge \phi_n)$, and $\mathrm{dep}(\phi) \cup \pi \subseteq (\pi_1 \cup \ldots \cup \pi_n)$, then $Q_1 \preceq Q_2$.
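The side conditions on $S$ listed above can be checked mechanically. The sketch below is an illustration under strong simplifying assumptions that are not taken from the paper: a formula is encoded as a set of `(column, value)` equality atoms read as a conjunction, so the entailment $\phi \models (\phi_1 \wedge \ldots \wedge \phi_n)$ is approximated by a purely syntactic subset test, and membership of $S$ in $\ell_{Q_2}$ is not checked. The function name `covered_by` is hypothetical.

```python
def covered_by(target, S):
    """Check the side conditions for a target <T, phi, pi> against a
    candidate list S of symbolic tuples (tables, atoms, columns).

    Formulas are sets of (column, value) equality atoms, read as a
    conjunction. This encoding is an assumption of the sketch.
    """
    T, phi, pi = target
    tables = [t for (Ti, _, _) in S for t in Ti]
    # T_1, ..., T_n are disjoint iff no table name occurs twice.
    disjoint = len(tables) == len(set(tables))
    union_phi = set().union(*(p for (_, p, _) in S))
    union_pi = set().union(*(c for (_, _, c) in S))
    dep_phi = {col for (col, _) in phi}
    return (
        disjoint
        and set(T).issubset(tables)                 # T within T_1 u ... u T_n
        and union_phi.issubset(phi)                 # syntactic stand-in for entailment
        and (dep_phi | set(pi)).issubset(union_pi)  # dep(phi) u pi within pi_1 u ... u pi_n
    )

S = [
    ({"emp"}, set(), {"emp.id", "emp.salary"}),
    ({"dept"}, set(), {"dept.id"}),
]
ok = covered_by(({"emp"}, {("emp.id", 3)}, {"emp.id"}), S)
missing_column = covered_by(({"emp"}, {("emp.id", 3)}, {"emp.name"}), S)
print(ok, missing_column)
```

A real implementation would need a decision procedure for entailment instead of the subset test; the sketch only shows how the four conditions fit together.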

We assume an intermediate symbolic tuple $\mathrm{st}_{\mathrm{itr}}$ and define it as $\langle T_1 \cup \ldots \cup T_n,\ \phi_1 \wedge \ldots \wedge \phi_n,\ \pi_1 \cup \ldots \cup \pi_n\rangle$. The tuple $\mathrm{st}_{\mathrm{itr}}$ models the symbolic tuples created from the join of $\langle T_1,\phi_1,\pi_1\rangle, \ldots, \langle T_n,\phi_n,\pi_n\rangle$.
Additionally, $T_1,\ldots,T_n$ are disjoint, which by the definition of symbolic tuples means that $\pi_1,\ldots,\pi_n$ and the dependencies of $\phi_1,\ldots,\phi_n$ are also disjoint, effectively making $\mathrm{st}_{\mathrm{itr}}$ the symbolic tuple of the Cartesian product of tuples $\langle T_1,\phi_1,\pi_1\rangle, \ldots, \langle T_n,\phi_n,\pi_n\rangle$.

We want to show that the symbolic tuples in $S$ determine $\mathrm{st}_{\mathrm{itr}}$:

$$\forall db_1, db_2 \in \Omega_D.\quad \llbracket \mathrm{st} \rrbracket^{db_1} = \llbracket \mathrm{st} \rrbracket^{db_2}\ \ \forall \mathrm{st} \in S \ \rightarrow\ \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1} = \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_2} \tag{1}$$

For a specific database state $db$, $\llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db}$ gives us all the tuples defined on $T_1,\ldots,T_n$ that satisfy $\phi_1 \wedge \ldots \wedge \phi_n$, projected on the columns in $\pi_1 \cup \ldots \cup \pi_n$.

Assume there is a pair of databases $db_1, db_2 \in \Omega_D$ such that $\llbracket \mathrm{st} \rrbracket^{db_1} = \llbracket \mathrm{st} \rrbracket^{db_2}\ \forall \mathrm{st} \in S$ holds but $\llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1} \neq \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_2}$.
By the assumption $\llbracket \mathrm{st} \rrbracket^{db_1} = \llbracket \mathrm{st} \rrbracket^{db_2}\ \forall \mathrm{st} \in S$ we know that for all $\mathrm{st} \in S$, if tuple $tp$ is in $\llbracket \mathrm{st} \rrbracket^{db_1}$ it is also in $\llbracket \mathrm{st} \rrbracket^{db_2}$, and vice versa.

For $\llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1} \neq \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_2}$ to hold, we have to consider two cases:

  1. There is a tuple $tp \in \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1}$ such that $tp$ cannot be constructed from the tuples in the set $\{tp' \in \llbracket \mathrm{st} \rrbracket^{db_1} \mid \mathrm{st} \in S\}$.

    - All of the symbolic tuples $\mathrm{st} \in S$ are well-formed and $T_1,\ldots,T_n$ are disjoint, which makes $\mathrm{st}_{\mathrm{itr}}$ the symbolic tuple of the Cartesian product of $S$. This means that a tuple $tp \in \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1}$ is defined on the product of tables $T_1,\ldots,T_n$, satisfies $\phi_1 \wedge \ldots \wedge \phi_n$, and is projected on $\pi_1 \cup \ldots \cup \pi_n$. Hence each tuple $tp \in \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1}$ is constructed from the merge of tuples $tp_1,\ldots,tp_n$ where $tp_i \in \llbracket \langle T_i,\phi_i,\pi_i\rangle \rrbracket^{db_1}$ for $i = 1,\ldots,n$. Thus, this case is not possible.

  2. There is a tuple $tp \in \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_2}$ such that $tp$ cannot be constructed from the tuples in the set $\{tp' \in \llbracket \mathrm{st} \rrbracket^{db_2} \mid \mathrm{st} \in S\}$.

    - Similar to the first case.
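The claim that every tuple of $\llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db}$ is a merge of tuples from the components of $S$ can be illustrated on a toy database. The sketch below is not part of the proof; it assumes the same hypothetical encoding as before (tables as lists of row dictionaries with globally unique column names, formulas as Python predicates), and the names `evaluate` and `merge` are invented for the example.

```python
from itertools import product

def evaluate(db, tables, phi, pi):
    """Evaluate <tables, phi, pi> on db (table name -> list of row dicts)."""
    result = set()
    for combo in product(*(db[t] for t in sorted(tables))):
        tp = {}
        for row in combo:
            tp.update(row)
        if phi(tp):
            result.add(tuple(sorted((c, tp[c]) for c in pi)))
    return result

def merge(results):
    """Glue one tuple from each component result into one wider tuple."""
    return {tuple(sorted(sum(combo, ()))) for combo in product(*results)}

db = {
    "emp": [{"emp.id": 1}, {"emp.id": 2}, {"emp.id": 3}],
    "dept": [{"dept.id": 10, "dept.name": "Research"},
             {"dept.id": 20, "dept.name": "Ops"}],
}

# Two well-formed symbolic tuples over disjoint tables:
# each formula only mentions columns of its own projection.
phi1 = lambda tp: tp["emp.id"] != 1
phi2 = lambda tp: tp["dept.id"] == 10
pi1 = {"emp.id"}
pi2 = {"dept.id", "dept.name"}

parts = [evaluate(db, {"emp"}, phi1, pi1), evaluate(db, {"dept"}, phi2, pi2)]
merged = merge(parts)

# Direct evaluation of the intermediate tuple st_itr.
direct = evaluate(db, {"emp", "dept"},
                  lambda tp: phi1(tp) and phi2(tp), pi1 | pi2)
print(merged == direct)
```

Because the component results alone suffice to rebuild `direct`, two database states that agree on every component of $S$ must also agree on $\mathrm{st}_{\mathrm{itr}}$, which is the content of (1).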

Next, we need to show that $\mathrm{st}_{\mathrm{itr}}$ determines $\langle T,\phi,\pi\rangle$:

$$\forall db_1, db_2 \in \Omega_D.\quad \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1} = \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_2} \ \rightarrow\ \llbracket \langle T,\phi,\pi\rangle \rrbracket^{db_1} = \llbracket \langle T,\phi,\pi\rangle \rrbracket^{db_2} \tag{2}$$

By $\llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1} = \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_2}$ we know that $\forall tp_1 \in \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1}, \exists tp_2 \in \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_2}$ and $tp_1 = tp_2$, and $\forall tp_2 \in \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_2}, \exists tp_1 \in \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1}$ and $tp_2 = tp_1$.

Intuitively, for a given database $db$, $\llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db}$ has more columns and tuples than $\llbracket \langle T,\phi,\pi\rangle \rrbracket^{db}$. Symbolic tuple $\langle T,\phi,\pi\rangle$ throws away some columns by limiting the resulting tuples to the tables in $T$, which is a subset of $T_1 \cup \ldots \cup T_n$, and by projecting on $\pi$, which is a subset of $\pi_1 \cup \ldots \cup \pi_n$. It also eliminates some rows by applying $\phi$, which is stronger than $\phi_1 \wedge \ldots \wedge \phi_n$, to the result set.

We need to show that applying these limitations maintains query determinacy. We consider these cases separately:

  • Columns: Projecting away some columns from the evaluation of $\mathrm{st}_{\mathrm{itr}}$ maintains query determinacy. We write $tp{\downharpoonright_{\pi}}$ for the projection of tuple $tp$ onto only the columns specified in $\pi$, and $\mathrm{col}(T)$ for the columns of $T$. We use the same notation for tuples and write $\mathrm{col}(tp)$ to denote the set of columns of tuple $tp$. For a tuple $tp$ such that $tp \in \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1}$ and $tp \in \llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_2}$, by projecting away some columns from $tp$ we end up with a new tuple $tp' = tp{\downharpoonright_{\pi}}$ such that $\mathrm{col}(tp') \subseteq \mathrm{col}(tp)$.
Since $tp$ is in both $\llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_1}$ and $\llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db_2}$, and by the definition of the ordering $\pi \subseteq \pi_1 \cup \ldots \cup \pi_n$, we can conclude that $tp'$ will also be in both $\llbracket \langle T_1 \cup \ldots \cup T_n,\ \phi_1 \wedge \ldots \wedge \phi_n,\ \pi\rangle \rrbracket^{db_1}$ and $\llbracket \langle T_1 \cup \ldots \cup T_n,\ \phi_1 \wedge \ldots \wedge \phi_n,\ \pi\rangle \rrbracket^{db_2}$; this follows easily from Def. 14.

  • Rows: Removing some rows from the result of the last step maintains query determinacy. By the definition of the ordering we know that $\mathrm{dep}(\phi) \cup \pi \subseteq \pi_1 \cup \ldots \cup \pi_n$ and that $\langle T_1,\phi_1,\pi_1\rangle, \ldots, \langle T_n,\phi_n,\pi_n\rangle$ are well-formed, which means that $\phi$ only applies to the columns that were retrieved by the intermediate tuple (projected to $\pi$).
Since $\phi$ is a stronger condition than $\phi_1 \wedge \ldots \wedge \phi_n$, for a tuple $tp$ such that $tp \in \llbracket \langle T_1 \cup \ldots \cup T_n,\ \phi_1 \wedge \ldots \wedge \phi_n,\ \pi\rangle \rrbracket^{db_1}$ and $tp \in \llbracket \langle T_1 \cup \ldots \cup T_n,\ \phi_1 \wedge \ldots \wedge \phi_n,\ \pi\rangle \rrbracket^{db_2}$, if $tp$ satisfies $\phi$ then $tp$ is also in both $\llbracket \langle T_1 \cup \ldots \cup T_n,\ \phi,\ \pi\rangle \rrbracket^{db_1}$ and $\llbracket \langle T_1 \cup \ldots \cup T_n,\ \phi,\ \pi\rangle \rrbracket^{db_2}$. Otherwise, if $tp$ is not in one of them, it is not going to be in the other one either.

  • Tables: Similar to the first case, for a tuple $tp$ such that $tp \in \llbracket \langle T_1 \cup \ldots \cup T_n,\ \phi,\ \pi\rangle \rrbracket^{db_1}$ and $tp \in \llbracket \langle T_1 \cup \ldots \cup T_n,\ \phi,\ \pi\rangle \rrbracket^{db_2}$, by projecting away the columns of some of the tables from $tp$ we end up with a new tuple $tp' = tp{\downharpoonright_{\mathrm{col}(T)}}$.
Since $tp$ is in both $\llbracket \langle T_1 \cup \ldots \cup T_n,\ \phi_1 \wedge \ldots \wedge \phi_n,\ \pi\rangle \rrbracket^{db_1}$ and $\llbracket \langle T_1 \cup \ldots \cup T_n,\ \phi_1 \wedge \ldots \wedge \phi_n,\ \pi\rangle \rrbracket^{db_2}$, and by Def. 9 $T \subseteq T_1 \cup \ldots \cup T_n$, we can conclude that $tp'$ will also be in both $\llbracket \langle T,\phi,\pi\rangle \rrbracket^{db_1}$ and $\llbracket \langle T,\phi,\pi\rangle \rrbracket^{db_2}$.
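The column and row steps above amount to post-processing the result of $\mathrm{st}_{\mathrm{itr}}$ without consulting the database again. The following sketch illustrates this on a toy database; it is an illustration under assumptions, not the paper's construction. It uses a hypothetical encoding (tables as lists of row dictionaries, formulas as Python predicates, invented names `evaluate` and `refine`) and deliberately takes $T = T_1 \cup T_2$, so the table step is not exercised.

```python
from itertools import product

def evaluate(db, tables, phi, pi):
    """Evaluate <tables, phi, pi> on db (table name -> list of row dicts)."""
    result = set()
    for combo in product(*(db[t] for t in sorted(tables))):
        tp = {}
        for row in combo:
            tp.update(row)
        if phi(tp):
            result.add(tuple(sorted((c, tp[c]) for c in pi)))
    return result

def refine(result, phi, pi):
    """Derive a narrower result from an already evaluated one:
    drop rows that fail phi, then drop columns outside pi.
    Only the given result is inspected, never the database."""
    out = set()
    for tp in result:
        row = dict(tp)
        if phi(row):
            out.add(tuple(sorted((c, row[c]) for c in pi)))
    return out

db = {
    "emp": [{"emp.id": 1}, {"emp.id": 2}, {"emp.id": 3}],
    "dept": [{"dept.id": 10, "dept.name": "Research"},
             {"dept.id": 20, "dept.name": "Ops"}],
}

# Intermediate tuple st_itr: weaker condition, wider projection.
phi_itr = lambda tp: tp["emp.id"] != 1 and tp["dept.id"] == 10
pi_itr = {"emp.id", "dept.id", "dept.name"}

# Target tuple: phi is stronger than phi_itr, and dep(phi) together
# with pi stays inside pi_itr.
phi = lambda tp: tp["emp.id"] == 3 and tp["dept.id"] == 10
pi = {"dept.name"}

derived = refine(evaluate(db, {"emp", "dept"}, phi_itr, pi_itr), phi, pi)
direct = evaluate(db, {"emp", "dept"}, phi, pi)
print(derived == direct)
```

Since `refine` is a function of the intermediate result alone, two database states with equal $\llbracket \mathrm{st}_{\mathrm{itr}} \rrbracket^{db}$ yield equal refined results, which mirrors the argument for (2) in this restricted setting.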

(1) and (2) would give us:

$$\forall db_1, db_2 \in \Omega_D.\quad \llbracket \mathrm{st} \rrbracket^{db_1} = \llbracket \mathrm{st} \rrbracket^{db_2}\ \ \forall \mathrm{st} \in S \ \rightarrow\ \llbracket \langle T,\phi,\pi\rangle \rrbracket^{db_1} = \llbracket \langle T,\phi,\pi\rangle \rrbracket^{db_2}$$

which allows us to conclude that $\langle T_1,\phi_1,\pi_1\rangle, \ldots, \langle T_n,\phi_n,\pi_n\rangle$ determine $\langle T,\phi,\pi\rangle$.

Repeating this process for all of the symbolic tuples in $\ell_{Q_1}$ gives us $Q_2 \twoheadrightarrow Q_1$, which means $Q_1 \preceq Q_2$. ∎

F-B Symbolic Tuple and DQ Ordering

We present the proof of Lemma 4.

See Lemma 4.

Proof.

Assume $\sigma_{\mathrm{st}}(\{Q_1,\ldots,Q_n\}) = \{\ell_{Q_1},\ldots,\ell_{Q_n}\}$ and $\sigma_{\mathrm{st}}(\{P_1,\ldots,P_m\}) = \{\ell_{P_1},\ldots,\ell_{P_m}\}$. We have $\{\ell_{Q_1},\ldots,\ell_{Q_n}\} \sqsubseteq_{*} \{\ell_{P_1},\ldots,\ell_{P_m}\}$.

By the definition of $\sqsubseteq_{*}$ and Lemma 20, we know that for each $Q_i$ in $\{Q_1, \dots, Q_n\}$ there is at least one $P_j$ in $\{P_1, \dots, P_m\}$ such that $Q_i \preceq P_j$.

We apply $\mathrm{tc}$ to $Q_i$ and $P_j$, which would give us $\mathrm{tc}(Q_i) \subseteq \mathrm{tc}(P_j)$. By applying $\mathrm{tc}$ to every element of $\{P_1, \dots, P_m\}$, using the basic properties of $\cup$, we will have $\mathrm{tc}(Q_i) \subseteq \mathrm{tc}(P_1) \cup \dots \cup \mathrm{tc}(P_m)$ for all $i \in \{1, \dots, n\}$.

Since the tiling closure of each $Q_i$ is individually contained in $\mathrm{tc}(P_1) \cup \dots \cup \mathrm{tc}(P_m)$, their union is still contained in $\mathrm{tc}(P_1) \cup \dots \cup \mathrm{tc}(P_m)$, which gives us:

$$\mathrm{tc}(Q_1) \cup \dots \cup \mathrm{tc}(Q_n) \subseteq \mathrm{tc}(P_1) \cup \dots \cup \mathrm{tc}(P_m) \tag{1}$$

We apply the tiling closure to both sides of (1) and rely on Lemma 10 to remove the nested uses of $\mathrm{tc}$, which would give us:

$$\mathrm{tc}(Q_1 \cup \dots \cup Q_n) \subseteq \mathrm{tc}(P_1 \cup \dots \cup P_m)$$

which by the definition of $\vee$ in the DQ would mean $(Q_1 \vee \dots \vee Q_n) \sqsubseteq (P_1 \vee \dots \vee P_m)$
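The step from (1) to the inclusion between the two tiling closures of unions can be unpacked as a short chain. This is only a sketch under two assumptions that are not restated in this proof: that $\mathrm{tc}$ is monotone with respect to $\subseteq$, and that Lemma 10 provides an identity of the form $\mathrm{tc}(\mathrm{tc}(A) \cup \mathrm{tc}(B)) = \mathrm{tc}(A \cup B)$.

```latex
\begin{align*}
\mathrm{tc}(Q_1 \cup \dots \cup Q_n)
  &= \mathrm{tc}\bigl(\mathrm{tc}(Q_1) \cup \dots \cup \mathrm{tc}(Q_n)\bigr)
     && \text{(Lemma 10, in the assumed form)} \\
  &\subseteq \mathrm{tc}\bigl(\mathrm{tc}(P_1) \cup \dots \cup \mathrm{tc}(P_m)\bigr)
     && \text{(assumed monotonicity of } \mathrm{tc} \text{, applied to (1))} \\
  &= \mathrm{tc}(P_1 \cup \dots \cup P_m)
     && \text{(Lemma 10, in the assumed form)}
\end{align*}
```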