Disjunctive Policies for Database-Backed Programs
Abstract
When specifying security policies for databases, it is often natural to formulate disjunctive dependencies, where a piece of information may depend on at most one of two dependencies or , but not both. A formal semantic model of such disjunctive dependencies, the Quantale of Information, was recently introduced by Hunt and Sands as a generalization of the Lattice of Information. In this paper, we seek to contribute to the understanding of disjunctive dependencies in database-backed programs and introduce a practical framework to statically enforce disjunctive security policies. To that end, we introduce the Determinacy Quantale, a new query-based structure which captures the ordering of disjunctive information in databases. This structure can be understood as a query-based counterpart to the Quantale of Information. Based on this structure, we design a sound enforcement mechanism to check disjunctive policies for database-backed programs. This mechanism is based on a type-based analysis for a simple imperative language with database queries, which is precise enough to accommodate a variety of row- and column-level database policies flexibly while kee** track of disjunctions due to control flow. We validate our mechanism by implementing it in a tool, DiVerT, and demonstrate its feasibility on a number of use cases.
I Introduction
Database security and information flow security have largely evolved as two disparate areas [1, 2], while sharing closely-related foundations and mechanisms to enforce security. Modern applications commonly rely on shared database backends to provide rich functionality to a multitude of mutually distrusting users. In response to frontend demands, database query languages, with features such as triggers, store procedures, and user-defined functions, have increasingly come to resemble full-fledged programming languages, thus calling into question the adequacy of the underlying access control models [3, 4]. A security policy describes the totality of expectations that we have of a computer system in the face of adversaries that seek to satisfy objectives that may differ from ours. In the context of database systems, whose purpose is to retain and provide information, the security policies of interest constrain who is allowed to learn what parts of that information. A class of such security policies which has proven particularly challenging to enforce with the methods of database security are disjunctive policies, which states that given two pieces of information, some entity may either learn one or the other, but not both.
A common example of disjunctive policies are databases which contain personally identifiable information, such as medical trial data. Biometric parameters of participants are important confounders that must be considered when drawing conclusions from the data, but at the same time releasing too many parameters of any one participant (such as their height, age and weight) might be sufficient to deanonymize them with high confidence [5]. Hence, a security policy for such a database may specify that the user may learn height and age, or height and weight, or age and weight, but not all three. Other examples of scenarios where disjunctive policies are useful include differential privacy [6] and secret sharing.
In this paper, we combine insights from database security and information flow research to develop a formal model for reasoning about disjunctive information in database-backed programs, and thus take a step towards reconciling the two fields. Our model makes it possible to reason about the semantic information dependencies in a program that performs queries, and compare them against a disjunctive policy. Building upon this, we propose a provably sound static enforcement mechanism that ensure that the policy is satisfied.
It is customary in information flow models to represent information as an equivalence relation on states, with the refinement order of equivalence relations corresponding to having more information. This representation can be used for both the actual information conveyed by a computational process and the bound imposed on it as part of a simple, non-disjunctive security policy. The possible equivalence relations on a given universe of states form a structure called the Lattice of Information (LoI) [7], in which security-relevant questions can be answered, such as whether a program reveals no more information than is allowed by the security policy, or what information is revealed by the combination of two programs. Similar questions have been addressed in the database community using an analogous object called the Disclosure Lattice [8]. We observe that this definition is actually insufficient to characterize information, which motivates us to introduce a more specific structure based on query determinacy, the Determinacy Lattice (DL). The formal relation between the Disclosure Lattice or our definition and LoI was hitherto unexplored, and more importantly neither of them can be used to represent disjunctions as seen in our motivating example.
Recently, Hunt and Sands [9] proposed a new information flow structure called the Quantale of Information (QoI), which seeks to address this shortcoming and establish a formal setting for representing, combining and comparing disjunctions of information. We build upon this work to introduce an analogous structure, the Determinacy Quantale (DQ), representing disjunctive dependencies in database-backed programs. As we show, this structure can be formally related to the QoI, and this relationship is analogous to that between the LoI and the DL. We then use the DQ to design a knowledge-based security condition that relates disjunctive dependencies in database-backed programs to disjunctive policies.
We are the first to address the problem of enforcing disjunctive policies. Prior works that develop language-based enforcement techniques in database-backed applications do not support disjunctive policies, while database-level dependencies are restricted to coarse approximations that incorrectly reject secure programs, such as our previous example [10, 11, 12, 13, 14].
Perhaps unsurprisingly, path sensitivity of a static analysis is key to capturing disjunctive dependencies. We show how standard flow-sensitive type-based dependency analysis [15] can be adapted to a compositional path-sensitive analysis and thus capture disjunctive dependencies in terms of database queries. To represent these dependencies in the DQ model, we introduce a sound approximation of the information disclosed by each database query which is precise enough to represent complex combinations of both row- and column-level dependencies. Finally, in the DQ, the combination of these analyses can be proven sound with respect to our security condition. We expect that the overall architecture of the resulting soundness proof, in which we relate a sequence of abstractions of the behaviour of a program to ordered elements of the DQ, can be generalized to many other enforcement mechanisms for our security condition.
To demonstrate the practicality of our approach, we implement this type-based dependency analysis and query approximation for database-backed programs and evaluate it on a test suite and some use cases which effectively illustrate the need for disjunctive dependencies and disjunctive policies.
Summary of contributions.
-
•
We introduce a formal model for reasoning about disjunctive dependencies and policies in databases. In the process, we show how to reconcile perspectives from the database security and information flow communities.
-
•
We introduce a database-specific model of knowledge, the Determinacy Lattice, and a disjunctive extension, called the Determinacy Quantale, and explore their relationship to established general-purpose semantic models.
-
•
Using our model, we define an extensional security condition for database-backed programs that accommodates disjunctive policies.
-
•
We propose a type-based program analysis to capture disjunctive dependencies in database-backed programs, combine them with a novel abstraction of queries, and prove them sound with respect to our security condition. This is presented as an instance of a generalizable architecture for such soundness proofs.
-
•
We implement a prototype tool that uses type-based dependency analysis and query approximation to verify query-based disjunctive policies for database-backed programs, and demonstrate its feasibility on a test suite and a number of use cases.
The rest of paper is structured as follows. After reviewing preliminaries in Section II, we give our account of the DL and introduce the DQ in Section III-C. In Section IV-B, we formalize our model of database-backed programs and the security policies we impose on them, culminating in a formal security condition. We present enforcement mechanisms in Section V, and their implementation and evaluation in Section VI. In Section VII, we contextualize our contributions with a discussion of related work, and finally summarize conclusions in Section VIII.
II Background
II-A Lattice of Information
An equivalence relation on a set is a binary relation that is reflexive, symmetric, and transitive. For example, the equivalence relation parity on the set is defined as . An equivalence relation partitions its underlying domain into disjoint equivalence classes. Given an equivalence relation on a set and , denotes the unique equivalence class induced by that belongs to. We write to denote the set of all equivalence classes induced by . We call a partition of and hereafter we may also refer to each element, i.e. equivalence class, of the partition as a cell. For example, parity partitions into cells and .
Equivalence relations over states are commonly used to represent an agent’s knowledge, by relating two states whenever the agent cannot distinguish between them. When an equivalence relation models knowledge, we also call the cells induced by it knowledge sets. These have a distinct intuitive interpretation when we consider functions that take in some state and return an agent’s view of it. We will write the equivalence relation induced by the output of as . In that case, in a state , the knowledge set represents the agent’s remaining uncertainty about the state, in the sense of all the states that the agent still considers possible, after observing the output of . The agent knows anything that is true in all states in the knowledge set. In this paper, we use the terms knowledge and information interchangeably.
A complete lattice is a set equipped with a partial ordering (reflexive, antisymmetric, and transitive) relation, maximal and minimal elements and for this relation and a join (least upper bound) for any subset of elements. The meet (greatest lower bound) of a subset can be defined as the join of the set of all lower bounds of that subset [16]. The Lattice of Information (LoI) [7] is a structure for representing the ordering of information with equivalence relations. Let be the set of all equivalence relations defined on a given domain . The LoI ranks these equivalence relations based on the information they reveal about the underlying domain. Given two equivalence relations , this ordering can be defined as follows:
For any set , the least upper bound of is the equivalence relation defined as:
Formally, denotes the LoI on domain , with ordering relation and join . The top element in the lattice is the most precise equivalence relation id such that , and the bottom element is the least precise equivalence relation .
The join of any two equivalence relations , being their least upper bound, is the least informative equivalence relation that is at least as informative as either of and (i.e. is an upper bound on both), and thus represents the information that is conveyed from learning both and . We refer to this as the conjunction of the information in and .
II-B Quantale of Information
The LoI captures the conjunction of any two information sources and as the join of their respective equivalence relations. However, it does not offer an operator that would yield a representation of their disjunction, that is, the information that can be obtained from having access to one of them, but not both. In fact, the disjunction can not in general be represented as a single equivalence relation, and thus an element of the LoI, at all. To address this limitation, Hunt and Sands [9] propose a generalization of the LoI called the Quantale of Information (QoI). A quantale is a complete lattice with an additional binary “tensor” operator . In the QoI, the tensor is used to represent conjunction, while the lattice join represents disjunction.
The core idea behind the quantale structure is to interpret the disjunction of several knowledge relations as describing all knowledge relations in which the knowledge always comes from one of the . More concretely, in any possible state , the agent’s knowledge should equal its knowledge in the same state in one of the disjuncts, . Which disjunct it is may depend on the state, so the agent may have knowledge from in the state but knowledge from in some other state . Relations that satisfy this condition are called tilings, based on a picture of covering (since every state needs to be in some equivalence class) the space of possible states with knowledge sets drawn from any of the disjuncts. Following Hunt and Sands, we define the set of all tilings
where is a set of equivalence relations.
We would like to think of a relation as describing no more knowledge than a disjunction if it’s bounded above by some in the LoI, and more generally define the quantale ordering for as . The resulting relation is not antisymmetric on general sets of relations or even es of general sets, reflecting the circumstance that there may be multiple es representing the same knowledge. As it is standard in lattice theory [17], we use the downwards closure operator to obtain canonical representations of the order cycles of and hence construct a partial order.
The tiling closure of a set of equivalence relations ,
then canonically represents the knowledge permitted by the disjunction . The set can still be interpreted as a list of possible equivalence relations, now including any equivalence relation that does not reveal more information than the disjunction.
We then take the elements of the QoI on a state set to be all tiling closures of subsets of , with the ordering being set inclusion. For the tensor , we rely on the join operator of the LoI to calculate the least upper bound of any possible pair of equivalence relations in and and then canonicalise the result. Since the sets are interpreted disjunctively, the join can simply be defined as .
Example 1.
Program 1 operates on a secret integer between -2 and 3, outputting to user whether it is greater than zero, and either (if it isn’t) whether it is even, or (if it is) whether it equals 0 or 1 (by dividing by 2, rounding down and testing for 0). We expect the information released by the program ( in Fig. 1) to be bounded by the disjunction of the knowledge relations capturing the two possible branches (resp. , ).
This could not be accurately expressed with LoI operations, since , and are all incomparable, but the join of and (as the only available nontrivial way of combining them) is equal to and so would equally bound a program that directly releases . However, can be tiled with equivalence classes from and , and we in fact have . So in the QoI, , and hence .
III Information Ordering in Databases
Our goal is to introduce our semantic model for the information revealed by database queries, the Determinacy Lattice, and its extension to disjunctive dependencies, the Determinacy Quantale. To this end, we first review a standard formalism for reasoning about databases that we will employ.
III-A A Primer on Relational Database Models
We use the relational model to formally define databases [18]. In this model, we distinguish between the database schema , which specifies the structure of the database, and the database state , which specifies its actual content.
A database schema is a (nonempty) finite set of relation schemas , written as . A relation schema (table) is defined as a set of attributes paired with a set of constraints, where an attribute is a name paired with a domain. The number of attributes in (written as ) is referred to as its arity. A tuple is a set of data representing a single record within a relation schema. Each tuple contains values for each attribute defined in the relation schema.
A database state is a snapshot of the database schema at a particular point in time. It represents the actual data stored in the database, consisting of a collection of tables and their respective tuples. We write to represent the tuples of table under database state .
We write to denote the set of all database states of . A database configuration is where is the database schema and is a set of integrity constraints. We denote where is an appropriate notion of constraint being satisfied. An integrity constraint is an assertion about a database that must be satisfied for a database state to be considered valid. Various classes of integrity constraints exist, for instance functional dependencies which capture primary-key constraints, and inclusion dependencies which are used in foreign-key constraints [18].
Relational calculus. We rely on the Domain Relational Calculus (DRC) for our query language. In the DRC, a (non-boolean) query over a database schema has the form , where is a sequence of variables, is a first order formula over , and the free variables of are those in . The evaluation of a query , denoted by , is the set of tuples that satisfy the formula with respect to . A boolean query is written as , and its evaluation is defined to be the boolean value if and only if some tuple in satisfies . We use to indicate the universe of all possible queries.
The domain relational calculus employed here follows the standard convention, and we refer the reader to the relevant literature for a more comprehensive description of DRC [18].
|
||||||
|
Example 2.
The database schema in Fig. 2 contains relations for employees and managers . A query returning the set of tuples containing the division names and the salary of the managers of each division can be written as:
Views. In DRC, a database view is a relation defined by the result of a non-boolean query. Database views act as virtual tables and, as we will see, are useful when defining security policies. Formally, a view defined over database schema is a tuple , where is the view identifier and is the non-boolean query over schema defining the view. The query may refer to other views, but we assume that views do not have cyclic dependencies.
The materialization of a view in a database state is the evaluation of its defining query in that state, i.e., . We use to refer to the defining query of view . We extend relational calculus in the standard way to work with views [3].
III-B Determinacy Lattice
Given query sets , query determinacy [19] captures whether results of the queries in are always sufficient to determine the result of the queries in .
Definition 1.
determines (denoted by ) iff for all database states , , if = for all , then = for all .
Intuitively, means that pairs of databases for which all queries in return the same result also give the same result under any query in . This is in fact equivalent to the initial gloss that the results of queries in can be computed from the results of queries in , as we show in detail in Appendix A.
Query determinacy allows us to define an ordering on sets of queries based on the information they reveal. We call this ordering determinacy order, denote it by , and define it as , iff .
Example 3.
Consider queries and defined on the relations of Fig. 2. Query discloses the and the of the employees while only returns their . Intuitively, reveals more information than , which means .
This definition of determinacy order is a preorder (reflexive and transitive), but not necessarily a partial order, as it is not anti-symmetric. In other words, and does not necessarily mean that . As in Section II-A, this essentially means that query sets are not canonical representations of the information revealed by them. To rectify this, we form the closure under the determinacy order, so the determinacy order becomes set inclusion. Intuitively, will contain all the queries in whose answers can be inferred by the set of queries . Formally, is defined as:
Using the definitions of determinacy order and closure , we can then define the Determinacy Lattice as follows:
Definition 2.
Given a universe of queries , the Determinacy Lattice is a complete lattice such that:
-
•
-
•
iff
-
•
-
•
, ,
where is the determinacy order on .
Disclosure order and information flow properties. Our definition of the Determinacy Lattice is similar to the definition of the Disclosure Lattice introduced by Bender et al. [8]. A Disclosure Lattice is a lattice built upon a disclosure order, which is a partial order on sets of queries satisfying additional conditions that are expected of an ordering according to the amount of information disclosed by each set of queries. Bender et al. [8] define the disclosure order as follows:
Definition 3.
Given a universe of queries , a disclosure order is a preorder on that satisfies the following properties:
-
1.
For all , if then
-
2.
If and then
The first property in this definition ensures that adding new elements to a set of queries only increases the amount of disclosed information and the second property allows us to derive a meaningful upper bound on the information disclosure.
The intended use of disclosure order was to order sets of queries based on the amount of information they reveal about the underlying database. However, we make the observation that this definition is not specific enough to characterize information disclosure in the information flow sense. For example, consider query containment [18], defined as:
Definition 4.
Given queries , we say that is contained in , denoted by , if for every database states , we have .
Query containment satisfies all of the requirements of a disclosure order (Def. 3), but it is not enough to guarantee security. To illustrate this, consider a database with a single table given in Fig. 3.
Table has a single column , and contains values , , and , where is a secret value that can be either or . We thus consider two possible instances of this database, one where contains values , , and and another where it contains , , and . Now, consider the following queries:
where and are just logical copies of table . It is common practice to make logical copies of relation and use them in queries with self-joins [20]. The result of query is always and . The result of query is if the secret is and if is . As it is evident, for these queries, query containment holds and the result of query is contained in the results of . However, an observer seeing the result of query can learn the value of secret .
This example illustrates that query containment (a disclosure order) is not sufficient to guarantee the confidentiality of the secret in an information flow setting. To ensure information flow security, we require a stronger condition, such as the notion of query determinacy order (Def. 1) that we chose to rely on in this paper.
Relation between the DL and the LoI. There exists a close relationship between the DL and the LoI. Specifically, a query defined over a database schema induces an equivalence relation on database states . We can formally define this equivalence relation as:
We write to denote the set of all equivalence classes induced by . Given an equivalence relation on set and , denotes the equivalence class induced by to which the database state belongs. We further lift this definition to sets of queries :
This interpretation of database queries as equivalence relations provides a direct connection between the DL and the LoI, where the lattice elements correspond to , the ordering to the determinacy order , and join and meet follow the definitions of the DL.
Lemma 1.
For all , there is a complete lattice homomorphism from the Determinacy Lattice to the Lattice of Information defined on .
We prove this Lemma in Appendix B. To the extent that we believe to accurately represent the information conveyed by the queries in , this lemma implies that joins and order comparisons can be performed in the DL without explicit reference to the LoI.
III-C Determinacy Quantale
We introduce a generalization of the Determinacy Lattice, called the Determinacy Quantale (DQ), to represent disjunctive dependencies. Our definition of the DQ is intended as a counterpart to the QoI [9], analogously to how the DL corresponds to the LoI. To achieve this, we define a query-set counterpart of the tiling closure operator to capture the disjunction of sets of queries. Since sets of queries correspond to LoI elements (equivalence relations), disjunctive QoI elements (sets of equivalence relations) will be represented as sets of sets of queries. Each set of queries in the outer set represents a possible combination of queries that does not reveal more information than is allowed by the disjunction.
Analogously to the QoI, the tiling closure of a set of sets of queries is defined by forming the downward closure under (from the DL) of their mix. The query-set equivalent of the mix operator is defined on a set of sets of queries such that for as follows:
where denotes the equivalence classes of as defined previously. We then define the tiling closure for a set of elements of the DL as .
We then formally define the Determinacy Quantale as follows.
Definition 5.
Given a universe of queries , let be the Determinacy Lattice defined on . The Determinacy Quantale is the quantale , with:
-
•
-
•
-
•
-
•
-
•
, , ,
where .
In Appendix C we show that Def. 5 satisfies the usual quantale axioms [9]. As with the DL and LoI, the DQ embeds into a QoI by a quantale homomorphism. This QoI is defined on sets of equivalence relations derived from sets of sets of queries by the following map:
Definition 6.
Given a set of sets of queries ,
We can then formally state the relationship between the DQ and this quantale as follows.
Lemma 2.
For all , there is a quantale homomorphism from the Determinacy Quantale to the Quantale of Information defined on .
Example 4.
To illustrate the Determinacy Quantale in practice, consider Program 2, which issues either query or to the database. Query q1 returns the and columns of the entry in table if the role of that entry is . Similarly, query q2 returns the and columns if the role of the entry in is .
Consider a policy defined on queries and . v1 and v2, which respectively project on the and , and the and columns of , are used in defining the disjunctive security policy .
For this example, we assume a database that has only one row in the table, and we also limit the domain of possible roles to . These limitations are necessary in order to have a finite representation of the potential query sets and enables us to effectively depict the sets produced by the and operators.
Program 2 depicts a disjunction that – ignoring variable y – depends either on or (i.e., ), which on the DQ can be represented as a point . Similarly, the policy can be represented on the DQ by .
Illustrating this point requires calculating the set of v1 and v2, which includes all sets of queries whose equivalence relation can be constructed from the equivalence classes of and . Unfortunately, for any sufficiently rich query language, our definition of inevitably yields an infinite set, as infinitely many queries that are “morally equivalent” or even the same up to renaming variables represent the same knowledge set. To compactly represent such infinite sets, we will pick just one representative, and define
as a closure operator that adds all equivalent queries. Then will be the set , , , , where and .
Therefore, we can depict the policy as the point , , , on the DQ. Similarly, the DQ point of the Program 2 (i.e., ), can also be depicted by the point on the DQ. We illustrate the part of the DQ which includes these points in Fig. 4, and as it is evident from the figure, conclude that Program 2 is inline with the policy.
IV Security Framework
Drawing on the quantale model of dependencies for programs and databases, we develop an extensional condition that defines security for programs that interact with databases and support disjunctive security policies. We will later use the security condition to prove soundness of enforcement mechanisms in Section V. Specifically, we formalize the syntax and semantics of a simple imperative language with database queries. Programs read the input from the database via queries, while users receive the output through predefined output channels. We define (disjunctive) security policies as views over the database and interpret them end-to-end. We then use this model to define a knowledge-based security condition for our setting.
IV-A Language
Syntax. The syntax for the commands of our language as depicted in Fig. 5, primarily consists of standard commands such as assignment, conditionals, and loops. The command outputs the result of evaluating expression to user . The command issues the query to the database and stores the result in variable . For modeling the queries, we rely on conjunctive queries with comparison introduced in Section V-A.
Expressions can be variables , values (integers) , binary operations , single tuples , and set of tuples . For simplicity, we do not provide de-constructors for database tuples.
Semantics. As discussed in Section III-C, a database state (or simply state) is defined with respect to a schema and a finite set of integrity constraints. A configuration consists of a command , a memory map** variables to values, and a state .
The semantics of expressions is mostly standard and its rules are presented in Fig. 6. We use judgments of the form to denote that an expression evaluates to value in memory and state . For simplicity, we refrain from defining binary operations on tuples, unless the underlying database query is boolean.
We use judgments of the form to denote that a configuration in one step evaluates to memory and state and (possibly) produces an observation ; we write whenever a command produces no observation. We write to denote a memory with variable assigned the value .
Fig. 7 provides the semantic rules for commands. The query evaluation rule QueryEval is similar to assignment as it evaluates a query into state and stores the result in the variable . We use the command to produce an observation. Formally, an observation is a tuple , where is the identifier of the user observing the output and is the result of evaluating expression , which is either a simple value or the result set of a non-boolean query.
We write to denote when takes one or more steps to reach configuration while producing the trace (sequence of observations) . We omit the final configuration whenever it is irrelevant and write .
IV-B Security Model
We now introduce our knowledge-based security model for disjunctive security policies. For simplicity, we denote the initial program memory by and assume it is fixed and public to all users, hence the only way to input sensitive information is through database queries. Users make observations through output channels, hence their knowledge of the database is determined by what they can infer based on these observations. This model induces standard equivalence relations for database states and observation traces.
Database state equivalence. Two states and are equivalent with respect to a set of tables and views , written as , iff all tables and views in have identical contents in and . Formally, states and are equivalent with respect to iff for all view and for all table . A set of tables and views induces an equivalence relation, and for a state , the equivalence class contains all states that are equivalent to with respect to .
Trace equivalence. We use trace projection to define trace equivalence. The projection of a trace for user written as is the sequence of all observations in that can observe. Traces and are equivalent with respect to user , written as , iff the projection of one of them to is the prefix of the other, i.e., or .
Equivalence of trace prefixes is a standard technicality needed to ignore leaks due to program’s progress/termination [21], and here we adapt a definition of trace equivalence which does not differentiate between program divergence and termination [14].
User knowledge. When executing a program , we assume memory is always initially in the all-zero state . Thus, we can view a program’s execution for any user as a function from database to user-observable output traces, when . This function induces an equivalence relation on databases, , which characterizes the knowledge of conveyed by the output of to .
Security policy. A security policy is a list of user policies (written as ) for each user . User policies are defined as views and table identifiers over a database schema, and determine what a user is allowed to observe. Fig. 8 presents the syntax of disjunctive policies for our model. They are defined as a set of sets in order to represent a disjunction of conjunctions of simpler policies. A conjunction is a set of view and table identifiers, and a disjunction is a set of conjunctions. For example, the policy for user who is allowed to see table and view , or view but not both, is defined as .
The overall policy of the system, written as , is the list of user policies. Per Def. 6, the policy can be represented semantically as an element of the Quantale of Information. Thus, we can formulate our security condition as the assertion that the knowledge of the database that the execution of the program conveys to is bounded above by the disjunctive knowledge allowed by the policy, .
Definition 7.
The program is secure for the user and policy if .
V Enforcement of Disjunctive Policies
Having formulated the security condition, we would like to prove that useful programs satisfy it. To this end, we introduce a sound static enforcement mechanism, which imposes some structural limitations on the policy and trades off some completeness for the sake of efficiency and ease of analysis.
Fig. 9 illustrates how our mechanism functions at a high level. We assume as input a program and policy in the format described in Fig. 5 and Fig. 8 respectively. The program is then subjected to a static dependency analysis (Section V-B), which computes an overapproximate set of possible paths of control flow through the program, along with the queries (dependencies) retrieved for each path, giving an element of the DQ, that is a (disjunctive) set of (conjunctive) sets of queries. Per Fig. 8, the policy is also already given in this format.
We would like to verify that the program dependencies are bounded by the policy in the DQ, as by Lemma 2, this entails the security condition (Def. 7) that the disjunctive information that is revealed by the program is bounded above by the QoI interpretation of the policy. However, checking DQ ordering on general queries may be computationally costly. We therefore abstract (Section V-C) both the policy and the path dependencies into a more tractable format (symbolic tuples), which again overapproximates the information they can retrieve. To guarantee soundness, we require that the views in the policy are such that this abstraction is lossless for them. Finally, as the security check (Section V-D), we compute a tractable comparison on sets of sets of symbolic tuples that can be shown to imply DQ ordering.
V-A Conjunctive Queries
While our theoretical definitions are based on the fully-general domain relational calculus as a query language, to avoid complexity, our enforcement mechanism will work with a restricted subset called conjunctive queries with comparisons (CQCs). This language is a subset of relational calculus that only employs conjunction () and existential quantification () and omits disjunction (), negation (), and universal quantification (). CQCs can model SELECT-FROM-WHERE portion of SQL, where there are only AND and comparisons in the WHERE clause.
Our language for (non-boolean) CQC over a database schema employs the standard notation [18, 20], and has the form :
where are relations in , and are their variables. We use to denote the set of variables appearing in the body of the query . are formulae of the form where is the comparison operator which could be anything from and and are either variables in or constants.
We require that . Without loss of generality, we assume that there are no self-joins in the query. In case of queries with self-joins, we can make logical copies of the relations to accommodate them [20]. The body of a CQC comprises two parts, namely the relation identifiers referred to as , and the conditions denoted by .
Similarly to Section III-A, the evaluation of on the database state (denoted by ) is defined by taking all tuples in the cartesian product of in that satisfy , and projecting to the column set .
Example 5.
Consider the database schema in Fig. 2. The following query returns a set of tuples containing the names of divisions whose managers have a salary of more than :
V-B Type-based Dependency Analysis
Our static dependency analysis builds on the generic type system of van Delft et al. [15] and extends it with support for disjunctive dependencies. We intuitively expect that a disjunctive dependency analysis must be path-sensitive, so as to distinguish between different executions and also keep track of the history of observations. Both of these requirements are often challenging for type-based analyses, which do not naturally align with the execution order. We will first illustrate these challenges with examples and then present our analysis.
Program 3 illustrates the need for path sensitivity. The analysis should distinguish between the then branch, where variable depends on the set , and the else branch where depends on . Our reference analysis [15] would join these two sets at the end of the if statement, ultimately yielding the dependency set . In our analysis, these sets are never joined, but instead combined to form a set of sets, namely, , where the outer set represents a disjunctive dependency and the inner sets represent conjunctive dependency.
Program 4 illustrates the need to keep track of the observation history. It outputs at lines and , and the dependency set of in both places is . However, this program will always output both and . Now, if a policy only allows user to see either query or , the outputs at lines and will be incorrectly accepted. Hence, the analysis should account for all outputs to user .
Fig. 10 depicts the rules of our disjunctive dependency analysis. We use judgments of the form , where is an environment map** variables to set of sets of dependencies . The set of variables is , where are program variables, are users, and is the program context. The dependencies are , where are variables and are queries that can be issued to a database. We use to indicate the dependencies of all outputs to user .
We start by introducing the operators and auxiliary functions employed within the rules, and then proceed to explain the rules themselves. The operator is used to join two (or more) sets of sets, defined as:
For example, the join of and is:
Intuitively, the result of the join operator is a set of sets capturing the product of the original sets of sets under the set union operation. We use this operator to calculate all the possible combinations of two environments.
represents the sequential composition of two environments. Intuitively, is the same as but updated with all of the dependencies that have been previously established in . Formally:
For example, the sequential composition of the environments
evaluates to
Finally, the operator calculates the union of two environments: . This operator is used in conditionals to capture the disjunctive join of the two branches. For example, in line 5 in Program 3, and , and the result of would be .
For loops, we rely on the fixed-point of , denoted by , which we define as:
where and .
In these rules, is the identity environment, defined as , and denotes the free variables of expression .
T-Assign updates the dependency set of the assigned variable to the set of the free variables of expression and , otherwise it matches the identity environment. Rule T-QueryEval is similar to assignment, except that instead of , it adds query to the dependency set.
T-If sequentially composes the dependency sets of each branch with the environment , thus adding variables of the branch condition to the dependency set of each branch. Finally, these environments ( and ) are joined disjunctively using the operator.
T-While uses the fixed-point operator to calculate the dependency set of the loop. To do so, it first calculates the dependency set of the loop body, which is sequentially composed with to account for the dependencies to the loop condition. Finally, the fixed-point operator computes the dependency set of the while loop.
T-Output relies on the dependency set including , and , where includes all the variables of the expression outputted to user , captures the implicit dependencies to the path conditions, and is the dependency set of user and captures the history of dependencies that user might have observed up to this point. Observe that by the definition of sequential composition, all the dependencies of the previous outputs will be added to .
This analysis yields a final environment . The result of the analysis is the value of this environment for the user identifier , which includes both queries and program variables. Since program variables do not contain sensitive information, and we are primarily concerned with queries, we refine the result of to only include queries. This refined outcome defines the ultimate result of our analysis, denoted as :
The soundness proof of our enforcement relies on the circumstance that, if the set of queries on which the -outputs of depend when running on a database state are denoted by , then this set is guaranteed to be found in the set produced by the dependency analysis. We show how to define using a taint-tracking semantics presented in Appendix E. Formally, this gives rise to the following soundness condition for the dependency analysis.
Lemma 3.
For all , .
V-C Query Abstraction
Even for CQCs, comparing the information revealed by sets of queries is hard in general. To define a well-behaved and more tractable determinacy order on which to build our DQ, we introduce another overapproximating abstraction, which we will use to soundly label queries and policies.
We define a symbolic tuple as , where is a set of table identifiers, is a boolean combination of equality, inequality, and comparisons over the columns of the tables in , and is a subset of the columns of the tables in . In a symbolic tuple, denotes the query’s projection on the columns of the tables in , and defines the constraints over the rows.
Example 6.
The symbolic tuple of query defined on the relations of Fig. 2 would be .
While calculating the exact set of symbolic tuples of a relational calculus query is intractable for many classes of queries, it is tractable for conjunctive queries with comparison (CQC). Given a conjunctive query , the function computes a symbolic tuple from as follows:
where and defined in Section V-A return the relation identifiers and conditionals of , respectively. Here, is the query obtained by recursively replacing views with their definitions. We lift this definition to sets of queries , and define as .
Using , we define the function for a set of sets of queries as follows:
Policy Analysis. The function can also be used to map a disjunctive security policy to a set of labels. However, in order to ensure soundness and avoid approximation, we place some constraints on policies. (1) To make computing the set of symbolic tuples tractable we only support policies with views in the CQC form. (2) We require that the symbolic tuples of views be well-formed, which we define as:
Definition 8.
The symbolic tuple is said to be well-formed if it satisfies .
where and returns the column dependency set of .
Well-formedness ensures that the symbolic tuples are precise, at the expense of limiting a view to only applying constrains on the columns which it projects on.
Furthermore, we treat the table identifiers used in policies as special views that return the whole table. For instance, a policy which allows access to table can be rewritten as view .
As discussed in Section IV, the disjunctive security policy of user (written as ) is a set of conjunctions , interpreted as a disjunction of conjunctions of table and view identifiers. For a policy that adheres to the constraints mentioned earlier, is defined as follows:
Labels. In our model, a security label is defined as a set of symbolic tuples, and we define the ordering relation of two labels, written as , as follows:
Definition 9.
iff for all symbolic tuples , there are well-formed symbolic tuples in such that , are disjoint, , and .
To ensure soundness, we assume that all of the symbolic tuples in the right hand side of are well-formed. This definition relies on entailment to check the ordering of , and write which means that any assignment that satisfies also satisfies .
Example 7.
Consider symbolic tuples and . We have since , , and .
V-D Enforcement
The dependency analysis of Section V-B extracts the dependencies of program ’s outputs to user and produces . Applying to yields a set of labels, each bounding the information revealed in some path, the -knowledge of (denoted by ). We interpret this as a disjunction, as any execution follows along one particular path.
Similarly, applying to the disjunctive security policy of user (i.e., ) results in a set of labels. Each label faithfully captures one conjunction, and so the policy is also represented as a set of labels , interpreted disjunctively.
By Lemma 2, to verify that the security condition is satisfied, it is sufficient to establish that in the DQ. However, checking in the DQ is not generally tractable. For the security check, we therefore instead perform a twofold approximation: we check ordering for the conjunctive inner sets using the approximate ordering , and approximate the mix-based ordering on the disjunctive outer sets in a way that loses little relative to our analysis:
Definition 10.
We say that iff
where and are labels, and is the symbolic tuple ordering of Def. 9. To ensure faithful labeling of policies, we assume all of the symbolic tuples in are well-formed as defined in Def. 8. We can then formalize the relationship between and as follows.
Lemma 4.
If , then in the DQ, .
We refer the readers to Appendix F-B for the proof of this Lemma.
V-E Soundness Proof
Fig. 11 outlines the overall architecture of our enforcement mechanism and the correctness assertion that we make of it.
The rightmost column of Fig. 11 represents a chain of information order relations in the QoI, which we establish for each enforcement step. Following the chain from bottom to top, we obtain the security condition of Def. 7. At the same time, the “left boundary” of the figure, comprising the D.A., abstractions and check, represents the computations that are actually performed to check a program.
VI Implementation and Evaluation
In this section, we describe our prototype DiVerT [22], which implements the type-based dependency analysis of Section V-B and query abstraction of Section V-C to verify the security of database-backed programs. We then evaluate DiVerT’s effectiveness using functional tests and an assortment of real-world-inspired use cases.
VI-A Implementation
To evaluate the feasibility and security of our approach in practice, we implemented the type-based dependency analysis of Section V-B. For the sake of practicality, instead of CQC, DiVerT uses the SELECT-FROM-WHERE portion of SQL, which is analogous to CQC as described in Section V-A. Following the query analysis of Section V-C, these SQL queries are then converted into symbolic tuples. For the security check, the symbolic tuples with the result of the program analysis must be compared to those representing the policy; to perform this comparison following Def. 9, we use the Z3 SMT solver [23]. Our implementation operates on programs in the language presented in Section IV-A, with the addition of two macros @Table@ and @Policy@ for defining the tables’ schema and the security policy.
VI-B Test suite
To validate our implementation, we use a functional test suite consisting of 20 programs, designed to capture a broad variety of examples of disjunctive dependencies. This suite includes programs with row- and column-level policies of varying granularity levels, and those necessitating the use of SMT solvers for verification. Furthermore, the tests verify the behaviour of the dependency analysis by incorporating complex conditionals, loops, and implicit and explicit outputs. The tests can be found in the implementation repository [22].
VI-C Use cases
We evaluate DiVerT on four use cases inspired by real-world problems in which disjunctive policies naturally arise. The purpose of this evaluation is to validate the security analysis of DiVerT on realistic scenarios involving disjunctive policies, and ensure that its behaviour is consistent with the definitions of Section IV-B. Rather than analysing complete applications for each example, we therefore focus on smaller kernels that capture the core security-critical behaviour of the respective problem.
Privacy-preserving location service. Multilateration is a technique to determine the location of a user by measuring their distance to known reference points [24]. Two distances are sufficient to narrow a user’s location down to one of two points on a map, and three identify the location unambiguously. Consider a location service provider which tracks, for some number of users, not only their precise location but also their distances to certain points of interest (PoI) such as restaurants or shops. An advertiser wants to query this service to provide location-based ads. For example, if the user is close to a shop , and has a sale going on, the user may be enticed by this information.
Privacy and business considerations make it desirable to not reveal the precise location of the user to the advertisement company accessing the database, while still allowing for some location-based services in this vein. If the advertiser were to learn the distance of a single user to two or more PoIs at a specific time, the user’s location could be inferred. However, we may still want to release the user’s distance to any one PoI which they are currently closest to. This can be interpreted as a disjunctive policy, in which the information revealed for each user is bounded by the disjunction of that user’s distances to some single PoI.
The database schema consists of a single table Distance(id, poi, dis, loc), which stores the ID of each user, the name of the PoI, their distance, and the user’s precise location. We implement a small example with two PoIs and two users . Let the view for each user and PoI be defined as the query SELECT id, poi FROM Distance WHERE id = AND poi = . The disjunctive policy then covers every combination of user and PoI as a possibility: .
We test two programs against this policy. In one, the advertiser uses internal parameters identifying a target user and interest, and issues a single query requesting that user’s distance from the relevant point of interest. In the other, the advertiser still targets a particular user, but queries all of that user’s distances. As expected, DiVerT accepts the former program, but rejects the latter.
Privacy-preserving data publishing. Expanding upon the motivating example in the introduction, we consider the case of programs querying a database with personally identifiable information (i.e., quasi-identifiers). As discussed before, revealing too many quasi-identifiers may make it possible to identify an individual. We consider the example of a medical database [5] with a table Patients(zip, gen, dis) storing the ZIP code of residence, gender and disease of patients. An agent querying the database should not learn more than two of these at a time. For simplicity’s sake, we only consider queries that retrieve the same data from each patient. Defining SELECT dis, gen FROM Patients, SELECT zip, gen FROM Patients, and SELECT zip, dis FROM Patients, the disjunctive policy can then be written as .
Once again, we validate two programs against this policy. Branching on an internal parameter, the client will issue one query to select data for either male or female patients. In the first program, all queries take the form of SELECT dis FROM Patients WHERE gen = , whereas in the second one, one of the queries additionally filters on the ZIP code: SELECT dis FROM Patients WHERE gen = AND zip = . Again, only the latter program is rejected by DiVerT. This reveals a potential subtlety, as data dependency and hence release of information may arise not only from what columns are selected, but also from conditions restricting the set of rows.
Secret sharing. We implement a secret sharing schema that splits a secret value into shares . These shares are then distributed among parties , each receiving a unique share. A secure secret sharing schema requires that the secret can only be reconstructed if or more participants combine their shares. If the number of combined shares is less than , no information about the secret should be revealed. This requirement naturally translates to a disjunctive policy , stipulating that participants can each only learn one share.
We assume that the shares are created by a secure secret sharing schema and are then stored in a database. The database schema consists of the table Shares(shareID, shareVal) which stores the ID of each share and their corresponding value.
The policy only allows a user to read one of the shares (i.e., only one row of the table). We define the view for each share as SELECT shareVal, shareID FROM Shares WHERE shareID = where . The corresponding disjunctive policy is going to look like .
We implement a program that executes a subroutine for each user, issuing a database query to retrieve the user’s share. For example the query for a user to retrieve the share number is SELECT shareVal FROM Shares WHERE shareID = and it is correctly accepted by DiVerT. If the same user issues another query to retrieve share number , it violates the policy and hence the program is rejected. This scenario shows that DiVerT is able to correctly enforce row-level policies precisely.
Online shop. This use case models an online shop and a user with a gift card can only use it to “buy” items that match the value of the gift card. Here we consider a scenario with an online shop that only provides digital items and they are stored in a database. The database schema consists of the items table Items(id, name, data) which stores the ID and name of each digital item. We define a view for each item as SELECT data, name FROM Items WHERE name = where is the item’s name.
Assume a database that has the items Movie, CinemaTicket, Audiobook, Ebook, and GymMem. A policy should only allow the user to access a certain amount of items whose value adds up to value of gift card. For instance a disjunctive policy may look like: .
We model a user program that issues queries to select items, e.g., SELECT data FROM Items WHERE name = .
DiVerT accepts this query because view allows the user to access Movie. We create two different scenarios; in one the user issues another query asking for Audiobook, which DiVerT rejects. In the second scenario, the user asks for CinemaTicket which is allowed by the policy, and hence DiVerT accepts it.
VII Related Work
This section puts our contributions in the context of related works in the areas of information flow security and database security, discussing security models of dependencies and tractable enforcement mechanisms. To our knowledge, we are the first to explore enforcement mechanisms for disjunctive policies, as well as to reconcile semantic models of (disjunctive) dependencies across the areas of information flow control and database access control.
Security models. Semantic models of dependencies have a long history since the introduction of the Lattice of Information (LoI) by Landauer and Redmond [7]. These models define a lattice structure to represent information as equivalence relations ordered by refinement and serve as cornerstone to justify soundness of various dependency analysis at the heart of enforcement mechanisms for security. For example, the universal lattice by Hunt and Sands [25] models dependencies between program variables such that the lattice elements are sets of variables ordered by set containment, and uses it to justify soundness against baseline security conditions, e.g., noninterference [26].
Within the database community, Bender et al. [8, 4] define the notion of Disclosure Lattice to represent the information disclosed by sets of database queries. Disclosure Lattice has been further developed by Guarnieri et al. [14] to enforce conjunctive information-flow policies for database-backed programs. We point out that not all disclosure orders are suitable to represent information disclosure in the context of information flow control: By studying its relation to LoI, we show that query determinacy and the stronger notion of equivalent query rewriting [19] provide sound abstraction, while query containment does not.
Our work builds on recent work by Hunt and Sands [9], which provides a semantic model for disjunctive dependencies, under the notion of the Quantale of Information. We study quantale structures in the context of databases, providing support for disjunctive policies in database-backed programs. While these policies are rooted in the area of access control, cf. ethical wall policies [27], the work of Hunt and Sands [9] is the first to provide an extensional characterization as information-flow policies. Drawing on our new notion of Determinacy Quantale, we develop a security condition to capture the security of database-backed programs in presence of disjunctive database policies.
Enforcement mechanisms. The problem of enforcing disjunctive policies for programs and/or databases is completely unexplored. We study how a standard type-based program analysis [15], equipped the notion of path sensitivity, can be adapted to statically capture disjunctive program dependencies.
At the core of our analysis is a new abstraction of database queries which enables flexible enforcement of disjunctive policies by means of SMT solvers, as witnessed by our use cases. An immediate benefit of our Determinacy Quantale is that we can prove soundness of the enforcement with respect to a solid semantic baseline for disjunctive dependencies.
There exists a wide array of works enforcing conjunctive policies for database-backed programs. Guarnieri et al. [14] propose dynamic monitoring to enforce database policies. Their abstractions are limited to boolean queries and rely on the Disclosure Lattice of Bender et al. [8, 4], which may cause soundness issues when assuming query containment as the underlying lattice order.
Language-integrated queries are supported by a range of works such as SIF [10] and JsLinq [12], SeLinks [11], UrFlow [28], DAISY [14], Jacqueline [29], and LWeb [13] for row- and column-level conjunctive policies. These works apply PL-based enforcement techniques such as type systems, dependent types, refinement types, and symbolic execution to database-backed programs [30, 31, 13, 14], but lack support for expressing and enforcing disjunctive policies.
Li and Zhang [32] explore path-sensitive program analysis to improve precision of information flow analysis, yet they do not consider disjunctive policies. QAPLA [33] is a database access control middleware supporting complex security policies, such as linking and aggregation policies, with focus only on access control.
VIII Conclusions
We presented a case for the significance of disjunctive dependency analysis to the security of database-backed programs. After reviewing recent theoretical developments in representing disjunctive information, we introduced two structures, the Determinacy Lattice and the Determinacy Quantale, as database-oriented counterparts to theoretical structures representing simple and disjunctive knowledge respectively.
Using these structures, we formulated a security condition which expresses that a database-backed program satisfies a given disjunctive policy. In order to enforce this security condition, we developed a type-based static analysis to compute a bound on the disjunctive dependencies of database-backed programs in a model language. By a series of approximations, this bound itself can be tractably compared to the representation of a static policy.
These steps constitute an enforcement mechanism for disjunctive policies, which we proved sound with respect to our security condition. To showcase this enforcement mechanism, we implemented it in our prototype tool, DiVerT. In order to validate this prototype and the overall framework, we verified the tool on a set of functional tests covering a variety of language features and disjunctive information patterns, as well as several use cases representing real-world scenarios in which we want to enforce disjunctive policies.
IX Acknowledgements
We are grateful to David Sands and Roberto Guanciale for fruitful discussions, and would also like to thank the anonymous reviewers for their insightful comments and feedback.
This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, the Swedish Research Council (VR), and the Swedish Foundation for Strategic Research (SSF).
References
- [1] E. Bertino and R. Sandhu, “Database security-concepts, approaches, and challenges,” IEEE Transactions on Dependable and Secure Computing, vol. 2, no. 1, pp. 2–19, 2005.
- [2] A. Sabelfeld and A. C. Myers, “Language-based information-flow security,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 1, pp. 5–19, 2003.
- [3] M. Guarnieri, S. Marinovic, and D. Basin, “Strong and provably secure database access control,” in IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2016, pp. 163–178.
- [4] G. Bender, L. Kot, and J. Gehrke, “Explainable security for relational databases,” in ACM SIGMOD International Conference on Management of data. ACM, 2014, pp. 1411–1422.
- [5] L. Sweeney, “k-anonymity: A model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-based Systems, vol. 10, no. 05, pp. 557–570, 2002.
- [6] C. Dwork, “Differential privacy,” in Automata, Languages and Programming: 33rd International Colloquium. Springer, 2006, pp. 1–12.
- [7] J. Landauer and T. Redmond, “A lattice of information,” in Proceedings Computer Security Foundations Workshop. IEEE, 1993, pp. 65–70.
- [8] G. M. Bender, L. Kot, J. Gehrke, and C. Koch, “Fine-grained disclosure control for app ecosystems,” in ACM SIGMOD International Conference on Management of Data. ACM, 2013, pp. 869–880.
- [9] S. Hunt and D. Sands, “A quantale of information,” in IEEE 34th Computer Security Foundations Symposium (CSF). IEEE, 2021, pp. 1–15.
- [10] S. Chong, K. Vikram, A. C. Myers et al., “Sif: Enforcing confidentiality and integrity in web applications,” in 16th USENIX Security Symposium (USENIX Security). USENIX Association, 2007, pp. 1–16.
- [11] B. J. Corcoran, N. Swamy, and M. Hicks, “Cross-tier, label-based security enforcement for web applications,” in ACM SIGMOD International Conference on Management of data. ACM, 2009, pp. 269–282.
- [12] M. Balliu, B. Liebe, D. Schoepe, and A. Sabelfeld, “Jslinq: Building secure applications across tiers,” in 6th ACM Conference on Data and Application Security and Privacy. ACM, 2016, pp. 307–318.
- [13] J. Parker, N. Vazou, and M. Hicks, “Lweb: Information flow security for multi-tier web applications,” Proceedings of the ACM on Programming Languages, vol. 3, no. POPL, pp. 1–30, 2019.
- [14] M. Guarnieri, M. Balliu, D. Schoepe, D. Basin, and A. Sabelfeld, “Information-flow control for database-backed applications,” in IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2019, pp. 79–94.
- [15] B. v. Delft, S. Hunt, and D. Sands, “Very static enforcement of dynamic policies,” in International Conference on Principles of Security and Trust. Springer, 2015, pp. 32–52.
- [16] I. Kaplansky, Set Theory and Metric Spaces. AMS Chelsea Publishing, 2001.
- [17] B. A. Davey and H. A. Priestley, Introduction to lattices and order. Cambridge University Press, 2002.
- [18] S. Abiteboul, R. Hull, and V. Vianu, Foundations of databases. Addison-Wesley, 1995.
- [19] A. Nash, L. Segoufin, and V. Vianu, “Views and queries: Determinacy and rewriting,” ACM Transactions on Database Systems (TODS), vol. 35, no. 3, pp. 1–41, 2010.
- [20] Q. Wang and K. Yi, “Conjunctive queries with comparisons,” in International Conference on Management of Data. ACM, 2022, pp. 108–121.
- [21] A. Askarov, S. Hunt, A. Sabelfeld, and D. Sands, “Termination-insensitive noninterference leaks more than just a bit,” in 13th European Symposium on Research in Computer Security. Springer, 2008, pp. 333–348.
- [22] A. M. Ahmadian, M. Soloviev, and M. Balliu, “Divert,” 2023, software release. [Online]. Available: https://github.com/KTH-LangSec/DiVerT
- [23] L. De Moura and N. Bjørner, “Z3: An efficient smt solver,” in International conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). Springer, 2008, pp. 337–340.
- [24] W. Murphy and W. Hereman, “Determination of a position in three dimensions using trilateration and approximate distances,” Department of Mathematical and Computer Sciences, Colorado School of Mines, Golden, Colorado, MCS-95, vol. 7, p. 19, 1995.
- [25] S. Hunt and D. Sands, “On flow-sensitive security types,” in 33rd SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL. ACM, 2006, pp. 79–90.
- [26] J. A. Goguen and J. Meseguer, “Security policies and security models,” in IEEE Symposium on Security and Privacy (S&P). IEEE, 1982, pp. 11–20.
- [27] D. F. Brewer and M. J. Nash, “The chinese wall security policy.” in IEEE Symposium on Security and Privacy (S&P). IEEE, 1989, pp. 206–214.
- [28] A. Chlipala, “Static checking of dynamically-varying security policies in database-backed applications,” in 9th USENIX Symposium on Operating Systems Design and Implementation. USENIX Association, 2010, pp. 105–118.
- [29] J. Yang, T. Hance, T. H. Austin, A. Solar-Lezama, C. Flanagan, and S. Chong, “Precise, dynamic information flow for database-backed applications,” ACM SIGPLAN Notices, vol. 51, no. 6, pp. 631–647, 2016.
- [30] N. Swamy, B. J. Corcoran, and M. Hicks, “Fable: A language for enforcing user-defined security policies,” in IEEE Symposium on Security and Privacy (S&P). IEEE, 2008, pp. 369–383.
- [31] L. Lourenço and L. Caires, “Dependent information flow types,” in 42nd SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL. ACM, 2015, pp. 317–328.
- [32] P. Li and D. Zhang, “Towards a flow- and path-sensitive information flow analysis,” in 30th IEEE Computer Security Foundations Symposium (CSF). IEEE, 2017, pp. 53–67.
- [33] A. Mehta, E. Elnikety, K. Harvey, D. Garg, and P. Druschel, “Qapla: Policy compliance for database-backed systems,” in 26th USENIX Security Symposium (USENIX Security). USENIX Association, 2017, pp. 1463–1479.
Appendix A Interpretations of Query Determinacy
We prove the following technical lemma to show that the two intuitive interpretations of the definition of query determinacy are equivalent.
Lemma 5.
If is recursively enumerable and and are computable, then the following are equivalent:
-
(i)
For all , if , then .
-
(ii)
There exists a computable such that for all , .
Proof.
(ii)(i): Suppose , and is as in (ii). Then , and .
(i)(ii): Let be the partial function that enumerates and for a given returns the first it finds such that . This is computable, per the algorithmic description provided. This does not necessarily satisfy , but we do have by definition (since the enumeration of will either encounter or another such that eventually). Hence by (i). So defining by , we find that as required. ∎
Instantiating Lemma 5 with as the set of possible databases, as the function that computes the results of the queries in on , and as the same for , we find that determining indeed means that the (results of) queries in are always sufficient to determine (compute) the result of the queries in .
Appendix B Relation Between DL and LoI
We first prove some auxiliary lemmas, and then proceed to prove Lemma 1.
Lemma 6.
For sets of queries , the ordering on the DL implies on the LoI defined on :
Proof.
The definition of the ordering relation of the LoI (Section II) and would give us:
(1) |
By the definition of equivalence relations for query sets (), for all we have:
(2) |
(1) and (2) would give us:
(3) |
On the other hand, by the definition of the Determinacy Lattice 2, we have . From the definition of determinacy ordering, means . By the definition of query determinacy (Def. 1) we know that if:
(4) |
It is evident from (3) and (4) that holds. ∎
Relying on Def. 6 to establish the set of equivalence relations derived from a set of sets of queries, we propose following lemma:
Lemma 7.
For any set of sets of queries , the join of on the DL implies the join of on the LoI defined on :
Proof.
Assume there is a set of queries such that .
By the definition of the Determinacy Lattice III-B, we have which would give us . By the definitions of and query determinacy(Def. 1), it is straightforward to see and . Replacing with , by the definition of query determinacy (Def. 1) we have :
(1) |
and :
(2) |
(1) and (2) would give us:
(3) |
Assume is an equivalence relation . By the definition of the join of the LoI (Section II):
and by the definition of equivalence relations for query sets, for all we have:
(4) |
(3) and (4) would allow us to conclude , hence . ∎
See 1
Proof.
To prove this homomorphism, we need to show that the Determinacy Lattice’s ordering and join, as well as the top and bottom elements imply their LoI counterparts. Lemmas 6 and 7 provide the proofs of ordering and join. The proof for top and bottom elements:
-
•
-
•
follows trivially from the definition of and ∼. ∎
Appendix C Determinacy Quantale Axioms
We follow the approach of [9] to prove that our definition of the Determinacy Quantale is indeed a quantale. We begin by defining what is a quantale.
Definition 11.
A quantale is a structure such that:
-
1.
is a complete join-semilattice
-
2.
is monoid, that is is associative and
-
3.
distributes over .
A quantale is called commutative when its operator is commutative [9].
Next, we prove some lemmas that are later used in the proof of Theorem 2.
Lemma 8.
Both and are closure operators.
Proof.
A closure operator is a function from the power set of domain to itself that satisfies the following properties for all sets :
-
•
is extensive:
-
•
is increasing:
-
•
is idempotent:
It is straightforward to show that both and satisfy these conditions. ∎
Definition 12.
For a closure operator defined on the domain , and a function , say that weakly commutes with if for all .
Lemma 9.
Let be a closure operator and let . Suppose that weakly commutes with and that weakly commutes with in each argument. Then:
-
1.
-
2.
Proof.
Routine, following the properties of closure operator. ∎
Lemma 10.
Let , the union operator weakly commutes with :
Proof.
It suffices to show iff , which follows easily from the definitions of and . ∎
Lemma 11.
The join operator of DL weakly commutes with in each argument
Proof.
Let , and let . If is tiled by then is tiled by . This follows easily from the definition of the equivalence relation induced by a query (i.e., ∼), , Lemma 7 and the fact that . ∎
Lemma 12.
Given two sets of sets of queries it holds that:
Proof.
By Lemma 11 we know that the join operator of DL weakly commutes with in each argument. We apply this lemma to the definition of operator:
∎
Lemma 13.
Given two sets of sets of queries it holds that:
Proof.
Follows directly from the definition of in the DL and the commutativity of union operator .
∎
Now, we show that DQ in Def. 5 is a quantale.
Theorem 2.
The Determinacy Quantale is a commutative quantale.
Proof.
We have to show that our definition of Determinacy Quantale respects the quantale axioms of Def. 11.
-
1.
Showing is a complete join-semilattice is straightforward following Lemma 8 and the fact that is a closure operator.
-
2.
We should show that is associative and is a unit:
- a.
-
b.
To show that is a unit for we need to show that . Using as the unit, and applying the definition of will give us:
which following the associativity of gives us .
- 3.
-
4.
Commutativity of is inherited directly from Lemma 13 and the commutativity of in DL.
∎
Appendix D Relation Between DQ and QoI
We first provide some auxiliary lemmas, and then proceed to prove Lemma 2.
Lemma 14.
Given sets of sets of queries , on the DQ implies on the QoI defined on :
Proof.
Trivial from the Def. 6. ∎
Lemma 15.
on the DQ implies on the QoI defined on .
Proof.
Trivial from the Def. 6. ∎
Lemma 16.
Given sets of sets of queries , on the DQ implies on the QoI defined on .
See 2
Proof.
To prove this homomorphism, we need to show that the Determinacy Quantale’s ordering, join and tensor, as well as the top and bottom elements imply their QoI counterparts. Lemmas 14, 15, and 16 provide the proofs of ordering, join and tensor, respectively. The proof of the top element:
-
•
follows from Def. 6 and Lemma 1, and the proof of the bottom element:
-
•
is trivial. ∎
Appendix E Correctness of Dependency Analysis
To show that the diagram in Fig. 11 commutes, we aim to show commutativity for each cell in it. In this section, we establish this for the bottommost cell of it. To that end, we need to establish that the QoI point that corresponds to the query list extracted from a program by the dependency analysis is an upper bound on the knowledge relation induced by .
The basic outline of the argument rests on identifying a particular single equivalence relation , which satisfies . Intuitively, this relation captures how much information the program could leak at most if it output the full result of every query that its output depends on. As long as the analysis is sound, this is an instantiation of the disjunction represented by QL, with each disjunct selected precisely for those starting configurations where the program’s output turns out to depend on the queries enumerated in that disjunct.
For a fixed program and user , we assume the existence of a function from databases to sets of queries, which returns the set of those queries performed when executing on database whose result taints some output to the user . We formally define the function by relying on a taint analysis.
Taint analysis. The semantics of the taint analysis enriches the normal operational semantics of the language in the sense that it has transitions whenever the operational one does, and acts the same on those components of a configuration that exist in the operational one; so runs in it can be put in one-to-one correspondence to operational ones.
The rules of the taint analysis presented in Fig. 12 are fairly straightforward. We use map** to map each variable to a set of dependencies of variables and queries.
The rules for if rely on auxiliary command to restore the dependency set of to its previous state () upon exiting the if branch. We sequentially composite this command with the body of if to ensure its execution after leaving the if branch’s body. The rules for while use in a similar manner.
The rule TA-Output uses to extract all the variables of expression , and relies on the union of the s of those variables to calculate , which is the set of dependencies the execution up to this output, depended on.
We extend the definition of trace to a sequence of observations of the form , and use the notation to denote the sequence of all s in that can observe. We use this notation to define function as follows:
Definition 13.
Given a database state and user , such that , is defined as
A proof of Lemma 3 can then proceed by a straightforward induction on the semantics.
In Def. 13 we formally define the function . This function satisfies a closure property that informally states that if on two given databases the output depended on different sets of queries, then the choice of the set of dependencies itself must have been due to the outcome of a query which is among the dependencies in both databases and evaluates to a different result.
Lemma 17.
For all , if , then there exists a particular query such that .
We say that database states and are equivalent with respect to a dependency set (written as ) iff for all where .
Lemma 18.
For all states and and users , if , , and , then .
We then define as the equivalence relation
that is, we partition each respective subset of databases that shares one set of queries into equivalence classes according to the knowledge relation induced by .
Lemma 19.
.
Proof.
: Will in fact show that , where . For that, it suffices to show that every equivalence class is also an equivalence class of one of the . Let be arbitrary. Then claim that , which suffices since by Lemma 3, is one of the . To establish this, just need to show that for all such that , so that as well. But this follows from Lemma 17: if some has , then for all , but then we must not have .
: Straightforward application of Lemma 18. ∎
Appendix F Query Analysis
F-A Symbolic Tuple Ordering
To show that the symbolic tuples ordering of Def. 9 induces a determinacy order and prove Lemma 20 we first need to define the evaluation of a symbolic tuple in a database state.
Symbolic tuple evaluation. The evaluation of a symbolic tuple in the database state written as is a -projection on the set of ’s tuples defined on the join of tables in that satisfy the constraint . Formally:
Definition 14.
Given database state and symbolic tuple , is defined as:
where is a tuple with its columns limited to those in , and means that tuple satisfies formula .
We proceed to prove Lemma 20.
Lemma 20.
Given two sets of queries and , if then .
Proof.
Assume and . By Def. 9 we want to show that if for all symbolic tuples , there is a set of well-formed symbolic tuples such that , are disjoint, , , and , then .
We assume an intermediate symbolic tuple and define it as . models the symbolic tuples created from the join of . Additionally, are disjoint, which by the definition of symbolic tuples means that and the dependencies of are also disjoint, effectively making the symbolic tuple of the Cartesian product of tuples .
We want to show that the symbolic tuples in can determine :
(1) |
For a specific database state , would give us all the tuples defined on satisfying and projected on the columns in .
Assume there is a pair of databases such that holds but . By the assumption we know that for all , if tuple is in it is also in , and vice versa.
For to hold, we have to consider two cases:
-
1.
There is a tuple such that cannot be constructed from the tuples in set
-
-
All of the symbolic tuples are well-formed and are disjoint, which makes the symbolic tuple of the Cartesian product of . This means that tuple is defined on the product of tables , satisfies , and projected on . Which means that each tuple is constructed from the merge of tuples where for . Thus, this case is not possible.
-
-
-
2.
There is a tuple such that cannot be constructed from the tuple set
-
-
Similar to the first case.
-
-
Next, we need to show that determines :
(2) |
By we know and , and and .
Intuitively, for a given database , has has more columns and tuples than . Symbolic tuple throws away some columns by limiting the resulting tuples to tables in which is a subset of and projecting on which is a subset of . It also eliminate some rows by applying to the result set, which is stronger than .
We need to show that applying these limitations maintains query determinacy. We consider these cases separately:
-
•
Columns: Projecting away some columns from the evaluation of is going to maintain query determinacy. We denote by , projecting tuple to only columns specified in , additionally we use the notation to indicate the columns of . We use the same notation for tuples and write to denote the set of columns of tuple . For a tuple such that and , by projecting away some columns from we end up with a new tuple such that . Since is in both and , and by the definition of ordering , we can conclude will also be in both and , this follows easily from Def. 14.
-
•
Rows: Removing some rows from the last step is going to maintain query determinacy. By the definition of ordering we know that and that are well-formed, which means that only applies to the columns that were retrieved by the intermediate tuple (projected to ). Since is a stronger condition than , for a tuple such that and , if satisfies then would also be in both and . Otherwise, if is not in one of then, it is not going to be in the other one either.
-
•
Tables: Similar to the first case, for a tuple such that and , by projecting away the columns of some of the tables from we end up with a new tuple . Since is in both and , and by Def. 9 , we can conclude will also be in both and .
(1) and (2) would give us:
which allows us to conclude determines .
Repeating this process for all of the symbolic tuples in would give us which means . ∎
F-B Symbolic Tuple and DQ Ordering
We present the proof of Lemma 4.
See 4
Proof.
Assume and . We have .
By the definition of and Lemma 20, we know that for each in there is at least one in such that .
We apply to and which would give us . By applying to every element of , using the basic properties of we will have for all .
Since the tiling closure of each is individually less than , their union would still be less that which gives us:
(1) |
We apply the tiling closure to both sides of (1) and rely on Lemma 10 to remove the nested uses of , which would give us:
which by the definition of in the DQ would mean ∎