-
IDT: Dual-Task Adversarial Attacks for Privacy Protection
Authors:
Pedro Faustini,
Shakila Mahjabin Tonni,
Annabelle McIver,
Qiongkai Xu,
Mark Dras
Abstract:
Natural language processing (NLP) models may leak private information in different ways, including membership inference, reconstruction or attribute inference attacks. Sensitive information may not be explicit in the text, but hidden in underlying writing characteristics. Methods to protect privacy can involve using representations inside models that are demonstrated not to detect sensitive attrib…
▽ More
Natural language processing (NLP) models may leak private information in different ways, including membership inference, reconstruction or attribute inference attacks. Sensitive information may not be explicit in the text, but hidden in underlying writing characteristics. Methods to protect privacy can involve using representations inside models that are demonstrated not to detect sensitive attributes or -- for instance, in cases where users might not trust a model, the sort of scenario of interest here -- changing the raw text before models can have access to it. The goal is to rewrite text to prevent someone from inferring a sensitive attribute (e.g. the gender of the author, or their location by the writing style) whilst kee** the text useful for its original intention (e.g. the sentiment of a product review). The few works tackling this have focused on generative techniques. However, these often create extensively different texts from the original ones or face problems such as mode collapse. This paper explores a novel adaptation of adversarial attack techniques to manipulate a text to deceive a classifier w.r.t one task (privacy) whilst kee** the predictions of another classifier trained for another task (utility) unchanged. We propose IDT, a method that analyses predictions made by auxiliary and interpretable models to identify which tokens are important to change for the privacy task, and which ones should be kept for the utility task. We evaluate different datasets for NLP suitable for different tasks. Automatic and human evaluations show that IDT retains the utility of text, while also outperforming existing methods when deceiving a classifier w.r.t privacy task.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
The Privacy-Utility Trade-off in the Topics API
Authors:
Mário S. Alvim,
Natasha Fernandes,
Annabelle McIver,
Gabriel H. Nunes
Abstract:
The ongoing deprecation of third-party cookies by web browser vendors has sparked the proposal of alternative methods to support more privacy-preserving personalized advertising on web browsers and applications. The Topics API is being proposed by Google to provide third-parties with "coarse-grained advertising topics that the page visitor might currently be interested in". In this paper, we analy…
▽ More
The ongoing deprecation of third-party cookies by web browser vendors has sparked the proposal of alternative methods to support more privacy-preserving personalized advertising on web browsers and applications. The Topics API is being proposed by Google to provide third-parties with "coarse-grained advertising topics that the page visitor might currently be interested in". In this paper, we analyze the re-identification risks for individual Internet users and the utility provided to advertising companies by the Topics API, i.e. learning the most popular topics and distinguishing between real and random topics. We provide theoretical results dependent only on the API parameters that can be readily applied to evaluate the privacy and utility implications of future API updates, including novel general upper-bounds that account for adversaries with access to unknown, arbitrary side information, the value of the differential privacy parameter $ε$, and experimental results on real-world data that validate our theoretical model.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Bayes' capacity as a measure for reconstruction attacks in federated learning
Authors:
Sayan Biswas,
Mark Dras,
Pedro Faustini,
Natasha Fernandes,
Annabelle McIver,
Catuscia Palamidessi,
Parastoo Sadeghi
Abstract:
Within the machine learning community, reconstruction attacks are a principal attack of concern and have been identified even in federated learning, which was designed with privacy preservation in mind. In federated learning, it has been shown that an adversary with knowledge of the machine learning architecture is able to infer the exact value of a training element given an observation of the wei…
▽ More
Within the machine learning community, reconstruction attacks are a principal attack of concern and have been identified even in federated learning, which was designed with privacy preservation in mind. In federated learning, it has been shown that an adversary with knowledge of the machine learning architecture is able to infer the exact value of a training element given an observation of the weight updates performed during stochastic gradient descent. In response to these threats, the privacy community recommends the use of differential privacy in the stochastic gradient descent algorithm, termed DP-SGD. However, DP has not yet been formally established as an effective countermeasure against reconstruction attacks. In this paper, we formalise the reconstruction threat model using the information-theoretic framework of quantitative information flow. We show that the Bayes' capacity, related to the Sibson mutual information of order infinity, represents a tight upper bound on the leakage of the DP-SGD algorithm to an adversary interested in performing a reconstruction attack. We provide empirical results demonstrating the effectiveness of this measure for comparing mechanisms against reconstruction threats.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Source-level reasoning for quantitative information flow
Authors:
Chris Chen,
Annabelle McIver,
Carroll Morgan
Abstract:
We present a novel formal system for proving quantitative-leakage properties of programs. Based on a theory of Quantitative Information Flow (QIF) that models information leakage as a noisy communication channel, it uses "gain-functions" for the description and measurement of expected leaks.
We use a small imperative programming language, augmented with leakage features, and with it express adve…
▽ More
We present a novel formal system for proving quantitative-leakage properties of programs. Based on a theory of Quantitative Information Flow (QIF) that models information leakage as a noisy communication channel, it uses "gain-functions" for the description and measurement of expected leaks.
We use a small imperative programming language, augmented with leakage features, and with it express adversaries' activities in the style of, but more generally than, the Hoare triples or expectation transformers that traditionally express deterministic or probabilistic correctness but without information flow.
The programs are annotated with "gain-expressions" that capture simple adversarial settings such as "Guess the secret in one try." but also much more general ones; and our formal syntax and logic -based framework enables us to transform such gain-expressions that apply after a program has finished to ones that equivalently apply before the program has begun.
In that way we enable a formal proof-based reasoning system for QIF at the source level. We apply it to the %programming language we have chosen, and demonstrate its effectiveness in a number of small but sometimes intricate situations.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
A Quantitative Information Flow Analysis of the Topics API
Authors:
Mário S. Alvim,
Natasha Fernandes,
Annabelle McIver,
Gabriel H. Nunes
Abstract:
Third-party cookies have been a privacy concern since cookies were first developed in the mid 1990s, but more strict cookie policies were only introduced by Internet browser vendors in the early 2010s. More recently, due to regulatory changes, browser vendors have started to completely block third-party cookies, with both Firefox and Safari already compliant.
The Topics API is being proposed by…
▽ More
Third-party cookies have been a privacy concern since cookies were first developed in the mid 1990s, but more strict cookie policies were only introduced by Internet browser vendors in the early 2010s. More recently, due to regulatory changes, browser vendors have started to completely block third-party cookies, with both Firefox and Safari already compliant.
The Topics API is being proposed by Google as an additional and less intrusive source of information for interest-based advertising (IBA), following the upcoming deprecation of third-party cookies. Initial results published by Google estimate the probability of a correct re-identification of a random individual would be below 3% while still supporting IBA.
In this paper, we analyze the re-identification risk for individual Internet users introduced by the Topics API from the perspective of Quantitative Information Flow (QIF), an information- and decision-theoretic framework. Our model allows a theoretical analysis of both privacy and utility aspects of the API and their trade-off, and we show that the Topics API does have better privacy than third-party cookies. We leave the utility analyses for future work.
△ Less
Submitted 26 September, 2023;
originally announced September 2023.
-
A novel analysis of utility in privacy pipelines, using Kronecker products and quantitative information flow
Authors:
Mário S. Alvim,
Natasha Fernandes,
Annabelle McIver,
Carroll Morgan,
Gabriel H. Nunes
Abstract:
We combine Kronecker products, and quantitative information flow, to give a novel formal analysis for the fine-grained verification of utility in complex privacy pipelines. The combination explains a surprising anomaly in the behaviour of utility of privacy-preserving pipelines -- that sometimes a reduction in privacy results also in a decrease in utility. We use the standard measure of utility fo…
▽ More
We combine Kronecker products, and quantitative information flow, to give a novel formal analysis for the fine-grained verification of utility in complex privacy pipelines. The combination explains a surprising anomaly in the behaviour of utility of privacy-preserving pipelines -- that sometimes a reduction in privacy results also in a decrease in utility. We use the standard measure of utility for Bayesian analysis, introduced by Ghosh at al., to produce tractable and rigorous proofs of the fine-grained statistical behaviour leading to the anomaly. More generally, we offer the prospect of formal-analysis tools for utility that complement extant formal analyses of privacy. We demonstrate our results on a number of common privacy-preserving designs.
△ Less
Submitted 7 November, 2023; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Directional Privacy for Deep Learning
Authors:
Pedro Faustini,
Natasha Fernandes,
Shakila Tonni,
Annabelle McIver,
Mark Dras
Abstract:
Differentially Private Stochastic Gradient Descent (DP-SGD) is a key method for applying privacy in the training of deep learning models. It applies isotropic Gaussian noise to gradients during training, which can perturb these gradients in any direction, damaging utility. Metric DP, however, can provide alternative mechanisms based on arbitrary metrics that might be more suitable for preserving u…
▽ More
Differentially Private Stochastic Gradient Descent (DP-SGD) is a key method for applying privacy in the training of deep learning models. It applies isotropic Gaussian noise to gradients during training, which can perturb these gradients in any direction, damaging utility. Metric DP, however, can provide alternative mechanisms based on arbitrary metrics that might be more suitable for preserving utility. In this paper, we apply \textit{directional privacy}, via a mechanism based on the von Mises-Fisher (VMF) distribution, to perturb gradients in terms of \textit{angular distance} so that gradient direction is broadly preserved. We show that this provides both $ε$-DP and $εd$-privacy for deep learning training, rather than the $(ε, δ)$-privacy of the Gaussian mechanism. Experiments on key datasets then indicate that the VMF mechanism can outperform the Gaussian in the utility-privacy trade-off. In particular, our experiments provide a direct empirical comparison of privacy between the two approaches in terms of their ability to defend against reconstruction and membership inference.
△ Less
Submitted 26 November, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Explaining epsilon in local differential privacy through the lens of quantitative information flow
Authors:
Natasha Fernandes,
Annabelle McIver,
Parastoo Sadeghi
Abstract:
The study of leakage measures for privacy has been a subject of intensive research and is an important aspect of understanding how privacy leaks occur in computer systems. Differential privacy has been a focal point in the privacy community for some years and yet its leakage characteristics are not completely understood. In this paper we bring together two areas of research -- information theory a…
▽ More
The study of leakage measures for privacy has been a subject of intensive research and is an important aspect of understanding how privacy leaks occur in computer systems. Differential privacy has been a focal point in the privacy community for some years and yet its leakage characteristics are not completely understood. In this paper we bring together two areas of research -- information theory and the g-leakage framework of quantitative information flow (QIF) -- to give an operational interpretation for the epsilon parameter of local differential privacy. We find that epsilon emerges as a capacity measure in both frameworks; via (log)-lift, a popular measure in information theory; and via max-case g-leakage, which we introduce to describe the leakage of any system to Bayesian adversaries modelled using ``worst-case'' assumptions under the QIF framework. Our characterisation resolves an important question of interpretability of epsilon and consolidates a number of disparate results covering the literature of both information theory and quantitative information flow.
△ Less
Submitted 18 May, 2023; v1 submitted 23 October, 2022;
originally announced October 2022.
-
Universal Optimality and Robust Utility Bounds for Metric Differential Privacy
Authors:
Natasha Fernandes,
Annabelle McIver,
Catuscia Palamidessi,
Ming Ding
Abstract:
We study the privacy-utility trade-off in the context of metric differential privacy. Ghosh et al. introduced the idea of universal optimality to characterise the best mechanism for a certain query that simultaneously satisfies (a fixed) $ε$-differential privacy constraint whilst at the same time providing better utility compared to any other $ε$-differentially private mechanism for the same query…
▽ More
We study the privacy-utility trade-off in the context of metric differential privacy. Ghosh et al. introduced the idea of universal optimality to characterise the best mechanism for a certain query that simultaneously satisfies (a fixed) $ε$-differential privacy constraint whilst at the same time providing better utility compared to any other $ε$-differentially private mechanism for the same query. They showed that the Geometric mechanism is "universally optimal" for the class of counting queries. On the other hand, Brenner and Nissim showed that outside the space of counting queries, and for the Bayes risk loss function, no such universally optimal mechanisms exist. In this paper we use metric differential privacy and quantitative information flow as the fundamental principle for studying universal optimality. Metric differential privacy is a generalisation of both standard (i.e., central) differential privacy and local differential privacy, and it is increasingly being used in various application domains, for instance in location privacy and in privacy preserving machine learning. Using this framework we are able to clarify Nissim and Brenner's negative results, showing (a) that in fact all privacy types contain optimal mechanisms relative to certain kinds of non-trivial loss functions, and (b) extending and generalising their negative results beyond Bayes risk specifically to a wide class of non-trivial loss functions. We also propose weaker universal benchmarks of utility called "privacy type capacities". We show that such capacities always exist and can be computed using a convex optimisation algorithm.
△ Less
Submitted 2 May, 2022;
originally announced May 2022.
-
Flexible and scalable privacy assessment for very large datasets, with an application to official governmental microdata
Authors:
Mário S. Alvim,
Natasha Fernandes,
Annabelle McIver,
Carroll Morgan,
Gabriel H. Nunes
Abstract:
We present a systematic refactoring of the conventional treatment of privacy analyses, basing it on mathematical concepts from the framework of Quantitative Information Flow (QIF). The approach we suggest brings three principal advantages: it is flexible, allowing for precise quantification and comparison of privacy risks for attacks both known and novel; it can be computationally tractable for ve…
▽ More
We present a systematic refactoring of the conventional treatment of privacy analyses, basing it on mathematical concepts from the framework of Quantitative Information Flow (QIF). The approach we suggest brings three principal advantages: it is flexible, allowing for precise quantification and comparison of privacy risks for attacks both known and novel; it can be computationally tractable for very large, longitudinal datasets; and its results are explainable both to politicians and to the general public. We apply our approach to a very large case study: the Educational Censuses of Brazil, curated by the governmental agency INEP, which comprise over 90 attributes of approximately 50 million individuals released longitudinally every year since 2007. These datasets have only very recently (2018-2021) attracted legislation to regulate their privacy -- while at the same time continuing to maintain the openness that had been sought in Brazilian society. INEP's reaction to that legislation was the genesis of our project with them. In our conclusions here we share the scientific, technical, and communication lessons we learned in the process.
△ Less
Submitted 25 July, 2022; v1 submitted 28 April, 2022;
originally announced April 2022.
-
The Laplace Mechanism has optimal utility for differential privacy over continuous queries
Authors:
Natasha Fernandes,
Annabelle McIver,
Carroll Morgan
Abstract:
Differential Privacy protects individuals' data when statistical queries are published from aggregated databases: applying "obfuscating" mechanisms to the query results makes the released information less specific but, unavoidably, also decreases its utility. Yet it has been shown that for discrete data (e.g. counting queries), a mandated degree of privacy and a reasonable interpretation of loss o…
▽ More
Differential Privacy protects individuals' data when statistical queries are published from aggregated databases: applying "obfuscating" mechanisms to the query results makes the released information less specific but, unavoidably, also decreases its utility. Yet it has been shown that for discrete data (e.g. counting queries), a mandated degree of privacy and a reasonable interpretation of loss of utility, the Geometric obfuscating mechanism is optimal: it loses as little utility as possible. For continuous query results however (e.g. real numbers) the optimality result does not hold. Our contribution here is to show that optimality is regained by using the Laplace mechanism for the obfuscation. The technical apparatus involved includes the earlier discrete result by Ghosh et al., recent work on abstract channels and their geometric representation as hyper-distributions, and the dual interpretations of distance between distributions provided by the Kantorovich-Rubinstein Theorem.
△ Less
Submitted 26 July, 2021; v1 submitted 15 May, 2021;
originally announced May 2021.
-
Correctness by construction for probabilistic programs
Authors:
Annabelle McIver,
Carroll Morgan
Abstract:
The "correct by construction" paradigm is an important component of modern Formal Methods, and here we use the probabilistic Guarded-Command Language $\mathit{pGCL}$ to illustrate its application to $\mathit{probabilistic}$ programming. $\mathit{pGCL}$ extends Dijkstra's guarded-command language $\mathit{GCL}$ with probabilistic choice, and is equipped with a correctness-preserving refinement rela…
▽ More
The "correct by construction" paradigm is an important component of modern Formal Methods, and here we use the probabilistic Guarded-Command Language $\mathit{pGCL}$ to illustrate its application to $\mathit{probabilistic}$ programming. $\mathit{pGCL}$ extends Dijkstra's guarded-command language $\mathit{GCL}$ with probabilistic choice, and is equipped with a correctness-preserving refinement relation $(\sqsubseteq)$ that enables compact, abstract specifications of probabilistic properties to be transformed gradually to concrete, executable code by applying mathematical insights in a systematic and layered way. Characteristically for "correctness by construction", as far as possible the reasoning in each refinement-step layer does not depend on earlier layers, and does not affect later ones. We demonstrate the technique by deriving a fair-coin implementation of any given discrete probability distribution. In the special case of simulating a fair die, our correct-by-construction algorithm turns out to be "within spitting distance" of Knuth and Yao's optimal solution.
△ Less
Submitted 30 July, 2020;
originally announced July 2020.
-
Reasoning with failures
Authors:
Hamid Jahanian,
Annabelle McIver
Abstract:
Safety Instrumented Systems (SIS) protect major hazard facilities, e.g. power plants, against catastrophic accidents. An SIS consists of hardware components and a controller software -- the ``program''. Current safety analyses of SIS' include the construction of a fault tree, summarising potential faults of the components and how they can arise within an SIS. The exercise of identifying faults typ…
▽ More
Safety Instrumented Systems (SIS) protect major hazard facilities, e.g. power plants, against catastrophic accidents. An SIS consists of hardware components and a controller software -- the ``program''. Current safety analyses of SIS' include the construction of a fault tree, summarising potential faults of the components and how they can arise within an SIS. The exercise of identifying faults typically relies on the experience of the safety engineer. Unfortunately the program part is often too complicated to be analysed in such a ``by hand" manner and so the impact it has on the resulting safety analysis is not accurately captured. In this paper we demonstrate how a formal model for faults and failure modes can be used to analyse the impact of an SIS program. We outline the underlying concepts of \emph{Failure Mode Reasoning} and its application in safety analysis, and we illustrate the ideas on a practical example.
△ Less
Submitted 20 July, 2020;
originally announced July 2020.
-
Failure Mode Reasoning in Model Based Safety Analysis
Authors:
Hamid Jahanian,
David Parker,
Marc Zeller,
Annabelle McIver,
Yiannis Papadopoulos
Abstract:
Failure Mode Reasoning (FMR) is a novel approach for analyzing failure in a Safety Instrumented System (SIS). The method uses an automatic analysis of an SIS program to calculate potential failures in parts of the SIS. In this paper we use a case study from the power industry to demonstrate how FMR can be utilized in conjunction with other model-based safety analysis methods, such as HiP-HOPS and…
▽ More
Failure Mode Reasoning (FMR) is a novel approach for analyzing failure in a Safety Instrumented System (SIS). The method uses an automatic analysis of an SIS program to calculate potential failures in parts of the SIS. In this paper we use a case study from the power industry to demonstrate how FMR can be utilized in conjunction with other model-based safety analysis methods, such as HiP-HOPS and CFT, in order to achieve a comprehensive safety analysis of SIS. In this case study, FMR covers the analysis of SIS inputs while HiP-HOPS/CFT models the faults of logic solver and final elements. The SIS program is analyzed by FMR and the results are exported to HiP-HOPS/CFT via automated interfaces. The final outcome is the collective list of SIS failure modes along with their reliability measures. We present and review the results from both qualitative and quantitative perspectives.
△ Less
Submitted 19 July, 2020; v1 submitted 11 May, 2020;
originally announced May 2020.
-
Generalised Differential Privacy for Text Document Processing
Authors:
Natasha Fernandes,
Mark Dras,
Annabelle McIver
Abstract:
We address the problem of how to "obfuscate" texts by removing stylistic clues which can identify authorship, whilst preserving (as much as possible) the content of the text. In this paper we combine ideas from "generalised differential privacy" and machine learning techniques for text processing to model privacy for text documents. We define a privacy mechanism that operates at the level of text…
▽ More
We address the problem of how to "obfuscate" texts by removing stylistic clues which can identify authorship, whilst preserving (as much as possible) the content of the text. In this paper we combine ideas from "generalised differential privacy" and machine learning techniques for text processing to model privacy for text documents. We define a privacy mechanism that operates at the level of text documents represented as "bags-of-words" - these representations are typical in machine learning and contain sufficient information to carry out many kinds of classification tasks including topic identification and authorship attribution (of the original documents). We show that our mechanism satisfies privacy with respect to a metric for semantic similarity, thereby providing a balance between utility, defined by the semantic content of texts, with the obfuscation of stylistic clues. We demonstrate our implementation on a "fan fiction" dataset, confirming that it is indeed possible to disguise writing style effectively whilst preserving enough information and variation for accurate content classification tasks.
△ Less
Submitted 5 February, 2019; v1 submitted 26 November, 2018;
originally announced November 2018.
-
Author Obfuscation Using Generalised Differential Privacy
Authors:
Natasha Fernandes,
Mark Dras,
Annabelle McIver
Abstract:
The problem of obfuscating the authorship of a text document has received little attention in the literature to date. Current approaches are ad-hoc and rely on assumptions about an adversary's auxiliary knowledge which makes it difficult to reason about the privacy properties of these methods. Differential privacy is a well-known and robust privacy approach, but its reliance on the notion of adjac…
▽ More
The problem of obfuscating the authorship of a text document has received little attention in the literature to date. Current approaches are ad-hoc and rely on assumptions about an adversary's auxiliary knowledge which makes it difficult to reason about the privacy properties of these methods. Differential privacy is a well-known and robust privacy approach, but its reliance on the notion of adjacency between datasets has prevented its application to text document privacy. However, generalised differential privacy permits the application of differential privacy to arbitrary datasets endowed with a metric and has been demonstrated on problems involving the release of individual data points. In this paper we show how to apply generalised differential privacy to author obfuscation by utilising existing tools and methods from the stylometry and natural language processing literature.
△ Less
Submitted 22 May, 2018;
originally announced May 2018.
-
An Algebraic Approach for Reasoning About Information Flow
Authors:
Arthur Américo,
Mário S. Alvim,
Annabelle McIver
Abstract:
This paper concerns the analysis of information leaks in security systems. We address the problem of specifying and analyzing large systems in the (standard) channel model used in quantitative information flow (QIF). We propose several operators which match typical interactions between system components. We explore their algebraic properties with respect to the security-preserving refinement relat…
▽ More
This paper concerns the analysis of information leaks in security systems. We address the problem of specifying and analyzing large systems in the (standard) channel model used in quantitative information flow (QIF). We propose several operators which match typical interactions between system components. We explore their algebraic properties with respect to the security-preserving refinement relation defined by Alvim et al. and McIver et al.
We show how the algebra can be used to simplify large system specifications in order to facilitate the computation of information leakage bounds. We demonstrate our results on the specification and analysis of the Crowds Protocol. Finally, we use the algebra to justify a new algorithm to compute leakage bounds for this protocol.
△ Less
Submitted 8 July, 2018; v1 submitted 24 January, 2018;
originally announced January 2018.
-
A New Proof Rule for Almost-Sure Termination
Authors:
Annabelle McIver,
Carroll Morgan,
Benjamin Lucien Kaminski,
Joost-Pieter Katoen
Abstract:
An important question for a probabilistic program is whether the probability mass of all its diverging runs is zero, that is that it terminates "almost surely". Proving that can be hard, and this paper presents a new method for doing so; it is expressed in a program logic, and so applies directly to source code. The programs may contain both probabilistic- and demonic choice, and the probabilistic…
▽ More
An important question for a probabilistic program is whether the probability mass of all its diverging runs is zero, that is that it terminates "almost surely". Proving that can be hard, and this paper presents a new method for doing so; it is expressed in a program logic, and so applies directly to source code. The programs may contain both probabilistic- and demonic choice, and the probabilistic choices may depend on the current state.
As do other researchers, we use variant functions (a.k.a. "super-martingales") that are real-valued and probabilistically might decrease on each loop iteration; but our key innovation is that the amount as well as the probability of the decrease are parametric.
We prove the soundness of the new rule, indicate where its applicability goes beyond existing rules, and explain its connection to classical results on denumerable (non-demonic) Markov chains.
△ Less
Submitted 25 December, 2017; v1 submitted 9 November, 2017;
originally announced November 2017.
-
Abstract Hidden Markov Models: a monadic account of quantitative information flow
Authors:
Annabelle McIver,
Carroll Morgan,
Tahiry Rabehaja
Abstract:
Hidden Markov Models, HMM's, are mathematical models of Markov processes with state that is hidden, but from which information can leak. They are typically represented as 3-way joint-probability distributions.
We use HMM's as denotations of probabilistic hidden-state sequential programs: for that, we recast them as `abstract' HMM's, computations in the Giry monad $\mathbb{D}$, and we equip them…
▽ More
Hidden Markov Models, HMM's, are mathematical models of Markov processes with state that is hidden, but from which information can leak. They are typically represented as 3-way joint-probability distributions.
We use HMM's as denotations of probabilistic hidden-state sequential programs: for that, we recast them as `abstract' HMM's, computations in the Giry monad $\mathbb{D}$, and we equip them with a partial order of increasing security. However to encode the monadic type with hiding over some state $\mathcal{X}$ we use $\mathbb{D}\mathcal{X}\to \mathbb{D}^2\mathcal{X}$ rather than the conventional $\mathcal{X}{\to}\mathbb{D}\mathcal{X}$ that suffices for Markov models whose state is not hidden. We illustrate the $\mathbb{D}\mathcal{X}\to \mathbb{D}^2\mathcal{X}$ construction with a small Haskell prototype.
We then present uncertainty measures as a generalisation of the extant diversity of probabilistic entropies, with characteristic analytic properties for them, and show how the new entropies interact with the order of increasing security. Furthermore, we give a `backwards' uncertainty-transformer semantics for HMM's that is dual to the `forwards' abstract HMM's - it is an analogue of the duality between forwards, relational semantics and backwards, predicate-transformer semantics for imperative programs with demonic choice.
Finally, we argue that, from this new denotational-semantic viewpoint, one can see that the Dalenius desideratum for statistical databases is actually an issue in compositionality. We propose a means for taking it into account.
△ Less
Submitted 28 March, 2019; v1 submitted 4 August, 2017;
originally announced August 2017.
-
A new rule for almost-certain termination of probabilistic- and demonic programs
Authors:
Annabelle McIver,
Carroll Morgan
Abstract:
Extending our own and others' earlier approaches to reasoning about termination of probabilistic programs, we propose and prove a new rule for termination with probability one, also known as "almost-certain termination". The rule uses both (non-strict) super martingales and guarantees of progress, together, and it seems to cover significant cases that earlier methods do not. In particular, it suff…
▽ More
Extending our own and others' earlier approaches to reasoning about termination of probabilistic programs, we propose and prove a new rule for termination with probability one, also known as "almost-certain termination". The rule uses both (non-strict) super martingales and guarantees of progress, together, and it seems to cover significant cases that earlier methods do not. In particular, it suffices for termination of the unbounded symmetric random walk in both one- and two dimensions: for the first, we give a proof; for the second, we use a theorem of Foster to argue that a proof exists. Non-determinism (i.e. demonic choice) is supported; but we do currently restrict to discrete distributions.
△ Less
Submitted 6 January, 2017; v1 submitted 4 December, 2016;
originally announced December 2016.
-
Compositional security and collateral leakage
Authors:
N. Bordenabe,
A. McIver,
C Morgan,
T. Rabehaja
Abstract:
In quantitative information flow we say that program $Q$ is "at least as secure as" $P$ just when the amount of secret information flowing from $Q$ is never more than flows from $P$, with of course a suitable quantification of "flow". This secure-refinement order $\sqsubseteq$ is compositional just when $P{\sqsubseteq}Q$ implies ${\cal C}(P){\sqsubseteq}{\cal C}(Q)$ for any context ${\cal C}$, aga…
▽ More
In quantitative information flow we say that program $Q$ is "at least as secure as" $P$ just when the amount of secret information flowing from $Q$ is never more than flows from $P$, with of course a suitable quantification of "flow". This secure-refinement order $\sqsubseteq$ is compositional just when $P{\sqsubseteq}Q$ implies ${\cal C}(P){\sqsubseteq}{\cal C}(Q)$ for any context ${\cal C}$, again with a suitable definition of "context".
Remarkable however is that leaks caused by executing $P,Q$ might not be limited to their declared variables: they might impact correlated secrets in variables declared and initialised in some broader context to which $P,Q$ do not refer even implicitly. We call such leaks collateral because their effect is felt in domains of which (the programmers of) $P, Q$ might be wholly unaware: our inspiration is the "Dalenius" phenomenon for statistical databases.
We show that a proper treatment of these collateral leaks is necessary for a compositional program semantics for read/write "open" programs. By adapting a recent Hidden-Markov denotational model for non-interference security, so that it becomes "collateral aware", we give techniques and examples (e.g.\ public-key encryption) to show how collateral leakage can be calculated and then bounded in its severity.
△ Less
Submitted 18 April, 2016;
originally announced April 2016.
-
A Rigorous Analysis of AODV and its Variants
Authors:
Peter Höfner,
Rob van Glabbeek,
Wee Lum Tan,
Marius Portmann,
Annabelle McIver,
Ansgar Fehnker
Abstract:
In this paper we present a rigorous analysis of the Ad hoc On-Demand Distance Vector (AODV) routing protocol using a formal specification in AWN (Algebra for Wireless Networks), a process algebra which has been specifically tailored for the modelling of Mobile Ad Hoc Networks and Wireless Mesh Network protocols. Our formalisation models the exact details of the core functionality of AODV, such as…
▽ More
In this paper we present a rigorous analysis of the Ad hoc On-Demand Distance Vector (AODV) routing protocol using a formal specification in AWN (Algebra for Wireless Networks), a process algebra which has been specifically tailored for the modelling of Mobile Ad Hoc Networks and Wireless Mesh Network protocols. Our formalisation models the exact details of the core functionality of AODV, such as route discovery, route maintenance and error handling. We demonstrate how AWN can be used to reason about critical protocol correctness properties by providing a detailed proof of loop freedom. In contrast to evaluations using simulation or other formal methods such as model checking, our proof is generic and holds for any possible network scenario in terms of network topology, node mobility, traffic pattern, etc. A key contribution of this paper is the demonstration of how the reasoning and proofs can relatively easily be adapted to protocol variants.
△ Less
Submitted 30 December, 2015;
originally announced December 2015.
-
Automated Analysis of AODV using UPPAAL
Authors:
Ansgar Fehnker,
Rob van Glabbeek,
Peter Höfner,
Annabelle McIver,
Marius Portmann,
Wee Lum Tan
Abstract:
This paper describes an automated, formal and rigorous analysis of the Ad hoc On-Demand Distance Vector (AODV) routing protocol, a popular protocol used in wireless mesh networks.
We give a brief overview of a model of AODV implemented in the UPPAAL model checker. It is derived from a process-algebraic model which reflects precisely the intention of AODV and accurately captures the protocol spec…
▽ More
This paper describes an automated, formal and rigorous analysis of the Ad hoc On-Demand Distance Vector (AODV) routing protocol, a popular protocol used in wireless mesh networks.
We give a brief overview of a model of AODV implemented in the UPPAAL model checker. It is derived from a process-algebraic model which reflects precisely the intention of AODV and accurately captures the protocol specification. Furthermore, we describe experiments carried out to explore AODV's behaviour in all network topologies up to 5 nodes. We were able to automatically locate problematic and undesirable behaviours. This is in particular useful to discover protocol limitations and to develop improved variants. This use of model checking as a diagnostic tool complements other formal-methods-based protocol modelling and verification techniques, such as process algebra.
△ Less
Submitted 22 December, 2015;
originally announced December 2015.
-
A Process Algebra for Wireless Mesh Networks
Authors:
Ansgar Fehnker,
Rob van Glabbeek,
Peter Höfner,
Annabelle McIver,
Marius Portmann,
Wee Lum Tan
Abstract:
We propose a process algebra for wireless mesh networks that combines novel treatments of local broadcast, conditional unicast and data structures. In this framework, we model the Ad-hoc On-Demand Distance Vector (AODV) routing protocol and (dis)prove crucial properties such as loop freedom and packet delivery.
We propose a process algebra for wireless mesh networks that combines novel treatments of local broadcast, conditional unicast and data structures. In this framework, we model the Ad-hoc On-Demand Distance Vector (AODV) routing protocol and (dis)prove crucial properties such as loop freedom and packet delivery.
△ Less
Submitted 22 December, 2015;
originally announced December 2015.
-
Modelling and Analysis of AODV in UPPAAL
Authors:
Ansgar Fehnker,
Rob van Glabbeek,
Peter Höfner,
Annabelle McIver,
Marius Portmann,
Wee Lum Tan
Abstract:
This paper describes work in progress towards an automated formal and rigorous analysis of the Ad hoc On-Demand Distance Vector (AODV) routing protocol, a popular protocol used in ad hoc wireless networks. We give a brief overview of a model of AODV implemented in the UPPAAL model checker, and describe experiments carried out to explore AODV's behaviour in two network topologies. We were able to l…
▽ More
This paper describes work in progress towards an automated formal and rigorous analysis of the Ad hoc On-Demand Distance Vector (AODV) routing protocol, a popular protocol used in ad hoc wireless networks. We give a brief overview of a model of AODV implemented in the UPPAAL model checker, and describe experiments carried out to explore AODV's behaviour in two network topologies. We were able to locate automatically and confirm some known problematic and undesirable behaviours. We believe this use of model checking as a diagnostic tool complements other formal methods based protocol modelling and verification techniques, such as process algebras. Model checking is in particular useful for the discovery of protocol limitations and in the development of improved variations.
△ Less
Submitted 22 December, 2015;
originally announced December 2015.
-
Conditioning in Probabilistic Programming
Authors:
Friedrich Gretz,
Nils Jansen,
Benjamin Lucien Kaminski,
Joost-Pieter Katoen,
Annabelle McIver,
Federico Olmedo
Abstract:
We investigate the semantic intricacies of conditioning, a main feature in probabilistic programming. We provide a weakest (liberal) pre-condition (w(l)p) semantics for the elementary probabilistic programming language pGCL extended with conditioning. We prove that quantitative weakest (liberal) pre-conditions coincide with conditional (liberal) expected rewards in Markov chains and show that sema…
▽ More
We investigate the semantic intricacies of conditioning, a main feature in probabilistic programming. We provide a weakest (liberal) pre-condition (w(l)p) semantics for the elementary probabilistic programming language pGCL extended with conditioning. We prove that quantitative weakest (liberal) pre-conditions coincide with conditional (liberal) expected rewards in Markov chains and show that semantically conditioning is a truly conservative extension. We present two program transformations which entirely eliminate conditioning from any program and prove their correctness using the w(l)p-semantics. Finally, we show how the w(l)p-semantics can be used to determine conditional probabilities in a parametric anonymity protocol and show that an inductive w(l)p-semantics for conditioning in non-deterministic probabilistic programs cannot exist.
△ Less
Submitted 1 April, 2015;
originally announced April 2015.
-
Probabilistic Rely-guarantee Calculus
Authors:
Annabelle McIver,
Tahiry Rabehaja,
Georg Struth
Abstract:
Jones' rely-guarantee calculus for shared variable concurrency is extended to include probabilistic behaviours. We use an algebraic approach which combines and adapts probabilistic Kleene algebras with concurrent Kleene algebra. Soundness of the algebra is shown relative to a general probabilistic event structure semantics. The main contribution of this paper is a collection of rely-guarantee rule…
▽ More
Jones' rely-guarantee calculus for shared variable concurrency is extended to include probabilistic behaviours. We use an algebraic approach which combines and adapts probabilistic Kleene algebras with concurrent Kleene algebra. Soundness of the algebra is shown relative to a general probabilistic event structure semantics. The main contribution of this paper is a collection of rely-guarantee rules built on top of that semantics. In particular, we show how to obtain bounds on probabilities by deriving rely-guarantee rules within the true-concurrent denotational semantics. The use of these rules is illustrated by a detailed verification of a simple probabilistic concurrent program: a faulty Eratosthenes sieve.
△ Less
Submitted 2 June, 2015; v1 submitted 1 September, 2014;
originally announced September 2014.
-
A Process Algebra for Wireless Mesh Networks used for Modelling, Verifying and Analysing AODV
Authors:
Ansgar Fehnker,
Rob van Glabbeek,
Peter Höfner,
Annabelle McIver,
Marius Portmann,
Wee Lum Tan
Abstract:
We propose AWN (Algebra for Wireless Networks), a process algebra tailored to the modelling of Mobile Ad hoc Network (MANET) and Wireless Mesh Network (WMN) protocols. It combines novel treatments of local broadcast, conditional unicast and data structures.
In this framework we present a rigorous analysis of the Ad hoc On-Demand Distance Vector (AODV) protocol, a popular routing protocol designe…
▽ More
We propose AWN (Algebra for Wireless Networks), a process algebra tailored to the modelling of Mobile Ad hoc Network (MANET) and Wireless Mesh Network (WMN) protocols. It combines novel treatments of local broadcast, conditional unicast and data structures.
In this framework we present a rigorous analysis of the Ad hoc On-Demand Distance Vector (AODV) protocol, a popular routing protocol designed for MANETs and WMNs, and one of the four protocols currently standardised by the IETF MANET working group.
We give a complete and unambiguous specification of this protocol, thereby formalising the RFC of AODV, the de facto standard specification, given in English prose. In doing so, we had to make non-evident assumptions to resolve ambiguities occurring in that specification. Our formalisation models the exact details of the core functionality of AODV, such as route maintenance and error handling, and only omits timing aspects.
The process algebra allows us to formalise and (dis)prove crucial properties of mesh network routing protocols such as loop freedom and packet delivery. We are the first to provide a detailed proof of loop freedom of AODV. In contrast to evaluations using simulation or model checking, our proof is generic and holds for any possible network scenario in terms of network topology, node mobility, etc. Due to ambiguities and contradictions the RFC specification allows several interpretations; we show for more than 5000 of them whether they are loop free or not, thereby demonstrating how the reasoning and proofs can relatively easily be adapted to protocol variants.
Using our formal and unambiguous specification, we find shortcomings of AODV that affect performance, e.g. the establishment of non-optimal routes, and some routes not being found at all. We formalise improvements in the same process algebra; carrying over the proofs is again easy.
△ Less
Submitted 30 December, 2013;
originally announced December 2013.
-
An Event Structure Model for Probabilistic Concurrent Kleene Algebra
Authors:
Annabelle McIver,
Tahiry Rabehaja,
Georg Struth
Abstract:
We give a new true-concurrent model for probabilistic concurrent Kleene algebra. The model is based on probabilistic event structures, which combines ideas from Katoen's work on probabilistic concurrency and Varacca's probabilistic prime event structures. The event structures are compared with a true-concurrent version of Segala's probabilistic simulation. Finally, the algebraic properties of the…
▽ More
We give a new true-concurrent model for probabilistic concurrent Kleene algebra. The model is based on probabilistic event structures, which combines ideas from Katoen's work on probabilistic concurrency and Varacca's probabilistic prime event structures. The event structures are compared with a true-concurrent version of Segala's probabilistic simulation. Finally, the algebraic properties of the model are summarised to the extent that they can be used to derive techniques such as probabilistic rely/guarantee inference rules.
△ Less
Submitted 8 October, 2013;
originally announced October 2013.
-
Probabilistic Concurrent Kleene Algebra
Authors:
Annabelle McIver,
Tahiry Rabehaja,
Georg Struth
Abstract:
We provide an extension of concurrent Kleene algebras to account for probabilistic properties. The algebra yields a unified framework containing nondeterminism, concurrency and probability and is sound with respect to the set of probabilistic automata modulo probabilistic simulation. We use the resulting algebra to generalise the algebraic formulation of a variant of Jones' rely/guarantee calculus…
▽ More
We provide an extension of concurrent Kleene algebras to account for probabilistic properties. The algebra yields a unified framework containing nondeterminism, concurrency and probability and is sound with respect to the set of probabilistic automata modulo probabilistic simulation. We use the resulting algebra to generalise the algebraic formulation of a variant of Jones' rely/guarantee calculus.
△ Less
Submitted 11 June, 2013;
originally announced June 2013.
-
Weak Concurrent Kleene Algebra with Application to Algebraic Verification
Authors:
Annabelle McIver,
Tahiry Rabehaja,
Georg Struth
Abstract:
We propose a generalisation of concurrent Kleene algebra \cite{Hoa09} that can take account of probabilistic effects in the presence of concurrency. The algebra is proved sound with respect to a model of automata modulo a variant of rooted $η$-simulation equivalence. Applicability is demonstrated by algebraic treatments of two examples: algebraic may testing and Rabin's solution to the choice coor…
▽ More
We propose a generalisation of concurrent Kleene algebra \cite{Hoa09} that can take account of probabilistic effects in the presence of concurrency. The algebra is proved sound with respect to a model of automata modulo a variant of rooted $η$-simulation equivalence. Applicability is demonstrated by algebraic treatments of two examples: algebraic may testing and Rabin's solution to the choice coordination problem.
△ Less
Submitted 30 January, 2013;
originally announced January 2013.
-
Model exploration and analysis for quantitative safety refinement in probabilistic B
Authors:
Ukachukwu Ndukwu,
Annabelle McIver
Abstract:
The role played by counterexamples in standard system analysis is well known; but less common is a notion of counterexample in probabilistic systems refinement. In this paper we extend previous work using counterexamples to inductive invariant properties of probabilistic systems, demonstrating how they can be used to extend the technique of bounded model checking-style analysis for the refinement…
▽ More
The role played by counterexamples in standard system analysis is well known; but less common is a notion of counterexample in probabilistic systems refinement. In this paper we extend previous work using counterexamples to inductive invariant properties of probabilistic systems, demonstrating how they can be used to extend the technique of bounded model checking-style analysis for the refinement of quantitative safety specifications in the probabilistic B language. In particular, we show how the method can be adapted to cope with refinements incorporating probabilistic loops. Finally, we demonstrate the technique on pB models summarising a one-step refinement of a randomised algorithm for finding the minimum cut of undirected graphs, and that for the dependability analysis of a controller design.
△ Less
Submitted 21 June, 2011;
originally announced June 2011.
-
Hidden-Markov Program Algebra with iteration
Authors:
Annabelle McIver,
Larissa Meinicke,
Carroll Morgan
Abstract:
We use Hidden Markov Models to motivate a quantitative compositional semantics for noninterference-based security with iteration, including a refinement- or "implements" relation that compares two programs with respect to their information leakage; and we propose a program algebra for source-level reasoning about such programs, in particular as a means of establishing that an "implementation" prog…
▽ More
We use Hidden Markov Models to motivate a quantitative compositional semantics for noninterference-based security with iteration, including a refinement- or "implements" relation that compares two programs with respect to their information leakage; and we propose a program algebra for source-level reasoning about such programs, in particular as a means of establishing that an "implementation" program leaks no more than its "specification" program. <p>This joins two themes: we extend our earlier work, having iteration but only qualitative, by making it quantitative; and we extend our earlier quantitative work by including iteration. <p>We advocate stepwise refinement and source-level program algebra, both as conceptual reasoning tools and as targets for automated assistance. A selection of algebraic laws is given to support this view in the case of quantitative noninterference; and it is demonstrated on a simple iterated password-guessing attack.
△ Less
Submitted 1 February, 2011;
originally announced February 2011.
-
Compositional closure for Bayes Risk in probabilistic noninterference
Authors:
Annabelle McIver,
Larissa Meinicke,
Carroll Morgan
Abstract:
We give a sequential model for noninterference security including probability (but not demonic choice), thus supporting reasoning about the likelihood that high-security values might be revealed by observations of low-security activity. Our novel methodological contribution is the definition of a refinement order and its use to compare security measures between specifications and (their supposed)…
▽ More
We give a sequential model for noninterference security including probability (but not demonic choice), thus supporting reasoning about the likelihood that high-security values might be revealed by observations of low-security activity. Our novel methodological contribution is the definition of a refinement order and its use to compare security measures between specifications and (their supposed) implementations. This contrasts with the more common practice of evaluating the security of individual programs in isolation.
The appropriateness of our model and order is supported by our showing that our refinement order is the greatest compositional relation --the compositional closure-- with respect to our semantics and an "elementary" order based on Bayes Risk --- a security measure already in widespread use. We also relate refinement to other measures such as Shannon Entropy.
By applying the approach to a non-trivial example, the anonymous-majority Three-Judges protocol, we demonstrate by example that correctness arguments can be simplified by the sort of layered developments --through levels of increasing detail-- that are allowed and encouraged by compositional semantics.
△ Less
Submitted 7 July, 2010;
originally announced July 2010.
-
An expectation transformer approach to predicate abstraction and data independence for probabilistic programs
Authors:
Ukachukwu Ndukwu,
Annabelle McIver
Abstract:
In this paper we revisit the well-known technique of predicate abstraction to characterise performance attributes of system models incorporating probability. We recast the theory using expectation transformers, and identify transformer properties which correspond to abstractions that yield nevertheless exact bound on the performance of infinite state probabilistic systems. In addition, we extend t…
▽ More
In this paper we revisit the well-known technique of predicate abstraction to characterise performance attributes of system models incorporating probability. We recast the theory using expectation transformers, and identify transformer properties which correspond to abstractions that yield nevertheless exact bound on the performance of infinite state probabilistic systems. In addition, we extend the developed technique to the special case of "data independent" programs incorporating probability. Finally, we demonstrate the subtleness of the extended technique by using the PRISM model checking tool to analyse an infinite state protocol, obtaining exact bounds on its performance.
△ Less
Submitted 25 June, 2010;
originally announced June 2010.
-
Proceedings First Workshop on Quantitative Formal Methods: Theory and Applications
Authors:
Suzana Andova,
Annabelle McIver,
Pedro D'Argenio,
Pieter Cuijpers,
Jasen Markovski,
Caroll Morgan,
Manuel Núñez
Abstract:
This volume contains the papers presented at the 1st workshop on Quantitative Formal Methods: Theory and Applications, which was held in Eindhoven on 3 November 2009 as part of the International Symposium on Formal Methods 2009. This volume contains the final versions of all contributions accepted for presentation at the workshop.
This volume contains the papers presented at the 1st workshop on Quantitative Formal Methods: Theory and Applications, which was held in Eindhoven on 3 November 2009 as part of the International Symposium on Formal Methods 2009. This volume contains the final versions of all contributions accepted for presentation at the workshop.
△ Less
Submitted 10 December, 2009;
originally announced December 2009.
-
Results on the quantitative mu-calculus qMu
Authors:
Annabelle McIver,
Carroll Morgan
Abstract:
The mu-calculus is a powerful tool for specifying and verifying transition systems, including those with both demonic and angelic choice; its quantitative generalisation qMu extends that to probabilistic choice.
We show that for a finite-state system the logical interpretation of qMu, via fixed-points in a domain of real-valued functions into [0,1], is equivalent to an operational interpretati…
▽ More
The mu-calculus is a powerful tool for specifying and verifying transition systems, including those with both demonic and angelic choice; its quantitative generalisation qMu extends that to probabilistic choice.
We show that for a finite-state system the logical interpretation of qMu, via fixed-points in a domain of real-valued functions into [0,1], is equivalent to an operational interpretation given as a turn-based gambling game between two players.
The logical interpretation provides direct access to axioms, laws and meta-theorems. The operational, game- based interpretation aids the intuition and continues in the more general context to provide a surprisingly practical specification tool.
A corollary of our proofs is an extension of Everett's singly-nested games result in the finite turn-based case: we prove well-definedness of the minimax value, and existence of fixed memoriless strategies, for all qMu games/formulae, of arbitrary (including alternating) nesting structure.
△ Less
Submitted 14 September, 2003;
originally announced September 2003.