-
Optimal Universal Quantum Encoding for Statistical Inference
Authors:
Farhad Farokhi
Abstract:
Optimal encoding of classical data for statistical inference using quantum computing is investigated. A universal encoder is sought that is optimal for a wide array of statistical inference tasks. Accuracy of any statistical inference is shown to be upper bounded by a term that is proportional to maximal quantum leakage from the classical data, i.e., the input to the inference model, through its q…
▽ More
Optimal encoding of classical data for statistical inference using quantum computing is investigated. A universal encoder is sought that is optimal for a wide array of statistical inference tasks. Accuracy of any statistical inference is shown to be upper bounded by a term that is proportional to maximal quantum leakage from the classical data, i.e., the input to the inference model, through its quantum encoding. This demonstrates that the maximal quantum leakage is a universal measure of the quality of the encoding strategy for statistical inference as it only depends on the quantum encoding of the data and not the inference task itself. The optimal universal encoding strategy, i.e., the encoding strategy that maximizes the maximal quantum leakage, is proved to be attained by pure states. When there are enough qubits, basis encoding is proved to be universally optimal. An iterative method for numerically computing the optimal universal encoding strategy is presented.
△ Less
Submitted 11 April, 2024;
originally announced April 2024.
-
Measuring Quantum Information Leakage Under Detection Threat
Authors:
Farhad Farokhi,
Sejeong Kim
Abstract:
Gentle quantum leakage is proposed as a measure of information leakage to arbitrary eavesdroppers that aim to avoid detection. Gentle (also sometimes referred to as weak or non-demolition) measurements are used to encode the desire of the eavesdropper to evade detection. The gentle quantum leakage meets important axioms proposed for measures of information leakage including positivity, independenc…
▽ More
Gentle quantum leakage is proposed as a measure of information leakage to arbitrary eavesdroppers that aim to avoid detection. Gentle (also sometimes referred to as weak or non-demolition) measurements are used to encode the desire of the eavesdropper to evade detection. The gentle quantum leakage meets important axioms proposed for measures of information leakage including positivity, independence, and unitary invariance. Global depolarizing noise, an important family of physical noise in quantum devices, is shown to reduce gentle quantum leakage (and hence can be used as a mechanism to ensure privacy or security). A lower bound for the gentle quantum leakage based on asymmetric approximate cloning is presented. This lower bound relates information leakage to mutual incompatibility of quantum states. A numerical example, based on the encoding in the celebrated BB84 quantum key distribution algorithm, is used to demonstrate the results.
△ Less
Submitted 17 March, 2024;
originally announced March 2024.
-
Barycentric and Pairwise Renyi Quantum Leakage
Authors:
Farhad Farokhi
Abstract:
Barycentric and pairwise quantum Renyi leakages are proposed as two measures of information leakage for privacy and security analysis in quantum computing and communication systems. These quantities both require minimal assumptions on the eavesdropper, i.e., they do not make any assumptions on the eavesdropper's attack strategy or the statistical prior on the secret or private classical data encod…
▽ More
Barycentric and pairwise quantum Renyi leakages are proposed as two measures of information leakage for privacy and security analysis in quantum computing and communication systems. These quantities both require minimal assumptions on the eavesdropper, i.e., they do not make any assumptions on the eavesdropper's attack strategy or the statistical prior on the secret or private classical data encoded in the quantum system. They also satisfy important properties of positivity, independence, post-processing inequality, and unitary invariance. The barycentric quantum Renyi leakage can be computed by solving a semi-definite program and the pairwise quantum Renyi leakage possesses an explicit formula. The barycentric and pairwise quantum Renyi leakages form upper bounds on the maximal quantum leakage, the sandwiched quantum $α$-mutual information, the accessible information, and the Holevo's information. Furthermore, differentially-private quantum channels are shown to bound these measures of information leakage. Global and local depolarizing channels, that are common models of noise in quantum computing and communication, restrict private or secure information leakage. Finally, a privacy-utility trade-off formula in quantum machine learning using variational circuits is developed. The privacy guarantees can only be strengthened, i.e., information leakage can only be reduced, if the performance degradation grows larger and vice versa.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Information Leakage from Data Updates in Machine Learning Models
Authors:
Tian Hui,
Farhad Farokhi,
Olga Ohrimenko
Abstract:
In this paper we consider the setting where machine learning models are retrained on updated datasets in order to incorporate the most up-to-date information or reflect distribution shifts. We investigate whether one can infer information about these updates in the training data (e.g., changes to attribute values of records). Here, the adversary has access to snapshots of the machine learning mode…
▽ More
In this paper we consider the setting where machine learning models are retrained on updated datasets in order to incorporate the most up-to-date information or reflect distribution shifts. We investigate whether one can infer information about these updates in the training data (e.g., changes to attribute values of records). Here, the adversary has access to snapshots of the machine learning model before and after the change in the dataset occurs. Contrary to the existing literature, we assume that an attribute of a single or multiple training data points are changed rather than entire data records are removed or added. We propose attacks based on the difference in the prediction confidence of the original model and the updated model. We evaluate our attack methods on two public datasets along with multi-layer perceptron and logistic regression models. We validate that two snapshots of the model can result in higher information leakage in comparison to having access to only the updated model. Moreover, we observe that data records with rare values are more vulnerable to attacks, which points to the disparate vulnerability of privacy attacks in the update setting. When multiple records with the same original attribute value are updated to the same new value (i.e., repeated changes), the attacker is more likely to correctly guess the updated values since repeated changes leave a larger footprint on the trained model. These observations point to vulnerability of machine learning models to attribute inference attacks in the update setting.
△ Less
Submitted 19 September, 2023;
originally announced September 2023.
-
Distributionally Time-Varying Online Stochastic Optimization under Polyak-Łojasiewicz Condition with Application in Conditional Value-at-Risk Statistical Learning
Authors:
Yuen-Man Pun,
Farhad Farokhi,
Iman Shames
Abstract:
In this work, we consider a sequence of stochastic optimization problems following a time-varying distribution via the lens of online optimization. Assuming that the loss function satisfies the Polyak-Łojasiewicz condition, we apply online stochastic gradient descent and establish its dynamic regret bound that is composed of cumulative distribution drifts and cumulative gradient biases caused by s…
▽ More
In this work, we consider a sequence of stochastic optimization problems following a time-varying distribution via the lens of online optimization. Assuming that the loss function satisfies the Polyak-Łojasiewicz condition, we apply online stochastic gradient descent and establish its dynamic regret bound that is composed of cumulative distribution drifts and cumulative gradient biases caused by stochasticity. The distribution metric we adopt here is Wasserstein distance, which is well-defined without the absolute continuity assumption or with a time-varying support set. We also establish a regret bound of online stochastic proximal gradient descent when the objective function is regularized. Moreover, we show that the above framework can be applied to the Conditional Value-at-Risk (CVaR) learning problem. Particularly, we improve an existing proof on the discovery of the PL condition of the CVaR problem, resulting in a regret bound of online stochastic gradient descent.
△ Less
Submitted 17 September, 2023;
originally announced September 2023.
-
Maximal Information Leakage from Quantum Encoding of Classical Data
Authors:
Farhad Farokhi
Abstract:
A new measure of information leakage for quantum encoding of classical data is defined. An adversary can access a single copy of the state of a quantum system that encodes some classical data and is interested in correctly guessing a general randomized or deterministic function of the data (e.g., a specific feature or attribute of the data in quantum machine learning) that is unknown to the securi…
▽ More
A new measure of information leakage for quantum encoding of classical data is defined. An adversary can access a single copy of the state of a quantum system that encodes some classical data and is interested in correctly guessing a general randomized or deterministic function of the data (e.g., a specific feature or attribute of the data in quantum machine learning) that is unknown to the security analyst. The resulting measure of information leakage, referred to as maximal quantum leakage, is the multiplicative increase of the probability of correctly guessing any function of the classical data upon observing measurements of the quantum state. Maximal quantum leakage is shown to satisfy post-processing inequality (i.e., applying a quantum channel reduces information leakage) and independence property (i.e., leakage is zero if the quantum state is independent of the classical data), which are fundamental properties required for privacy and security analysis. It also bounds accessible information. Effects of global and local depolarizing noise models on the maximal quantum leakage are established.
△ Less
Submitted 1 January, 2024; v1 submitted 24 July, 2023;
originally announced July 2023.
-
Privacy Against Hypothesis-Testing Adversaries for Quantum Computing
Authors:
Farhad Farokhi
Abstract:
A novel definition for data privacy in quantum computing based on quantum hypothesis testing is presented in this paper. The parameters in this privacy notion possess an operational interpretation based on the success/failure of an omnipotent adversary being able to distinguish the private categories to which the data belongs using arbitrary measurements on quantum states. Important properties of…
▽ More
A novel definition for data privacy in quantum computing based on quantum hypothesis testing is presented in this paper. The parameters in this privacy notion possess an operational interpretation based on the success/failure of an omnipotent adversary being able to distinguish the private categories to which the data belongs using arbitrary measurements on quantum states. Important properties of post processing and composition are then proved for the new notion of privacy. The relationship between privacy against hypothesis-testing adversaries, defined in this paper, and quantum differential privacy are then examined. It is shown that these definitions are intertwined in some parameter regimes. This enables us to provide an interpretation for the privacy budget in quantum differential privacy based on its relationship with privacy against hypothesis testing adversaries.
△ Less
Submitted 23 February, 2023;
originally announced February 2023.
-
Learning Safety Filters for Unknown Discrete-Time Linear Systems
Authors:
Farhad Farokhi,
Alex S. Leong,
Mohammad Zamani,
Iman Shames
Abstract:
A learning-based safety filter is developed for discrete-time linear time-invariant systems with unknown models subject to Gaussian noises with unknown covariance. Safety is characterized using polytopic constraints on the states and control inputs. The empirically learned model and process noise covariance with their confidence bounds are used to construct a robust optimization problem for minima…
▽ More
A learning-based safety filter is developed for discrete-time linear time-invariant systems with unknown models subject to Gaussian noises with unknown covariance. Safety is characterized using polytopic constraints on the states and control inputs. The empirically learned model and process noise covariance with their confidence bounds are used to construct a robust optimization problem for minimally modifying nominal control actions to ensure safety with high probability. The optimization problem relies on tightening the original safety constraints. The magnitude of the tightening is larger at the beginning since there is little information to construct reliable models, but shrinks with time as more data becomes available.
△ Less
Submitted 8 May, 2023; v1 submitted 31 October, 2021;
originally announced November 2021.
-
Optimal Stochastic Evasive Maneuvers Using the Schrodinger's Equation
Authors:
Farhad Farokhi,
Magnus Egerstedt
Abstract:
In this paper, preys with stochastic evasion policies are considered. The stochasticity adds unpredictable changes to the prey's path for avoiding predator's attacks. The prey's cost function is composed of two terms balancing the unpredictability factor (by using stochasticity to make the task of forecasting its future positions by the predator difficult) and energy consumption (the least amount…
▽ More
In this paper, preys with stochastic evasion policies are considered. The stochasticity adds unpredictable changes to the prey's path for avoiding predator's attacks. The prey's cost function is composed of two terms balancing the unpredictability factor (by using stochasticity to make the task of forecasting its future positions by the predator difficult) and energy consumption (the least amount of energy required for performing a maneuver). The optimal probability density functions of the actions of the prey for trading-off unpredictability and energy consumption is shown to be characterized by the stationary Schrodinger's equation.
△ Less
Submitted 10 October, 2021;
originally announced October 2021.
-
Zero-Error Feedback Capacity for Bounded Stabilization and Finite-State Additive Noise Channels
Authors:
Amir Saberi,
Farhad Farokhi,
Girish Nair
Abstract:
This article studies the zero-error feedback capacity of {\em causal} discrete channels with memory. First, by extending the classical zero-error feedback capacity concept, a new notion of {\em uniform zero-error feedback capacity} $ C_{0f} $ for such channels is introduced. Using this notion a tight condition for {bounded} stabilization of unstable {noisy} linear systems via causal channels is ob…
▽ More
This article studies the zero-error feedback capacity of {\em causal} discrete channels with memory. First, by extending the classical zero-error feedback capacity concept, a new notion of {\em uniform zero-error feedback capacity} $ C_{0f} $ for such channels is introduced. Using this notion a tight condition for {bounded} stabilization of unstable {noisy} linear systems via causal channels is obtained, assuming no {channel} state information at either end of the channel.
△ Less
Submitted 1 June, 2022; v1 submitted 9 August, 2021;
originally announced August 2021.
-
Measuring Information Leakage in Non-stochastic Brute-Force Guessing
Authors:
Ni Ding,
Farhad Farokhi
Abstract:
This paper proposes an operational measure of non-stochastic information leakage to formalize privacy against a brute-force guessing adversary. The information is measured by non-probabilistic uncertainty of uncertain variables, the non-stochastic counterparts of random variables. For $X$ that is related to released data $Y$, the non-stochastic brute-force leakage is measured by the complexity of…
▽ More
This paper proposes an operational measure of non-stochastic information leakage to formalize privacy against a brute-force guessing adversary. The information is measured by non-probabilistic uncertainty of uncertain variables, the non-stochastic counterparts of random variables. For $X$ that is related to released data $Y$, the non-stochastic brute-force leakage is measured by the complexity of exhaustively checking all the possibilities of the private attribute $U$ of $X$ by an adversary. The complexity refers to the number of trials to successfully guess $U$. Maximizing this leakage over all possible private attributes $U$ gives rise to the maximal (i.e., worst-case) non-stochastic brute-force guessing leakage. This is proved to be fully determined by the minimal non-stochastic uncertainty of $X$ given $Y$, which also determines the worst-case attribute $U$ indicating the highest privacy risk if $Y$ is disclosed. The maximal non-stochastic brute-force guessing leakage is shown to be proportional to the non-stochastic identifiability of $X$ given $Y$ and upper bounds the existing maximin information. The latter quantifies the information leakage when an adversary must perfectly guess $U$ in one-shot via $Y$. Experiments are used to demonstrate the tradeoff between the maximal non-stochastic brute-force guessing leakage and the data utility (measured by the maximum quantization error) and to illustrate the relationship between maximin information and stochastic one-shot maximal leakage.
△ Less
Submitted 2 July, 2021;
originally announced July 2021.
-
Sharing in a Trustless World: Privacy-Preserving Data Analytics with Potentially Cheating Participants
Authors:
Tham Nguyen,
Hassan Jameel Asghar,
Raghav Bhakar,
Dali Kaafar,
Farhad Farokhi
Abstract:
Lack of trust between organisations and privacy concerns about their data are impediments to an otherwise potentially symbiotic joint data analysis. We propose DataRing, a data sharing system that allows mutually mistrusting participants to query each others' datasets in a privacy-preserving manner while ensuring the correctness of input datasets and query answers even in the presence of (cheating…
▽ More
Lack of trust between organisations and privacy concerns about their data are impediments to an otherwise potentially symbiotic joint data analysis. We propose DataRing, a data sharing system that allows mutually mistrusting participants to query each others' datasets in a privacy-preserving manner while ensuring the correctness of input datasets and query answers even in the presence of (cheating) participants deviating from their true datasets. By relying on the assumption that if only a small subset of rows of the true dataset are known, participants cannot submit answers to queries deviating significantly from their true datasets. We employ differential privacy and a suite of cryptographic tools to ensure individual privacy for each participant's dataset and data confidentiality from the system. Our results show that the evaluation of 10 queries on a dataset with 10 attributes and 500,000 records is achieved in 90.63 seconds. DataRing could detect cheating participant that deviates from its true dataset in few queries with high accuracy.
△ Less
Submitted 18 June, 2021;
originally announced June 2021.
-
Safe Learning of Uncertain Environments
Authors:
Farhad Farokhi,
Alex Leong,
Iman Shames,
Mohammad Zamani
Abstract:
In many learning based control methodologies, learning the unknown dynamic model precedes the control phase, while the aim is to control the system such that it remains in some safe region of the state space. In this work, our aim is to guarantee safety while learning and control proceed simultaneously. Specifically, we consider the problem of safe learning in nonlinear control-affine systems subj…
▽ More
In many learning based control methodologies, learning the unknown dynamic model precedes the control phase, while the aim is to control the system such that it remains in some safe region of the state space. In this work, our aim is to guarantee safety while learning and control proceed simultaneously. Specifically, we consider the problem of safe learning in nonlinear control-affine systems subject to unknown additive uncertainty. We first model the uncertainty as a Gaussian noise and use state measurements to learn its mean and covariance. We provide rigorous time-varying bounds on the mean and covariance of the uncertainty and employ them to modify the control input via an optimization program with potentially time-varying safety constraints. We show that with an arbitrarily large probability we can guarantee that the state will remain in the safe set, while learning and control are carried out simultaneously, provided that a feasible solution exists for the optimization problem. We provide a secondary formulation of this optimization that is computationally more efficient. This is based on tightening the safety constraints to counter the uncertainty about the learned mean and covariance. The magnitude of the tightening can be decreased as our confidence in the learned mean and covariance increases (i.e., as we gather more measurements about the environment). Extensions of the method are provided for non-Gaussian process noise with unknown mean and covariance as well as Gaussian uncertainties with state-dependent mean and covariance to accommodate more general environments.
△ Less
Submitted 13 May, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
-
A Linear Reduction Method for Local Differential Privacy and Log-lift
Authors:
Ni Ding,
Yucheng Liu,
Farhad Farokhi
Abstract:
This paper considers the problem of publishing data $X$ while protecting correlated sensitive information $S$. We propose a linear method to generate the sanitized data $Y$ with the same alphabet $\mathcal{Y} = \mathcal{X}$ that attains local differential privacy (LDP) and log-lift at the same time. It is revealed that both LDP and log-lift are inversely proportional to the statistical distance be…
▽ More
This paper considers the problem of publishing data $X$ while protecting correlated sensitive information $S$. We propose a linear method to generate the sanitized data $Y$ with the same alphabet $\mathcal{Y} = \mathcal{X}$ that attains local differential privacy (LDP) and log-lift at the same time. It is revealed that both LDP and log-lift are inversely proportional to the statistical distance between conditional probability $P_{Y|S}(x|s)$ and marginal probability $P_{Y}(x)$: the closer the two probabilities are, the more private $Y$ is. Specifying $P_{Y|S}(x|s)$ that linearly reduces this distance $|P_{Y|S}(x|s) - P_Y(x)| = (1-α)|P_{X|S}(x|s) - P_X(x)|,\forall s,x$ for some $α\in (0,1]$, we study the problem of how to generate $Y$ from the original data $S$ and $X$. The Markov randomization/sanitization scheme $P_{Y|X}(x|x') = P_{Y|S,X}(x|s,x')$ is obtained by solving linear equations. The optimal non-Markov sanitization, the transition probability $P_{Y|S,X}(x|s,x')$ that depends on $S$, can be determined by maximizing the data utility subject to linear equality constraints. We compute the solution for two linear utility function: the expected distance and total variance distance. It is shown that the non-Markov randomization significantly improves data utility and the marginal probability $P_X(x)$ remains the same after the linear sanitization method: $P_Y(x) = P_X(x), \forall x \in \mathcal{X}$.
△ Less
Submitted 26 January, 2021; v1 submitted 24 January, 2021;
originally announced January 2021.
-
Optimal Pre-Processing to Achieve Fairness and Its Relationship with Total Variation Barycenter
Authors:
Farhad Farokhi
Abstract:
We use disparate impact, i.e., the extent that the probability of observing an output depends on protected attributes such as race and gender, to measure fairness. We prove that disparate impact is upper bounded by the total variation distance between the distribution of the inputs given the protected attributes. We then use pre-processing, also known as data repair, to enforce fairness. We show t…
▽ More
We use disparate impact, i.e., the extent that the probability of observing an output depends on protected attributes such as race and gender, to measure fairness. We prove that disparate impact is upper bounded by the total variation distance between the distribution of the inputs given the protected attributes. We then use pre-processing, also known as data repair, to enforce fairness. We show that utility degradation, i.e., the extent that the success of a forecasting model changes by pre-processing the data, is upper bounded by the total variation distance between the distribution of the data before and after pre-processing. Hence, the problem of finding the optimal pre-processing regiment for enforcing fairness can be cast as minimizing total variations distance between the distribution of the data before and after pre-processing subject to a constraint on the total variation distance between the distribution of the inputs given protected attributes. This problem is a linear program that can be efficiently solved. We show that this problem is intimately related to finding the barycenter (i.e., center of mass) of two distributions when distances in the probability space are measured by total variation distance. We also investigate the effect of differential privacy on fairness using the proposed the total variation distances. We demonstrate the results using numerical experimentation with a practice dataset.
△ Less
Submitted 17 January, 2021;
originally announced January 2021.
-
Gradient Sparsification Can Improve Performance of Differentially-Private Convex Machine Learning
Authors:
Farhad Farokhi
Abstract:
We use gradient sparsification to reduce the adverse effect of differential privacy noise on performance of private machine learning models. To this aim, we employ compressed sensing and additive Laplace noise to evaluate differentially-private gradients. Noisy privacy-preserving gradients are used to perform stochastic gradient descent for training machine learning models. Sparsification, achieve…
▽ More
We use gradient sparsification to reduce the adverse effect of differential privacy noise on performance of private machine learning models. To this aim, we employ compressed sensing and additive Laplace noise to evaluate differentially-private gradients. Noisy privacy-preserving gradients are used to perform stochastic gradient descent for training machine learning models. Sparsification, achieved by setting the smallest gradient entries to zero, can reduce the convergence speed of the training algorithm. However, by sparsification and compressed sensing, the dimension of communicated gradient and the magnitude of additive noise can be reduced. The interplay between these effects determines whether gradient sparsification improves the performance of differentially-private machine learning models. We investigate this analytically in the paper. We prove that, for small privacy budgets, compression can improve performance of privacy-preserving machine learning models. However, for large privacy budgets, compression does not necessarily improve the performance. Intuitively, this is because the effect of privacy-preserving noise is minimal in large privacy budget regime and thus improvements from gradient sparsification cannot compensate for its slower convergence.
△ Less
Submitted 1 December, 2020; v1 submitted 30 November, 2020;
originally announced November 2020.
-
When Machine Learning Meets Privacy: A Survey and Outlook
Authors:
Bo Liu,
Ming Ding,
Sina Shaham,
Wenny Rahayu,
Farhad Farokhi,
Zihuai Lin
Abstract:
The newly emerged machine learning (e.g. deep learning) methods have become a strong driving force to revolutionize a wide range of industries, such as smart healthcare, financial technology, and surveillance systems. Meanwhile, privacy has emerged as a big concern in this machine learning-based artificial intelligence era. It is important to note that the problem of privacy preservation in the co…
▽ More
The newly emerged machine learning (e.g. deep learning) methods have become a strong driving force to revolutionize a wide range of industries, such as smart healthcare, financial technology, and surveillance systems. Meanwhile, privacy has emerged as a big concern in this machine learning-based artificial intelligence era. It is important to note that the problem of privacy preservation in the context of machine learning is quite different from that in traditional data privacy protection, as machine learning can act as both friend and foe. Currently, the work on the preservation of privacy and machine learning (ML) is still in an infancy stage, as most existing solutions only focus on privacy problems during the machine learning process. Therefore, a comprehensive study on the privacy preservation problems and machine learning is required. This paper surveys the state of the art in privacy issues and solutions for machine learning. The survey covers three categories of interactions between privacy and machine learning: (i) private machine learning, (ii) machine learning aided privacy protection, and (iii) machine learning-based privacy attack and corresponding protection schemes. The current research progress in each category is reviewed and the key challenges are identified. Finally, based on our in-depth analysis of the area of privacy and machine learning, we point out future research directions in this field.
△ Less
Submitted 23 November, 2020;
originally announced November 2020.
-
Non-Stochastic Private Function Evaluation
Authors:
Farhad Farokhi,
Girish Nair
Abstract:
We consider private function evaluation to provide query responses based on private data of multiple untrusted entities in such a way that each cannot learn something substantially new about the data of others. First, we introduce perfect non-stochastic privacy in a two-party scenario. Perfect privacy amounts to conditional unrelatedness of the query response and the private uncertain variable of…
▽ More
We consider private function evaluation to provide query responses based on private data of multiple untrusted entities in such a way that each cannot learn something substantially new about the data of others. First, we introduce perfect non-stochastic privacy in a two-party scenario. Perfect privacy amounts to conditional unrelatedness of the query response and the private uncertain variable of other individuals conditioned on the uncertain variable of a given entity. We show that perfect privacy can be achieved for queries that are functions of the common uncertain variable, a generalization of the common random variable. We compute the closest approximation of the queries that do not take this form. To provide a trade-off between privacy and utility, we relax the notion of perfect privacy. We define almost perfect privacy and show that this new definition equates to using conditional disassociation instead of conditional unrelatedness in the definition of perfect privacy. Then, we generalize the definitions to multi-party function evaluation (more than two data entities). We prove that uniform quantization of query responses, where the quantization resolution is a function of privacy budget and sensitivity of the query (cf., differential privacy), achieves function evaluation privacy.
△ Less
Submitted 19 October, 2020;
originally announced October 2020.
-
Deconvoluting Kernel Density Estimation and Regression for Locally Differentially Private Data
Authors:
Farhad Farokhi
Abstract:
Local differential privacy has become the gold-standard of privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differential data can twist the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of privacy-preserving data (no matter how many samples we gather) is alway…
▽ More
Local differential privacy has become the gold-standard of privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differential data can twist the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of privacy-preserving data (no matter how many samples we gather) is always flatter in comparison with the density function of the original data points due to convolution with privacy-preserving noise density function. The effect is especially more pronounced when using slow-decaying privacy-preserving noises, such as the Laplace noise. This can result in under/over-estimation of the heavy-hitters. This is an important challenge facing social scientists due to the use of differential privacy in the 2020 Census in the United States. In this paper, we develop density estimation methods using smoothing kernels. We use the framework of deconvoluting kernel density estimators to remove the effect of privacy-preserving noise. This approach also allows us to adapt the results from non-parameteric regression with errors-in-variables to develop regression models based on locally differentially private data. We demonstrate the performance of the developed methods on financial and demographic datasets.
△ Less
Submitted 8 November, 2020; v1 submitted 27 August, 2020;
originally announced August 2020.
-
Security Versus Privacy
Authors:
Farhad Farokhi,
Peyman Mohajerin Esfahani
Abstract:
Linear queries can be submitted to a server containing private data. The server provides a response to the queries systematically corrupted using an additive noise to preserve the privacy of those whose data is stored on the server. The measure of privacy is inversely proportional to the trace of the Fisher information matrix. It is assumed that an adversary can inject a false bias to the response…
▽ More
Linear queries can be submitted to a server containing private data. The server provides a response to the queries systematically corrupted using an additive noise to preserve the privacy of those whose data is stored on the server. The measure of privacy is inversely proportional to the trace of the Fisher information matrix. It is assumed that an adversary can inject a false bias to the responses. The measure of the security, capturing the ease of detecting the presence of the false data injection, is the sensitivity of the Kullback-Leiber divergence to the additive bias. An optimization problem for balancing privacy and security is proposed and subsequently solved. It is shown that the level of guaranteed privacy times the level of security equals a constant. Therefore, by increasing the level of privacy, the security guarantees can only be weakened and vice versa. Similar results are developed under the differential privacy framework.
△ Less
Submitted 10 August, 2020;
originally announced August 2020.
-
Distributionally-Robust Machine Learning Using Locally Differentially-Private Data
Authors:
Farhad Farokhi
Abstract:
We consider machine learning, particularly regression, using locally-differentially private datasets. The Wasserstein distance is used to define an ambiguity set centered at the empirical distribution of the dataset corrupted by local differential privacy noise. The ambiguity set is shown to contain the probability distribution of unperturbed, clean data. The radius of the ambiguity set is a funct…
▽ More
We consider machine learning, particularly regression, using locally-differentially private datasets. The Wasserstein distance is used to define an ambiguity set centered at the empirical distribution of the dataset corrupted by local differential privacy noise. The ambiguity set is shown to contain the probability distribution of unperturbed, clean data. The radius of the ambiguity set is a function of the privacy budget, spread of the data, and the size of the problem. Hence, machine learning with locally-differentially private datasets can be rewritten as a distributionally-robust optimization. For general distributions, the distributionally-robust optimization problem can relaxed as a regularized machine learning problem with the Lipschitz constant of the machine learning model as a regularizer. For linear and logistic regression, this regularizer is the dual norm of the model parameters. For Gaussian data, the distributionally-robust optimization problem can be solved exactly to find an optimal regularizer. This approach results in an entirely new regularizer for training linear regression models. Training with this novel regularizer can be posed as a semi-definite program. Finally, the performance of the proposed distributionally-robust machine learning training is demonstrated on practical datasets.
△ Less
Submitted 24 June, 2020;
originally announced June 2020.
-
Online Stochastic Convex Optimization: Wasserstein Distance Variation
Authors:
Iman Shames,
Farhad Farokhi
Abstract:
Distributionally-robust optimization is often studied for a fixed set of distributions rather than time-varying distributions that can drift significantly over time (which is, for instance, the case in finance and sociology due to underlying expansion of economy and evolution of demographics). This motivates understanding conditions on probability distributions, using the Wasserstein distance, tha…
▽ More
Distributionally-robust optimization is often studied for a fixed set of distributions rather than time-varying distributions that can drift significantly over time (which is, for instance, the case in finance and sociology due to underlying expansion of economy and evolution of demographics). This motivates understanding conditions on probability distributions, using the Wasserstein distance, that can be used to model time-varying environments. We can then use these conditions in conjunction with online stochastic optimization to adapt the decisions. We considers an online proximal-gradient method to track the minimizers of expectations of smooth convex functions parameterised by a random variable whose probability distributions continuously evolve over time at a rate similar to that of the rate at which the decision maker acts. We revisit the concepts of estimation and tracking error inspired by systems and control literature and provide bounds for them under strong convexity, Lipschitzness of the gradient, and bounds on the probability distribution drift. Further, noting that computing projections for a general feasible sets might not be amenable to online implementation (due to computational constraints), we propose an exact penalty method. Doing so allows us to relax the uniform boundedness of the gradient and establish dynamic regret bounds for tracking and estimation error. We further introduce a constraint-tightening approach and relate the amount of tightening to the probability of satisfying the constraints.
△ Less
Submitted 29 September, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
An Explicit Formula for the Zero-Error Feedback Capacity of a Class of Finite-State Additive Noise Channels
Authors:
Amir Saberi,
Farhad Farokhi,
Girish N. Nair
Abstract:
It is known that for a discrete channel with correlated additive noise, the ordinary capacity with or without feedback both equal $ \log q-\mathcal{H} (Z) $, where $ \mathcal{H}(Z) $ is the entropy rate of the noise process $ Z $ and $ q $ is the alphabet size. In this paper, a class of finite-state additive noise channels is introduced. It is shown that the zero-error feedback capacity of such ch…
▽ More
It is known that for a discrete channel with correlated additive noise, the ordinary capacity with or without feedback both equal $ \log q-\mathcal{H} (Z) $, where $ \mathcal{H}(Z) $ is the entropy rate of the noise process $ Z $ and $ q $ is the alphabet size. In this paper, a class of finite-state additive noise channels is introduced. It is shown that the zero-error feedback capacity of such channels is either zero or $C_{0f} =\log q -h (Z) $, where $ h (Z) $ is the {\em topological entropy} of the noise process. A topological condition is given when the zero-error capacity is zero, with or without feedback. Moreover, the zero-error capacity without feedback is lower-bounded by $ \log q-2 h (Z) $. We explicitly compute the zero-error feedback capacity for several examples, including channels with isolated errors and a Gilbert-Elliot channel.
△ Less
Submitted 29 May, 2020;
originally announced June 2020.
-
Measuring Information Leakage in Non-stochastic Brute-Force Guessing
Authors:
Farhad Farokhi,
Ni Ding
Abstract:
We propose an operational measure of information leakage in a non-stochastic setting to formalize privacy against a brute-force guessing adversary. We use uncertain variables, non-probabilistic counterparts of random variables, to construct a guessing framework in which an adversary is interested in determining private information based on uncertain reports. We consider brute-force trial-and-error…
▽ More
We propose an operational measure of information leakage in a non-stochastic setting to formalize privacy against a brute-force guessing adversary. We use uncertain variables, non-probabilistic counterparts of random variables, to construct a guessing framework in which an adversary is interested in determining private information based on uncertain reports. We consider brute-force trial-and-error guessing in which an adversary can potentially check all the possibilities of the private information that are compatible with the available outputs to find the actual private realization. The ratio of the worst-case number of guesses for the adversary in the presence of the output and in the absence of it captures the reduction in the adversary's guessing complexity and is thus used as a measure of private information leakage. We investigate the relationship between the newly-developed measure of information leakage with the existing non-stochastic maximin information and stochastic maximal leakage that are shown arise in one-shot guessing.
△ Less
Submitted 27 January, 2021; v1 submitted 22 April, 2020;
originally announced April 2020.
-
Bounded State Estimation over Finite-State Channels: Relating Topological Entropy and Zero-Error Capacity
Authors:
Amir Saberi,
Farhad Farokhi,
Girish N. Nair
Abstract:
We investigate state estimation of linear systems over channels having a finite state not known by the transmitter or receiver. We show that similar to memoryless channels, zero-error capacity is the right figure of merit for achieving bounded estimation errors. We then consider finite-state, worst-case versions of the common erasure and additive noise channels models, in which the noise is govern…
▽ More
We investigate state estimation of linear systems over channels having a finite state not known by the transmitter or receiver. We show that similar to memoryless channels, zero-error capacity is the right figure of merit for achieving bounded estimation errors. We then consider finite-state, worst-case versions of the common erasure and additive noise channels models, in which the noise is governed by a finite-state machine without any statistical structure. Upper and lower bounds on their zero-error capacities are derived, revealing a connection with the {\em topological entropy} of the channel dynamics. Separate necessary and sufficient conditions for bounded linear state estimation errors via such channels are obtained. These estimation conditions bring together the topological entropies of the linear system and the discrete channel.
△ Less
Submitted 4 October, 2021; v1 submitted 24 March, 2020;
originally announced March 2020.
-
The Cost of Privacy in Asynchronous Differentially-Private Machine Learning
Authors:
Farhad Farokhi,
Nan Wu,
David Smith,
Mohamed Ali Kaafar
Abstract:
We consider training machine learning models using Training data located on multiple private and geographically-scattered servers with different privacy settings. Due to the distributed nature of the data, communicating with all collaborating private data owners simultaneously may prove challenging or altogether impossible. In this paper, we develop differentially-private asynchronous algorithms f…
▽ More
We consider training machine learning models using Training data located on multiple private and geographically-scattered servers with different privacy settings. Due to the distributed nature of the data, communicating with all collaborating private data owners simultaneously may prove challenging or altogether impossible. In this paper, we develop differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets. The asynchronous nature of the algorithms implies that a central learner interacts with the private data owners one-on-one whenever they are available for communication without needing to aggregate query responses to construct gradients of the entire fitness function. Therefore, the algorithm efficiently scales to many data owners. We define the cost of privacy as the difference between the fitness of a privacy-preserving machine-learning model and the fitness of trained machine-learning model in the absence of privacy concerns. We prove that we can forecast the performance of the proposed privacy-preserving asynchronous algorithms. We demonstrate that the cost of privacy has an upper bound that is inversely proportional to the combined size of the training datasets squared and the sum of the privacy budgets squared. We validate the theoretical results with experiments on financial and medical datasets. The experiments illustrate that collaboration among more than 10 data owners with at least 10,000 records with privacy budgets greater than or equal to 1 results in a superior machine-learning model in comparison to a model trained in isolation on only one of the datasets, illustrating the value of collaboration and the cost of the privacy. The number of the collaborating datasets can be lowered if the privacy budget is higher.
△ Less
Submitted 29 June, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.
-
Data and Model Dependencies of Membership Inference Attack
Authors:
Shakila Mahjabin Tonni,
Dinusha Vatsalan,
Farhad Farokhi,
Dali Kaafar,
Zhigang Lu,
Gioacchino Tangari
Abstract:
Machine learning (ML) models have been shown to be vulnerable to Membership Inference Attacks (MIA), which infer the membership of a given data point in the target dataset by observing the prediction output of the ML model. While the key factors for the success of MIA have not yet been fully understood, existing defense mechanisms such as using L2 regularization \cite{10shokri2017membership} and d…
▽ More
Machine learning (ML) models have been shown to be vulnerable to Membership Inference Attacks (MIA), which infer the membership of a given data point in the target dataset by observing the prediction output of the ML model. While the key factors for the success of MIA have not yet been fully understood, existing defense mechanisms such as using L2 regularization \cite{10shokri2017membership} and dropout layers \cite{salem2018ml} take only the model's overfitting property into consideration. In this paper, we provide an empirical analysis of the impact of both the data and ML model properties on the vulnerability of ML techniques to MIA. Our results reveal the relationship between MIA accuracy and properties of the dataset and training model in use. In particular, we show that the size of shadow dataset, the class and feature balance and the entropy of the target dataset, the configurations and fairness of the training model are the most influential factors. Based on those experimental findings, we conclude that along with model overfitting, multiple properties jointly contribute to MIA success instead of any single property. Building on our experimental findings, we propose using those data and model properties as regularizers to protect ML models against MIA. Our results show that the proposed defense mechanisms can reduce the MIA accuracy by up to 25\% without sacrificing the ML model prediction utility.
△ Less
Submitted 25 July, 2020; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Uniformly Bounded State Estimation over Multiple Access Channels
Authors:
Ghassen Zafzouf,
Girish N. Nair,
Farhad Farokhi
Abstract:
This paper addresses the problem of distributed state estimation via multiple access channels (MACs). We consider a scenario where two encoders are simultaneously communicating their measurements through a noisy channel. Firstly, the zero-error capacity region of the general M-input, single-output MAC is characterized using tools from nonstochastic information theory. Next, we show that a tight co…
▽ More
This paper addresses the problem of distributed state estimation via multiple access channels (MACs). We consider a scenario where two encoders are simultaneously communicating their measurements through a noisy channel. Firstly, the zero-error capacity region of the general M-input, single-output MAC is characterized using tools from nonstochastic information theory. Next, we show that a tight condition to be able to achieve uniformly bounded state estimation errors can be given in terms of the channel zero-error capacity region. This criterion relates the channel properties to the plant dynamics. These results pave the way towards understanding information flows in networked control systems with multiple transmitters.
△ Less
Submitted 22 December, 2022; v1 submitted 9 February, 2020;
originally announced February 2020.
-
Regularization Helps with Mitigating Poisoning Attacks: Distributionally-Robust Machine Learning Using the Wasserstein Distance
Authors:
Farhad Farokhi
Abstract:
We use distributionally-robust optimization for machine learning to mitigate the effect of data poisoning attacks. We provide performance guarantees for the trained model on the original data (not including the poison records) by training the model for the worst-case distribution on a neighbourhood around the empirical distribution (extracted from the training dataset corrupted by a poisoning atta…
▽ More
We use distributionally-robust optimization for machine learning to mitigate the effect of data poisoning attacks. We provide performance guarantees for the trained model on the original data (not including the poison records) by training the model for the worst-case distribution on a neighbourhood around the empirical distribution (extracted from the training dataset corrupted by a poisoning attack) defined using the Wasserstein distance. We relax the distributionally-robust machine learning problem by finding an upper bound for the worst-case fitness based on the empirical sampled-averaged fitness and the Lipschitz-constant of the fitness function (on the data for given model parameters) as regularizer. For regression models, we prove that this regularizer is equal to the dual norm of the model parameters. We use the Wine Quality dataset, the Boston Housing Market dataset, and the Adult dataset for demonstrating the results of this paper.
△ Less
Submitted 28 January, 2020;
originally announced January 2020.
-
Modelling and Quantifying Membership Information Leakage in Machine Learning
Authors:
Farhad Farokhi,
Mohamed Ali Kaafar
Abstract:
Machine learning models have been shown to be vulnerable to membership inference attacks, i.e., inferring whether individuals' data have been used for training models. The lack of understanding about factors contributing success of these attacks motivates the need for modelling membership information leakage using information theory and for investigating properties of machine learning models and t…
▽ More
Machine learning models have been shown to be vulnerable to membership inference attacks, i.e., inferring whether individuals' data have been used for training models. The lack of understanding about factors contributing success of these attacks motivates the need for modelling membership information leakage using information theory and for investigating properties of machine learning models and training algorithms that can reduce membership information leakage. We use conditional mutual information leakage to measure the amount of information leakage from the trained machine learning model about the presence of an individual in the training dataset. We devise an upper bound for this measure of information leakage using Kullback--Leibler divergence that is more amenable to numerical computation. We prove a direct relationship between the Kullback--Leibler membership information leakage and the probability of success for a hypothesis-testing adversary examining whether a particular data record belongs to the training dataset of a machine learning model. We show that the mutual information leakage is a decreasing function of the training dataset size and the regularization weight. We also prove that, if the sensitivity of the machine learning model (defined in terms of the derivatives of the fitness with respect to model parameters) is high, more membership information is potentially leaked. This illustrates that complex models, such as deep neural networks, are more susceptible to membership inference attacks in comparison to simpler models with fewer degrees of freedom. We show that the amount of the membership information leakage is reduced by $\mathcal{O}(\log^{1/2}(δ^{-1})ε^{-1})$ when using Gaussian $(ε,δ)$-differentially-private additive noises.
△ Less
Submitted 27 April, 2020; v1 submitted 28 January, 2020;
originally announced January 2020.
-
Privacy-Preserving Public Release of Datasets for Support Vector Machine Classification
Authors:
Farhad Farokhi
Abstract:
We consider the problem of publicly releasing a dataset for support vector machine classification while not infringing on the privacy of data subjects (i.e., individuals whose private information is stored in the dataset). The dataset is systematically obfuscated using an additive noise for privacy protection. Motivated by the Cramer-Rao bound, inverse of the trace of the Fisher information matrix…
▽ More
We consider the problem of publicly releasing a dataset for support vector machine classification while not infringing on the privacy of data subjects (i.e., individuals whose private information is stored in the dataset). The dataset is systematically obfuscated using an additive noise for privacy protection. Motivated by the Cramer-Rao bound, inverse of the trace of the Fisher information matrix is used as a measure of the privacy. Conditions are established for ensuring that the classifier extracted from the original dataset and the obfuscated one are close to each other (capturing the utility). The optimal noise distribution is determined by maximizing a weighted sum of the measures of privacy and utility. The optimal privacy-preserving noise is proved to achieve local differential privacy. The results are generalized to a broader class of optimization-based supervised machine learning algorithms. Applicability of the methodology is demonstrated on multiple datasets.
△ Less
Submitted 28 December, 2019;
originally announced December 2019.
-
Develo** Non-Stochastic Privacy-Preserving Policies Using Agglomerative Clustering
Authors:
Ni Ding,
Farhad Farokhi
Abstract:
We consider a non-stochastic privacy-preserving problem in which an adversary aims to infer sensitive information $S$ from publicly accessible data $X$ without using statistics. We consider the problem of generating and releasing a quantization $\hat{X}$ of $X$ to minimize the privacy leakage of $S$ to $\hat{X}$ while maintaining a certain level of utility (or, inversely, the quantization loss). T…
▽ More
We consider a non-stochastic privacy-preserving problem in which an adversary aims to infer sensitive information $S$ from publicly accessible data $X$ without using statistics. We consider the problem of generating and releasing a quantization $\hat{X}$ of $X$ to minimize the privacy leakage of $S$ to $\hat{X}$ while maintaining a certain level of utility (or, inversely, the quantization loss). The variables $S$ and $S$ are treated as bounded and non-probabilistic, but are otherwise general. We consider two existing non-stochastic privacy measures, namely the maximum uncertainty reduction $L_0(S \rightarrow \hat{X})$ and the refined information $I_*(S; \hat{X})$ (also called the maximin information) of $S$. For each privacy measure, we propose a corresponding agglomerative clustering algorithm that converges to a locally optimal quantization solution $\hat{X}$ by iteratively merging elements in the alphabet of $X$. To instantiate the solution to this problem, we consider two specific utility measures, the worst-case resolution of $X$ by observing $\hat{X}$ and the maximal distortion of the released data $\hat{X}$. We show that the value of the maximin information $I_*(S; \hat{X})$ can be determined by dividing the confusability graph into connected subgraphs. Hence, $I_*(S; \hat{X})$ can be reduced by merging nodes connecting subgraphs. The relation to the probabilistic information-theoretic privacy is also studied by noting that the G{á}cs-K{ö}rner common information is the stochastic version of $I_*$ and indicates the attainability of statistical indistinguishability.
△ Less
Submitted 12 July, 2020; v1 submitted 12 November, 2019;
originally announced November 2019.
-
Noiseless Privacy
Authors:
Farhad Farokhi
Abstract:
In this paper, we define noiseless privacy, as a non-stochastic rival to differential privacy, requiring that the outputs of a mechanism (i.e., function composition of a privacy-preserving map** and a query) can attain only a few values while varying the data of an individual (the logarithm of the number of the distinct values is bounded by the privacy budget). Therefore, the output of the mecha…
▽ More
In this paper, we define noiseless privacy, as a non-stochastic rival to differential privacy, requiring that the outputs of a mechanism (i.e., function composition of a privacy-preserving map** and a query) can attain only a few values while varying the data of an individual (the logarithm of the number of the distinct values is bounded by the privacy budget). Therefore, the output of the mechanism is not fully informative of the data of the individuals in the dataset. We prove several guarantees for noiselessly-private mechanisms. The information content of the output about the data of an individual, even if an adversary knows all the other entries of the private dataset, is bounded by the privacy budget. The zero-error capacity of memory-less channels using noiselessly private mechanisms for transmission is upper bounded by the privacy budget. The performance of a non-stochastic hypothesis-testing adversary is bounded again by the privacy budget. Finally, assuming that an adversary has access to a stochastic prior on the dataset, we prove that the estimation error of the adversary for individual entries of the dataset is lower bounded by a decreasing function of the privacy budget. In this case, we also show that the maximal information leakage is bounded by the privacy budget. In addition to privacy guarantees, we prove that noiselessly-private mechanisms admit composition theorem and post-processing does not weaken their privacy guarantees. We prove that quantization operators can ensure noiseless privacy if the number of quantization levels is appropriately selected based on the sensitivity of the query and the privacy budget. Finally, we illustrate the privacy merits of noiseless privacy using multiple datasets in energy and transport.
△ Less
Submitted 28 October, 2019;
originally announced October 2019.
-
Differential Privacy for Evolving Almost-Periodic Datasets with Continual Linear Queries: Application to Energy Data Privacy
Authors:
Farhad Farokhi
Abstract:
For evolving datasets with continual reports, the composition rule for differential privacy (DP) dictates that the scale of DP noise must grow linearly with the number of the queries, or that the privacy budget must be split equally between all the queries, so that the privacy budget across all the queries remains bounded and consistent with the privacy guarantees. To avoid this drawback of DP, we…
▽ More
For evolving datasets with continual reports, the composition rule for differential privacy (DP) dictates that the scale of DP noise must grow linearly with the number of the queries, or that the privacy budget must be split equally between all the queries, so that the privacy budget across all the queries remains bounded and consistent with the privacy guarantees. To avoid this drawback of DP, we consider datasets containing almost periodic time series, composed of periodic components and noisy variations on top that are independent across periods. Our interest in these datasets is motivated by that, for reporting on private periodic time series, we do not need to divide the privacy budget across the entire, possibly infinite, horizon. Instead, for periodic time series, we generate DP reports for the first period and report the same DP reports periodically. In practice, however, exactly periodic time series do not exist as the data always contains small variations due to random or uncertain events. For instance, the energy consumption of a household may repeat the same daily pattern with slight variations due to minor changes to the habits of the individuals. The underlying periodic pattern is a function of the private information of the households. It might be desired to protect the privacy of households by not leaking information about the recurring patterns while the individual daily variations are almost noise-like with little to no privacy concerns (depending on the situation). Motivated by this, we define DP for almost periodic datasets and develop a Laplace mechanism for responding to linear queries. We provide statistical tools for testing the validity of almost periodicity assumption. We use multiple energy datasets containing smart-meter measurements of households to validate almost periodicity assumption. We generate DP aggregate reports and investigate their utility.
△ Less
Submitted 25 September, 2019;
originally announced September 2019.
-
Taking a Lesson from Quantum Particles for Statistical Data Privacy
Authors:
Farhad Farokhi
Abstract:
Privacy is under threat from artificial intelligence revolution fueled by unprecedented abundance of data. Differential privacy, an established candidate for privacy protection, is susceptible to adversarial attacks, acts conservatively, and leads to miss-implementations because of lacking systematic methods for setting its parameters (known as the privacy budget). An alternative is information-th…
▽ More
Privacy is under threat from artificial intelligence revolution fueled by unprecedented abundance of data. Differential privacy, an established candidate for privacy protection, is susceptible to adversarial attacks, acts conservatively, and leads to miss-implementations because of lacking systematic methods for setting its parameters (known as the privacy budget). An alternative is information-theoretic privacy using entropy with the drawback of requiring prior distribution of the private data. Here, by using the Fisher information, information-theoretic privacy framework is extended to avoid unnecessary assumptions on the private data. The optimal privacy-preserving additive noise, extracted by minimizing the Fisher information, must follow the time-independent Schrodinger's equation. A fundamental trade-off between privacy and utility is also proved, reminiscent of the Heisenberg uncertainty principle.
△ Less
Submitted 14 August, 2019;
originally announced August 2019.
-
Temporally Discounted Differential Privacy for Evolving Datasets on an Infinite Horizon
Authors:
Farhad Farokhi
Abstract:
We define discounted differential privacy, as an alternative to (conventional) differential privacy, to investigate privacy of evolving datasets, containing time series over an unbounded horizon. We use privacy loss as a measure of the amount of information leaked by the reports at a certain fixed time. We observe that privacy losses are weighted equally across time in the definition of differenti…
▽ More
We define discounted differential privacy, as an alternative to (conventional) differential privacy, to investigate privacy of evolving datasets, containing time series over an unbounded horizon. We use privacy loss as a measure of the amount of information leaked by the reports at a certain fixed time. We observe that privacy losses are weighted equally across time in the definition of differential privacy, and therefore the magnitude of privacy-preserving additive noise must grow without bound to ensure differential privacy over an infinite horizon. Motivated by the discounted utility theory within the economics literature, we use exponential and hyperbolic discounting of privacy losses across time to relax the definition of differential privacy under continual observations. This implies that privacy losses in distant past are less important than the current ones to an individual. We use discounted differential privacy to investigate privacy of evolving datasets using additive Laplace noise and show that the magnitude of the additive noise can remain bounded under discounted differential privacy. We illustrate the quality of privacy-preserving mechanisms satisfying discounted differential privacy on smart-meter measurement time-series of real households, made publicly available by Ausgrid (an Australian electricity distribution company).
△ Less
Submitted 27 January, 2020; v1 submitted 12 August, 2019;
originally announced August 2019.
-
A Game-Theoretic Approach to Adversarial Linear Support Vector Classification
Authors:
Farhad Farokhi
Abstract:
In this paper, we employ a game-theoretic model to analyze the interaction between an adversary and a classifier. There are two classes (i.e., positive and negative classes) to which data points can belong. The adversary is interested in maximizing the probability of miss-detection for the positive class (i.e., false negative probability). The adversary however does not want to significantly modif…
▽ More
In this paper, we employ a game-theoretic model to analyze the interaction between an adversary and a classifier. There are two classes (i.e., positive and negative classes) to which data points can belong. The adversary is interested in maximizing the probability of miss-detection for the positive class (i.e., false negative probability). The adversary however does not want to significantly modify the data point so that it still maintains favourable traits of the original class. The classifier, on the other hand, is interested in maximizing the probability of correct detection for the positive class (i.e., true positive probability) subject to a lower-bound on the probability of correct detection for the negative class (i.e., true negative probability). For conditionally Gaussian data points (conditioned on the class) and linear support vector machine classifiers, we rewrite the optimization problems of the adversary and the classifier as convex optimization problems and use best response dynamics to learn an equilibrium of the game. This results in computing a linear support vector machine classifier that is robust against adversarial input manipulations. We illustrate the framework on a synthetic dataset and a public Cardiovascular Disease dataset.
△ Less
Submitted 24 June, 2019;
originally announced June 2019.
-
The Value of Collaboration in Convex Machine Learning with Differential Privacy
Authors:
Nan Wu,
Farhad Farokhi,
David Smith,
Mohamed Ali Kaafar
Abstract:
In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlap** training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budge…
▽ More
In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlap** training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budget and size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. Particularly, we show that the difference between the fitness of the trained machine learning model using differentially-private gradient queries and the fitness of the trained machine model in the absence of any privacy concerns is inversely proportional to the size of the training datasets squared and the privacy budget squared. We successfully validate the performance prediction with the actual performance of the proposed privacy-aware learning algorithms, applied to: financial datasets for determining interest rates of loans using regression; and detecting credit card frauds using support vector machines.
△ Less
Submitted 23 June, 2019;
originally announced June 2019.
-
Non-Stochastic Hypothesis Testing with Application to Privacy Against Hypothesis-Testing Adversary
Authors:
Farhad Farokhi
Abstract:
In this paper, we consider privacy against hypothesis testing adversaries within a non-stochastic framework. We develop a theory of non-stochastic hypothesis testing by borrowing the notion of uncertain variables from non-stochastic information theory. We define tests as binary-valued map**s on uncertain variables and prove a fundamental bound on the best performance of tests in non-stochastic h…
▽ More
In this paper, we consider privacy against hypothesis testing adversaries within a non-stochastic framework. We develop a theory of non-stochastic hypothesis testing by borrowing the notion of uncertain variables from non-stochastic information theory. We define tests as binary-valued map**s on uncertain variables and prove a fundamental bound on the best performance of tests in non-stochastic hypothesis testing. We use this bound to develop a measure of privacy. We then construct reporting policies with prescribed privacy and utility guarantees. The utility of a reporting policy is measured by the distance between the reported and original values. We illustrate the effects of using such privacy-preserving reporting polices on a publicly-available practical dataset of preferences and demographics of young individuals, aged between 15-30, with Slovakian nationality.
△ Less
Submitted 15 April, 2019;
originally announced April 2019.
-
Implementing Homomorphic Encryption Based Secure Feedback Control for Physical Systems
Authors:
Julian Tran,
Farhad Farokhi,
Michael Cantoni,
Iman Shames
Abstract:
This paper is about an encryption based approach to the secure implementation of feedback controllers for physical systems. Specifically, Paillier's homomorphic encryption is used to digitally implement a class of linear dynamic controllers, which includes the commonplace static gain and PID type feedback control laws as special cases. The developed implementation is amenable to Field Programmable…
▽ More
This paper is about an encryption based approach to the secure implementation of feedback controllers for physical systems. Specifically, Paillier's homomorphic encryption is used to digitally implement a class of linear dynamic controllers, which includes the commonplace static gain and PID type feedback control laws as special cases. The developed implementation is amenable to Field Programmable Gate Array (FPGA) realization. Experimental results, including timing analysis and resource usage characteristics for different encryption key lengths, are presented for the realization of an inverted pendulum controller; as this is an unstable plant, the control is necessarily fast.
△ Less
Submitted 27 March, 2019; v1 submitted 19 February, 2019;
originally announced February 2019.
-
State Estimation over Worst-Case Erasure and Symmetric Channels with Memory
Authors:
Amir Saberi,
Farhad Farokhi,
Girish N. Nair
Abstract:
Worst-case models of erasure and symmetric channels are investigated, in which the number of channel errors occurring in each sliding window of a given length is bounded. Upper and lower bounds on their zero-error capacities are derived, with the lower bounds revealing a connection with the topological entropy of the channel dynamics. Necessary and sufficient conditions for linear state estimation…
▽ More
Worst-case models of erasure and symmetric channels are investigated, in which the number of channel errors occurring in each sliding window of a given length is bounded. Upper and lower bounds on their zero-error capacities are derived, with the lower bounds revealing a connection with the topological entropy of the channel dynamics. Necessary and sufficient conditions for linear state estimation with bounded estimation errors via such channels are then obtained, by extending previous results for non-stochastic memoryless channels to those with finite memory. These estimation conditions involve the topological entropies of the linear system and the channel.
△ Less
Submitted 2 February, 2019;
originally announced February 2019.
-
Secure and Private Implementation of Dynamic Controllers Using Semi-Homomorphic Encryption
Authors:
Carlos Murguia,
Farhad Farokhi,
Iman Shames
Abstract:
This paper presents a secure and private implementation of linear time-invariant dynamic controllers using Paillier's encryption, a semi-homomorphic encryption method. To avoid overflow or underflow within the encryption domain, the state of the controller is reset periodically. A control design approach is presented to ensure stability and optimize performance of the closed-loop system with encry…
▽ More
This paper presents a secure and private implementation of linear time-invariant dynamic controllers using Paillier's encryption, a semi-homomorphic encryption method. To avoid overflow or underflow within the encryption domain, the state of the controller is reset periodically. A control design approach is presented to ensure stability and optimize performance of the closed-loop system with encrypted controller.
△ Less
Submitted 20 June, 2019; v1 submitted 10 December, 2018;
originally announced December 2018.
-
Development and Analysis of Deterministic Privacy-Preserving Policies Using Non-Stochastic Information Theory
Authors:
Farhad Farokhi
Abstract:
A deterministic privacy metric using non-stochastic information theory is developed. Particularly, minimax information is used to construct a measure of information leakage, which is inversely proportional to the measure of privacy. Anyone can submit a query to a trusted agent with access to a non-stochastic uncertain private dataset. Optimal deterministic privacy-preserving policies for respondin…
▽ More
A deterministic privacy metric using non-stochastic information theory is developed. Particularly, minimax information is used to construct a measure of information leakage, which is inversely proportional to the measure of privacy. Anyone can submit a query to a trusted agent with access to a non-stochastic uncertain private dataset. Optimal deterministic privacy-preserving policies for responding to the submitted query are computed by maximizing the measure of privacy subject to a constraint on the worst-case quality of the response (i.e., the worst-case difference between the response by the agent and the output of the query computed on the private dataset). The optimal privacy-preserving policy is proved to be a piecewise constant function in the form of a quantization operator applied on the output of the submitted query. The measure of privacy is also used to analyze the performance of $k$-anonymity methodology (a popular deterministic mechanism for privacy-preserving release of datasets using suppression and generalization techniques), proving that it is in fact not privacy-preserving.
△ Less
Submitted 22 January, 2019; v1 submitted 25 October, 2018;
originally announced October 2018.
-
Ensuring Privacy with Constrained Additive Noise by Minimizing Fisher Information
Authors:
Farhad Farokhi,
Henrik Sandberg
Abstract:
The problem of preserving the privacy of individual entries of a database when responding to linear or nonlinear queries with constrained additive noise is considered. For privacy protection, the response to the query is systematically corrupted with an additive random noise whose support is a subset or equal to a pre-defined constraint set. A measure of privacy using the inverse of the trace of t…
▽ More
The problem of preserving the privacy of individual entries of a database when responding to linear or nonlinear queries with constrained additive noise is considered. For privacy protection, the response to the query is systematically corrupted with an additive random noise whose support is a subset or equal to a pre-defined constraint set. A measure of privacy using the inverse of the trace of the Fisher information matrix is developed. The Cramer-Rao bound relates the variance of any estimator of the database entries to the introduced privacy measure. The probability density that minimizes the trace of the Fisher information (as a proxy for maximizing the measure of privacy) is computed. An extension to dynamic problems is also presented. Finally, the results are compared to the differential privacy methodology.
△ Less
Submitted 28 August, 2018;
originally announced August 2018.
-
Private and Secure Coordination of Match-Making for Heavy-Duty Vehicle Platooning
Authors:
Farhad Farokhi,
Iman Shames,
Karl H. Johansson
Abstract:
A secure and private framework for inter-agent communication and coordination is developed. This allows an agent, in our case a fleet owner, to ask questions or submit queries in an encrypted fashion using semi-homomorphic encryption. The submitted query can be about the interest of the other fleet owners for using a road at a specific time of the day, for instance, for the purpose of collaborativ…
▽ More
A secure and private framework for inter-agent communication and coordination is developed. This allows an agent, in our case a fleet owner, to ask questions or submit queries in an encrypted fashion using semi-homomorphic encryption. The submitted query can be about the interest of the other fleet owners for using a road at a specific time of the day, for instance, for the purpose of collaborative vehicle platooning. The other agents can then provide appropriate responses without knowing the content of the questions or the queries. Strong privacy and security guarantees are provided for the agent who is submitting the queries. It is also shown that the amount of the information that this agent can extract from the other agent is bounded. In fact, with submitting one query, a sophisticated agent can at most extract the answer to two queries. This secure communication platform is used subsequently to develop a distributed coordination mechanisms among fleet owners.
△ Less
Submitted 27 February, 2017;
originally announced February 2017.
-
Budget-Constrained Contract Design for Effort-Averse Sensors in Averaging Based Estimation
Authors:
Farhad Farokhi,
Iman Shames,
Michael Cantoni
Abstract:
Consider a group of effort-averse, or lazy, sensors that seek to minimize the effort invested to collect measurements of a variable. Increasing the effort invested by the sensors improves the quality of the measurements provided to the central planner but this incurs increased costs to the sensors. The central planner, which processes the sensor measurements, employs an averaging estimator. It als…
▽ More
Consider a group of effort-averse, or lazy, sensors that seek to minimize the effort invested to collect measurements of a variable. Increasing the effort invested by the sensors improves the quality of the measurements provided to the central planner but this incurs increased costs to the sensors. The central planner, which processes the sensor measurements, employs an averaging estimator. It also determines contracts for rewarding sensors based on the measurements obtained. The problem of designing a contract that yields an estimation-error based quality-of-service level in return for the reward extended to sensors is investigated in this paper. To this end, a game is formulated between the central planner and the sensors. Conditions for the existence and uniqueness of an equilibrium are identified. The equilibrium is constructed explicitly and its properties in response to a reward based contract are studied. It turns out that the central planner, while not being able to directly measure the effort invested by the sensors, can enhance the estimation quality by rewarding each sensor based on the distance of its measurements from the output of the averaging estimator. Ultimately, optimal contracts are designed from the perspective of the budget required for achieving a specified level of estimation error.
△ Less
Submitted 14 February, 2016; v1 submitted 28 September, 2015;
originally announced September 2015.
-
Mutual Information as Privacy-Loss Measure in Strategic Communication
Authors:
Farhad Farokhi,
Girish Nair
Abstract:
A game is introduced to study the effect of privacy in strategic communication between well-informed senders and a receiver. The receiver wants to accurately estimate a random variable. The sender, however, wants to communicate a message that balances a trade-off between providing an accurate measurement and minimizing the amount of leaked private information, which is assumed to be correlated wit…
▽ More
A game is introduced to study the effect of privacy in strategic communication between well-informed senders and a receiver. The receiver wants to accurately estimate a random variable. The sender, however, wants to communicate a message that balances a trade-off between providing an accurate measurement and minimizing the amount of leaked private information, which is assumed to be correlated with the to-be-estimated variable. The mutual information between the transmitted message and the private information is used as a measure of the amount of leaked information. An equilibrium is constructed and its properties are investigated.
△ Less
Submitted 18 September, 2015;
originally announced September 2015.
-
On Reconstructability of Quadratic Utility Functions from the Iterations in Gradient Methods
Authors:
Farhad Farokhi,
Iman Shames,
Michael G. Rabbat,
Mikael Johansson
Abstract:
In this paper, we consider a scenario where an eavesdropper can read the content of messages transmitted over a network. The nodes in the network are running a gradient algorithm to optimize a quadratic utility function where such a utility optimization is a part of a decision making process by an administrator. We are interested in understanding the conditions under which the eavesdropper can rec…
▽ More
In this paper, we consider a scenario where an eavesdropper can read the content of messages transmitted over a network. The nodes in the network are running a gradient algorithm to optimize a quadratic utility function where such a utility optimization is a part of a decision making process by an administrator. We are interested in understanding the conditions under which the eavesdropper can reconstruct the utility function or a scaled version of it and, as a result, gain insight into the decision-making process. We establish that if the parameter of the gradient algorithm, i.e.,~the step size, is chosen appropriately, the task of reconstruction becomes practically impossible for a class of Bayesian filters with uniform priors. We establish what step-size rules should be employed to ensure this.
△ Less
Submitted 17 September, 2015;
originally announced September 2015.
-
Quadratic Gaussian Privacy Games
Authors:
Farhad Farokhi,
Henrik Sandberg,
Iman Shames,
Michael Cantoni
Abstract:
A game-theoretic model for analysing the effects of privacy on strategic communication between agents is devised. In the model, a sender wishes to provide an accurate measurement of the state to a receiver while also protecting its private information (which is correlated with the state) private from a malicious agent that may eavesdrop on its communications with the receiver. A family of nontrivi…
▽ More
A game-theoretic model for analysing the effects of privacy on strategic communication between agents is devised. In the model, a sender wishes to provide an accurate measurement of the state to a receiver while also protecting its private information (which is correlated with the state) private from a malicious agent that may eavesdrop on its communications with the receiver. A family of nontrivial equilibria, in which the communicated messages carry information, is constructed and its properties are studied.
△ Less
Submitted 17 September, 2015;
originally announced September 2015.
-
Promoting Truthful Behaviour in Participatory-Sensing Mechanisms
Authors:
Farhad Farokhi,
Iman Shames,
Michael Cantoni
Abstract:
In this paper, the interplay between a class of nonlinear estimators and strategic sensors is studied in several participatory-sensing scenarios. It is shown that for the class of estimators, if the strategic sensors have access to noiseless measurements of the to-be-estimated-variable, truth-telling is an equilibrium of the game that models the interplay between the sensors and the estimator. Fur…
▽ More
In this paper, the interplay between a class of nonlinear estimators and strategic sensors is studied in several participatory-sensing scenarios. It is shown that for the class of estimators, if the strategic sensors have access to noiseless measurements of the to-be-estimated-variable, truth-telling is an equilibrium of the game that models the interplay between the sensors and the estimator. Furthermore, performance of the proposed estimators is examined in the case that the strategic sensors form coalitions and in the presence of noise.
△ Less
Submitted 10 March, 2015;
originally announced March 2015.