Search | arXiv e-print repository

Optimal Universal Quantum Encoding for Statistical Inference

Abstract: Optimal encoding of classical data for statistical inference using quantum computing is investigated. A universal encoder is sought that is optimal for a wide array of statistical inference tasks. Accuracy of any statistical inference is shown to be upper bounded by a term that is proportional to maximal quantum leakage from the classical data, i.e., the input to the inference model, through its quantum encoding. This demonstrates that the maximal quantum leakage is a universal measure of the quality of the encoding strategy for statistical inference as it only depends on the quantum encoding of the data and not the inference task itself. The optimal universal encoding strategy, i.e., the encoding strategy that maximizes the maximal quantum leakage, is proved to be attained by pure states. When there are enough qubits, basis encoding is proved to be universally optimal. An iterative method for numerically computing the optimal universal encoding strategy is presented.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.11433 [pdf, ps, other]

Measuring Quantum Information Leakage Under Detection Threat

Authors: Farhad Farokhi, Sejeong Kim

Abstract: Gentle quantum leakage is proposed as a measure of information leakage to arbitrary eavesdroppers that aim to avoid detection. Gentle (also sometimes referred to as weak or non-demolition) measurements are used to encode the desire of the eavesdropper to evade detection. The gentle quantum leakage meets important axioms proposed for measures of information leakage including positivity, independence, and unitary invariance. Global depolarizing noise, an important family of physical noise in quantum devices, is shown to reduce gentle quantum leakage (and hence can be used as a mechanism to ensure privacy or security). A lower bound for the gentle quantum leakage based on asymmetric approximate cloning is presented. This lower bound relates information leakage to mutual incompatibility of quantum states. A numerical example, based on the encoding in the celebrated BB84 quantum key distribution algorithm, is used to demonstrate the results.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2402.06156 [pdf, ps, other]

Barycentric and Pairwise Renyi Quantum Leakage

Authors: Farhad Farokhi

Abstract: Barycentric and pairwise quantum Renyi leakages are proposed as two measures of information leakage for privacy and security analysis in quantum computing and communication systems. These quantities both require minimal assumptions on the eavesdropper, i.e., they do not make any assumptions on the eavesdropper's attack strategy or the statistical prior on the secret or private classical data encoded in the quantum system. They also satisfy important properties of positivity, independence, post-processing inequality, and unitary invariance. The barycentric quantum Renyi leakage can be computed by solving a semi-definite program and the pairwise quantum Renyi leakage possesses an explicit formula. The barycentric and pairwise quantum Renyi leakages form upper bounds on the maximal quantum leakage, the sandwiched quantum $α$-mutual information, the accessible information, and the Holevo information. Furthermore, differentially-private quantum channels are shown to bound these measures of information leakage. Global and local depolarizing channels, which are common models of noise in quantum computing and communication, restrict private or secure information leakage. Finally, a privacy-utility trade-off formula in quantum machine learning using variational circuits is developed. The privacy guarantees can only be strengthened, i.e., information leakage can only be reduced, if the performance degradation grows larger and vice versa.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2309.11022 [pdf, other]

doi 10.1145/3605764.3623905

Information Leakage from Data Updates in Machine Learning Models

Authors: Tian Hui, Farhad Farokhi, Olga Ohrimenko

Abstract: In this paper we consider the setting where machine learning models are retrained on updated datasets in order to incorporate the most up-to-date information or reflect distribution shifts. We investigate whether one can infer information about these updates in the training data (e.g., changes to attribute values of records). Here, the adversary has access to snapshots of the machine learning model before and after the change in the dataset occurs. Contrary to the existing literature, we assume that an attribute of one or more training data points is changed, rather than entire data records being removed or added. We propose attacks based on the difference in the prediction confidence of the original model and the updated model. We evaluate our attack methods on two public datasets along with multi-layer perceptron and logistic regression models. We validate that two snapshots of the model can result in higher information leakage in comparison to having access to only the updated model. Moreover, we observe that data records with rare values are more vulnerable to attacks, which points to the disparate vulnerability of privacy attacks in the update setting. When multiple records with the same original attribute value are updated to the same new value (i.e., repeated changes), the attacker is more likely to correctly guess the updated values since repeated changes leave a larger footprint on the trained model. These observations point to vulnerability of machine learning models to attribute inference attacks in the update setting.

Submitted 19 September, 2023; originally announced September 2023.

Journal ref: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec '23), November 30, 2023, Copenhagen, Denmark

arXiv:2309.09411 [pdf, other]

Distributionally Time-Varying Online Stochastic Optimization under Polyak-Łojasiewicz Condition with Application in Conditional Value-at-Risk Statistical Learning

Authors: Yuen-Man Pun, Farhad Farokhi, Iman Shames

Abstract: In this work, we consider a sequence of stochastic optimization problems following a time-varying distribution via the lens of online optimization. Assuming that the loss function satisfies the Polyak-Łojasiewicz condition, we apply online stochastic gradient descent and establish its dynamic regret bound that is composed of cumulative distribution drifts and cumulative gradient biases caused by stochasticity. The distribution metric we adopt here is Wasserstein distance, which is well-defined without the absolute continuity assumption or with a time-varying support set. We also establish a regret bound of online stochastic proximal gradient descent when the objective function is regularized. Moreover, we show that the above framework can be applied to the Conditional Value-at-Risk (CVaR) learning problem. Particularly, we improve an existing proof on the discovery of the PL condition of the CVaR problem, resulting in a regret bound of online stochastic gradient descent.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2307.12529 [pdf, ps, other]

doi 10.1103/PhysRevA.109.022608

Maximal Information Leakage from Quantum Encoding of Classical Data

Authors: Farhad Farokhi

Abstract: A new measure of information leakage for quantum encoding of classical data is defined. An adversary can access a single copy of the state of a quantum system that encodes some classical data and is interested in correctly guessing a general randomized or deterministic function of the data (e.g., a specific feature or attribute of the data in quantum machine learning) that is unknown to the security analyst. The resulting measure of information leakage, referred to as maximal quantum leakage, is the multiplicative increase of the probability of correctly guessing any function of the classical data upon observing measurements of the quantum state. Maximal quantum leakage is shown to satisfy post-processing inequality (i.e., applying a quantum channel reduces information leakage) and independence property (i.e., leakage is zero if the quantum state is independent of the classical data), which are fundamental properties required for privacy and security analysis. It also bounds accessible information. Effects of global and local depolarizing noise models on the maximal quantum leakage are established.

Submitted 1 January, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

Journal ref: Physical Review A, 109(2), 022608, 2024

arXiv:2302.12405 [pdf, other]

Privacy Against Hypothesis-Testing Adversaries for Quantum Computing

Authors: Farhad Farokhi

Abstract: A novel definition for data privacy in quantum computing based on quantum hypothesis testing is presented in this paper. The parameters in this privacy notion possess an operational interpretation based on the success/failure of an omnipotent adversary being able to distinguish the private categories to which the data belongs using arbitrary measurements on quantum states. Important properties of post processing and composition are then proved for the new notion of privacy. The relationship between privacy against hypothesis-testing adversaries, defined in this paper, and quantum differential privacy is then examined. It is shown that these definitions are intertwined in some parameter regimes. This enables us to provide an interpretation for the privacy budget in quantum differential privacy based on its relationship with privacy against hypothesis-testing adversaries.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2111.00631 [pdf, ps, other]

Learning Safety Filters for Unknown Discrete-Time Linear Systems

Authors: Farhad Farokhi, Alex S. Leong, Mohammad Zamani, Iman Shames

Abstract: A learning-based safety filter is developed for discrete-time linear time-invariant systems with unknown models subject to Gaussian noises with unknown covariance. Safety is characterized using polytopic constraints on the states and control inputs. The empirically learned model and process noise covariance with their confidence bounds are used to construct a robust optimization problem for minimally modifying nominal control actions to ensure safety with high probability. The optimization problem relies on tightening the original safety constraints. The magnitude of the tightening is larger at the beginning since there is little information to construct reliable models, but shrinks with time as more data becomes available.

Submitted 8 May, 2023; v1 submitted 31 October, 2021; originally announced November 2021.

arXiv:2110.04956 [pdf, other]

Optimal Stochastic Evasive Maneuvers Using the Schrodinger's Equation

Authors: Farhad Farokhi, Magnus Egerstedt

Abstract: In this paper, prey with stochastic evasion policies are considered. The stochasticity adds unpredictable changes to the prey's path for avoiding the predator's attacks. The prey's cost function is composed of two terms balancing the unpredictability factor (by using stochasticity to make the task of forecasting its future positions by the predator difficult) and energy consumption (the least amount of energy required for performing a maneuver). The optimal probability density functions of the actions of the prey for trading off unpredictability and energy consumption are shown to be characterized by the stationary Schrodinger's equation.

Submitted 10 October, 2021; originally announced October 2021.

arXiv:2108.03874 [pdf, other]

Zero-Error Feedback Capacity for Bounded Stabilization and Finite-State Additive Noise Channels

Authors: Amir Saberi, Farhad Farokhi, Girish Nair

Abstract: This article studies the zero-error feedback capacity of causal discrete channels with memory. First, by extending the classical zero-error feedback capacity concept, a new notion of uniform zero-error feedback capacity $C_{0f}$ for such channels is introduced. Using this notion a tight condition for bounded stabilization of unstable noisy linear systems via causal channels is obtained, assuming no channel state information at either end of the channel.

Submitted 1 June, 2022; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: arXiv admin note: text overlap with arXiv:2006.00892

arXiv:2107.01113 [pdf, ps, other]

Measuring Information Leakage in Non-stochastic Brute-Force Guessing

Authors: Ni Ding, Farhad Farokhi

Abstract: This paper proposes an operational measure of non-stochastic information leakage to formalize privacy against a brute-force guessing adversary. The information is measured by non-probabilistic uncertainty of uncertain variables, the non-stochastic counterparts of random variables. For $X$ that is related to released data $Y$, the non-stochastic brute-force leakage is measured by the complexity of exhaustively checking all the possibilities of the private attribute $U$ of $X$ by an adversary. The complexity refers to the number of trials to successfully guess $U$. Maximizing this leakage over all possible private attributes $U$ gives rise to the maximal (i.e., worst-case) non-stochastic brute-force guessing leakage. This is proved to be fully determined by the minimal non-stochastic uncertainty of $X$ given $Y$, which also determines the worst-case attribute $U$ indicating the highest privacy risk if $Y$ is disclosed. The maximal non-stochastic brute-force guessing leakage is shown to be proportional to the non-stochastic identifiability of $X$ given $Y$ and upper bounds the existing maximin information. The latter quantifies the information leakage when an adversary must perfectly guess $U$ in one-shot via $Y$. Experiments are used to demonstrate the tradeoff between the maximal non-stochastic brute-force guessing leakage and the data utility (measured by the maximum quantization error) and to illustrate the relationship between maximin information and stochastic one-shot maximal leakage.

Submitted 2 July, 2021; originally announced July 2021.

Comments: 11 pages, 4 figures

arXiv:2106.09904 [pdf, other]

Sharing in a Trustless World: Privacy-Preserving Data Analytics with Potentially Cheating Participants

Authors: Tham Nguyen, Hassan Jameel Asghar, Raghav Bhakar, Dali Kaafar, Farhad Farokhi

Abstract: Lack of trust between organisations and privacy concerns about their data are impediments to an otherwise potentially symbiotic joint data analysis. We propose DataRing, a data sharing system that allows mutually mistrusting participants to query each others' datasets in a privacy-preserving manner while ensuring the correctness of input datasets and query answers even in the presence of (cheating) participants deviating from their true datasets. We rely on the assumption that, if only a small subset of rows of the true dataset is known, participants cannot submit answers to queries deviating significantly from their true datasets. We employ differential privacy and a suite of cryptographic tools to ensure individual privacy for each participant's dataset and data confidentiality from the system. Our results show that the evaluation of 10 queries on a dataset with 10 attributes and 500,000 records is achieved in 90.63 seconds. DataRing can detect a cheating participant that deviates from its true dataset within a few queries with high accuracy.

Submitted 18 June, 2021; originally announced June 2021.

arXiv:2103.01413 [pdf, other]

Safe Learning of Uncertain Environments

Authors: Farhad Farokhi, Alex Leong, Iman Shames, Mohammad Zamani

Abstract: In many learning based control methodologies, learning the unknown dynamic model precedes the control phase, while the aim is to control the system such that it remains in some safe region of the state space. In this work, our aim is to guarantee safety while learning and control proceed simultaneously. Specifically, we consider the problem of safe learning in nonlinear control-affine systems subject to unknown additive uncertainty. We first model the uncertainty as a Gaussian noise and use state measurements to learn its mean and covariance. We provide rigorous time-varying bounds on the mean and covariance of the uncertainty and employ them to modify the control input via an optimization program with potentially time-varying safety constraints. We show that with an arbitrarily large probability we can guarantee that the state will remain in the safe set, while learning and control are carried out simultaneously, provided that a feasible solution exists for the optimization problem. We provide a secondary formulation of this optimization that is computationally more efficient. This is based on tightening the safety constraints to counter the uncertainty about the learned mean and covariance. The magnitude of the tightening can be decreased as our confidence in the learned mean and covariance increases (i.e., as we gather more measurements about the environment). Extensions of the method are provided for non-Gaussian process noise with unknown mean and covariance as well as Gaussian uncertainties with state-dependent mean and covariance to accommodate more general environments.

Submitted 13 May, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

arXiv:2101.09689 [pdf, ps, other]

A Linear Reduction Method for Local Differential Privacy and Log-lift

Authors: Ni Ding, Yucheng Liu, Farhad Farokhi

Abstract: This paper considers the problem of publishing data $X$ while protecting correlated sensitive information $S$. We propose a linear method to generate the sanitized data $Y$ with the same alphabet $\mathcal{Y} = \mathcal{X}$ that attains local differential privacy (LDP) and log-lift at the same time. It is revealed that both LDP and log-lift are inversely proportional to the statistical distance between conditional probability $P_{Y|S}(x|s)$ and marginal probability $P_{Y}(x)$: the closer the two probabilities are, the more private $Y$ is. Specifying $P_{Y|S}(x|s)$ that linearly reduces this distance $|P_{Y|S}(x|s) - P_Y(x)| = (1-α)|P_{X|S}(x|s) - P_X(x)|,\forall s,x$ for some $α\in (0,1]$, we study the problem of how to generate $Y$ from the original data $S$ and $X$. The Markov randomization/sanitization scheme $P_{Y|X}(x|x') = P_{Y|S,X}(x|s,x')$ is obtained by solving linear equations. The optimal non-Markov sanitization, the transition probability $P_{Y|S,X}(x|s,x')$ that depends on $S$, can be determined by maximizing the data utility subject to linear equality constraints. We compute the solution for two linear utility functions: the expected distance and total variance distance. It is shown that the non-Markov randomization significantly improves data utility and the marginal probability $P_X(x)$ remains the same after the linear sanitization method: $P_Y(x) = P_X(x), \forall x \in \mathcal{X}$.

Submitted 26 January, 2021; v1 submitted 24 January, 2021; originally announced January 2021.
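
As a small worked illustration of the linear reduction stated in the abstract above, the following sketch builds one conditional distribution that satisfies the stated relation: the convex combination $P_{Y|S}(x|s) = (1-α)P_{X|S}(x|s) + αP_X(x)$. This particular construction, the function name `linear_reduction`, and the list-based representation are assumptions made for illustration only; the paper's actual sanitization scheme is obtained by solving linear equations and is not reproduced here.

```python
def linear_reduction(p_x_given_s, p_s, alpha):
    """Return (P_X, P_{Y|S}) with P_{Y|S} = (1 - alpha) * P_{X|S} + alpha * P_X.

    p_x_given_s[s][x] holds P_{X|S}(x|s); p_s[s] holds P_S(s).
    Hypothetical helper: an illustration of the stated relation,
    not the sanitization mechanism from the paper.
    """
    n = len(p_x_given_s[0])
    # Marginal P_X(x) = sum_s P_S(s) * P_{X|S}(x|s)
    p_x = [
        sum(p_s[s] * p_x_given_s[s][x] for s in range(len(p_s)))
        for x in range(n)
    ]
    # Shrink every conditional toward the marginal by a factor (1 - alpha)
    p_y_given_s = [
        [(1.0 - alpha) * row[x] + alpha * p_x[x] for x in range(n)]
        for row in p_x_given_s
    ]
    return p_x, p_y_given_s


if __name__ == "__main__":
    # Toy numbers, chosen only for illustration
    p_x, p_y_given_s = linear_reduction([[0.8, 0.2], [0.2, 0.8]], [0.5, 0.5], 0.5)
    print(p_x, p_y_given_s)
```

Under this construction the marginal of $Y$ equals $P_X$, and each gap $|P_{Y|S}(x|s) - P_X(x)|$ is exactly $(1-α)$ times the original gap, which matches the relation in the abstract.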

arXiv:2101.06811 [pdf, ps, other]

Optimal Pre-Processing to Achieve Fairness and Its Relationship with Total Variation Barycenter

Authors: Farhad Farokhi

Abstract: We use disparate impact, i.e., the extent that the probability of observing an output depends on protected attributes such as race and gender, to measure fairness. We prove that disparate impact is upper bounded by the total variation distance between the distribution of the inputs given the protected attributes. We then use pre-processing, also known as data repair, to enforce fairness. We show that utility degradation, i.e., the extent that the success of a forecasting model changes by pre-processing the data, is upper bounded by the total variation distance between the distribution of the data before and after pre-processing. Hence, the problem of finding the optimal pre-processing regimen for enforcing fairness can be cast as minimizing total variation distance between the distribution of the data before and after pre-processing subject to a constraint on the total variation distance between the distribution of the inputs given protected attributes. This problem is a linear program that can be efficiently solved. We show that this problem is intimately related to finding the barycenter (i.e., center of mass) of two distributions when distances in the probability space are measured by total variation distance. We also investigate the effect of differential privacy on fairness using the proposed total variation distances. We demonstrate the results using numerical experimentation with a practice dataset.

Submitted 17 January, 2021; originally announced January 2021.
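
The abstract above bounds disparate impact by the total variation distance between the input distributions conditioned on the protected attribute. For discrete distributions this distance is half the sum of absolute differences, and a minimal sketch of that computation follows. The function name `total_variation` and the toy distributions are assumptions for illustration and do not come from the paper.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    p and q are sequences of probabilities over the same finite alphabet.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same alphabet")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Hypothetical input distributions for two protected groups
    inputs_group_a = [0.5, 0.5]
    inputs_group_b = [0.9, 0.1]
    print(total_variation(inputs_group_a, inputs_group_b))
```

A value of zero means the two groups have identical input distributions, in which case the bound in the abstract forces the disparate impact to zero as well.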

arXiv:2011.14572 [pdf, ps, other]

Gradient Sparsification Can Improve Performance of Differentially-Private Convex Machine Learning

Authors: Farhad Farokhi

Abstract: We use gradient sparsification to reduce the adverse effect of differential privacy noise on performance of private machine learning models. To this aim, we employ compressed sensing and additive Laplace noise to evaluate differentially-private gradients. Noisy privacy-preserving gradients are used to perform stochastic gradient descent for training machine learning models. Sparsification, achieved by setting the smallest gradient entries to zero, can reduce the convergence speed of the training algorithm. However, by sparsification and compressed sensing, the dimension of communicated gradient and the magnitude of additive noise can be reduced. The interplay between these effects determines whether gradient sparsification improves the performance of differentially-private machine learning models. We investigate this analytically in the paper. We prove that, for small privacy budgets, compression can improve performance of privacy-preserving machine learning models. However, for large privacy budgets, compression does not necessarily improve the performance. Intuitively, this is because the effect of privacy-preserving noise is minimal in large privacy budget regime and thus improvements from gradient sparsification cannot compensate for its slower convergence.

Submitted 1 December, 2020; v1 submitted 30 November, 2020; originally announced November 2020.

Comments: Fixed typos and a mistake in the proof of Proposition 1
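
The two ingredients named in the abstract above, sparsification by zeroing the smallest gradient entries and additive Laplace noise, can be sketched as follows. This is a simplified illustration under stated assumptions: it omits the compressed sensing step entirely, it uses the standard Laplace mechanism scale of sensitivity divided by privacy budget, and the function names and parameters (`sparsify_top_k`, `noisy_sparse_gradient`, `sensitivity`, `epsilon`) are hypothetical rather than taken from the paper.

```python
import random


def sparsify_top_k(grad, k):
    """Keep the k largest-magnitude entries of grad and set the rest to zero."""
    if k <= 0:
        return [0.0] * len(grad)
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sparse_gradient(grad, k, sensitivity, epsilon, rng):
    """Sparsify, then add Laplace noise with scale sensitivity / epsilon.

    Assumption: noise is added to every coordinate of the sparsified
    vector; the paper's compressed-sensing pipeline is not modeled.
    """
    scale = sensitivity / epsilon
    return [g + laplace_noise(scale, rng) for g in sparsify_top_k(grad, k)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(noisy_sparse_gradient([0.1, -3.0, 0.5, 2.0], 2, 1.0, 0.5, rng))
```

A smaller `epsilon` gives a larger noise scale, which is the regime where the abstract reports that compression can help.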

arXiv:2011.11819 [pdf, other]

When Machine Learning Meets Privacy: A Survey and Outlook

Authors: Bo Liu, Ming Ding, Sina Shaham, Wenny Rahayu, Farhad Farokhi, Zihuai Lin

Abstract: The newly emerged machine learning (e.g. deep learning) methods have become a strong driving force to revolutionize a wide range of industries, such as smart healthcare, financial technology, and surveillance systems. Meanwhile, privacy has emerged as a big concern in this machine learning-based artificial intelligence era. It is important to note that the problem of privacy preservation in the context of machine learning is quite different from that in traditional data privacy protection, as machine learning can act as both friend and foe. Currently, the work on the preservation of privacy and machine learning (ML) is still in its infancy, as most existing solutions only focus on privacy problems during the machine learning process. Therefore, a comprehensive study on the privacy preservation problems and machine learning is required. This paper surveys the state of the art in privacy issues and solutions for machine learning. The survey covers three categories of interactions between privacy and machine learning: (i) private machine learning, (ii) machine learning aided privacy protection, and (iii) machine learning-based privacy attack and corresponding protection schemes. The current research progress in each category is reviewed and the key challenges are identified. Finally, based on our in-depth analysis of the area of privacy and machine learning, we point out future research directions in this field.

Submitted 23 November, 2020; originally announced November 2020.

Comments: This work is accepted by ACM Computing Surveys

arXiv:2010.09968 [pdf, ps, other]

Non-Stochastic Private Function Evaluation

Authors: Farhad Farokhi, Girish Nair

Abstract: We consider private function evaluation to provide query responses based on private data of multiple untrusted entities in such a way that each cannot learn something substantially new about the data of others. First, we introduce perfect non-stochastic privacy in a two-party scenario. Perfect privacy amounts to conditional unrelatedness of the query response and the private uncertain variable of other individuals conditioned on the uncertain variable of a given entity. We show that perfect privacy can be achieved for queries that are functions of the common uncertain variable, a generalization of the common random variable. We compute the closest approximation of the queries that do not take this form. To provide a trade-off between privacy and utility, we relax the notion of perfect privacy. We define almost perfect privacy and show that this new definition equates to using conditional disassociation instead of conditional unrelatedness in the definition of perfect privacy. Then, we generalize the definitions to multi-party function evaluation (more than two data entities). We prove that uniform quantization of query responses, where the quantization resolution is a function of privacy budget and sensitivity of the query (cf., differential privacy), achieves function evaluation privacy.

Submitted 19 October, 2020; originally announced October 2020.

arXiv:2008.12466 [pdf, other]

Deconvoluting Kernel Density Estimation and Regression for Locally Differentially Private Data

Authors: Farhad Farokhi

Abstract: Local differential privacy has become the gold standard of the privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differentially private data can twist the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of privacy-preserving data (no matter how many samples we gather) is always flatter in comparison with the density function of the original data points due to convolution with the privacy-preserving noise density function. The effect is especially pronounced when using slow-decaying privacy-preserving noises, such as the Laplace noise. This can result in under/over-estimation of the heavy-hitters. This is an important challenge facing social scientists due to the use of differential privacy in the 2020 Census in the United States. In this paper, we develop density estimation methods using smoothing kernels. We use the framework of deconvoluting kernel density estimators to remove the effect of privacy-preserving noise. This approach also allows us to adapt the results from non-parametric regression with errors-in-variables to develop regression models based on locally differentially private data. We demonstrate the performance of the developed methods on financial and demographic datasets.

Submitted 8 November, 2020; v1 submitted 27 August, 2020; originally announced August 2020.

Comments: updated reference list, deeper numerical analysis

arXiv:2008.04477 [pdf, other]

doi 10.1109/CDC.2018.8619460

Security Versus Privacy

Authors: Farhad Farokhi, Peyman Mohajerin Esfahani

Abstract: Linear queries can be submitted to a server containing private data. The server provides a response to the queries systematically corrupted using an additive noise to preserve the privacy of those whose data is stored on the server. The measure of privacy is inversely proportional to the trace of the Fisher information matrix. It is assumed that an adversary can inject a false bias to the responses. The measure of the security, capturing the ease of detecting the presence of the false data injection, is the sensitivity of the Kullback-Leibler divergence to the additive bias. An optimization problem for balancing privacy and security is proposed and subsequently solved. It is shown that the level of guaranteed privacy times the level of security equals a constant. Therefore, by increasing the level of privacy, the security guarantees can only be weakened and vice versa. Similar results are developed under the differential privacy framework.

Submitted 10 August, 2020; originally announced August 2020.

Journal ref: 2018 IEEE Conference on Decision and Control (CDC)

arXiv:2006.13488 [pdf, other]

Distributionally-Robust Machine Learning Using Locally Differentially-Private Data

Authors: Farhad Farokhi

Abstract: We consider machine learning, particularly regression, using locally-differentially private datasets. The Wasserstein distance is used to define an ambiguity set centered at the empirical distribution of the dataset corrupted by local differential privacy noise. The ambiguity set is shown to contain the probability distribution of unperturbed, clean data. The radius of the ambiguity set is a function of the privacy budget, spread of the data, and the size of the problem. Hence, machine learning with locally-differentially private datasets can be rewritten as a distributionally-robust optimization. For general distributions, the distributionally-robust optimization problem can be relaxed as a regularized machine learning problem with the Lipschitz constant of the machine learning model as a regularizer. For linear and logistic regression, this regularizer is the dual norm of the model parameters. For Gaussian data, the distributionally-robust optimization problem can be solved exactly to find an optimal regularizer. This approach results in an entirely new regularizer for training linear regression models. Training with this novel regularizer can be posed as a semi-definite program. Finally, the performance of the proposed distributionally-robust machine learning training is demonstrated on practical datasets.

Submitted 24 June, 2020; originally announced June 2020.

arXiv:2006.01397 [pdf, ps, other]

Online Stochastic Convex Optimization: Wasserstein Distance Variation

Authors: Iman Shames, Farhad Farokhi

Abstract: Distributionally-robust optimization is often studied for a fixed set of distributions rather than time-varying distributions that can drift significantly over time (which is, for instance, the case in finance and sociology due to underlying expansion of economy and evolution of demographics). This motivates understanding conditions on probability distributions, using the Wasserstein distance, that can be used to model time-varying environments. We can then use these conditions in conjunction with online stochastic optimization to adapt the decisions. We consider an online proximal-gradient method to track the minimizers of expectations of smooth convex functions parameterised by a random variable whose probability distributions continuously evolve over time at a rate similar to that of the rate at which the decision maker acts. We revisit the concepts of estimation and tracking error inspired by systems and control literature and provide bounds for them under strong convexity, Lipschitzness of the gradient, and bounds on the probability distribution drift. Further, noting that computing projections for general feasible sets might not be amenable to online implementation (due to computational constraints), we propose an exact penalty method. Doing so allows us to relax the uniform boundedness of the gradient and establish dynamic regret bounds for tracking and estimation error. We further introduce a constraint-tightening approach and relate the amount of tightening to the probability of satisfying the constraints.

Submitted 29 September, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

arXiv:2006.00892 [pdf, ps, other]

An Explicit Formula for the Zero-Error Feedback Capacity of a Class of Finite-State Additive Noise Channels

Authors: Amir Saberi, Farhad Farokhi, Girish N. Nair

Abstract: It is known that for a discrete channel with correlated additive noise, the ordinary capacity with or without feedback both equal $\log q - \mathcal{H}(Z)$, where $\mathcal{H}(Z)$ is the entropy rate of the noise process $Z$ and $q$ is the alphabet size. In this paper, a class of finite-state additive noise channels is introduced. It is shown that the zero-error feedback capacity of such channels is either zero or $C_{0f} = \log q - h(Z)$, where $h(Z)$ is the topological entropy of the noise process. A topological condition is given when the zero-error capacity is zero, with or without feedback. Moreover, the zero-error capacity without feedback is lower-bounded by $\log q - 2h(Z)$. We explicitly compute the zero-error feedback capacity for several examples, including channels with isolated errors and a Gilbert-Elliott channel.

Submitted 29 May, 2020; originally announced June 2020.

Comments: arXiv admin note: text overlap with arXiv:2003.11954

arXiv:2004.10911 [pdf, ps, other]

Measuring Information Leakage in Non-stochastic Brute-Force Guessing

Authors: Farhad Farokhi, Ni Ding

Abstract: We propose an operational measure of information leakage in a non-stochastic setting to formalize privacy against a brute-force guessing adversary. We use uncertain variables, non-probabilistic counterparts of random variables, to construct a guessing framework in which an adversary is interested in determining private information based on uncertain reports. We consider brute-force trial-and-error guessing in which an adversary can potentially check all the possibilities of the private information that are compatible with the available outputs to find the actual private realization. The ratio of the worst-case number of guesses for the adversary in the presence of the output and in the absence of it captures the reduction in the adversary's guessing complexity and is thus used as a measure of private information leakage. We investigate the relationship between the newly-developed measure of information leakage and the existing non-stochastic maximin information and stochastic maximal leakage that are shown to arise in one-shot guessing.

Submitted 27 January, 2021; v1 submitted 22 April, 2020; originally announced April 2020.

arXiv:2003.11954 [pdf, other]

Bounded State Estimation over Finite-State Channels: Relating Topological Entropy and Zero-Error Capacity

Authors: Amir Saberi, Farhad Farokhi, Girish N. Nair

Abstract: We investigate state estimation of linear systems over channels having a finite state not known by the transmitter or receiver. We show that similar to memoryless channels, zero-error capacity is the right figure of merit for achieving bounded estimation errors. We then consider finite-state, worst-case versions of the common erasure and additive noise channel models, in which the noise is governed by a finite-state machine without any statistical structure. Upper and lower bounds on their zero-error capacities are derived, revealing a connection with the topological entropy of the channel dynamics. Separate necessary and sufficient conditions for bounded linear state estimation errors via such channels are obtained. These estimation conditions bring together the topological entropies of the linear system and the discrete channel.

Submitted 4 October, 2021; v1 submitted 24 March, 2020; originally announced March 2020.

Comments: arXiv admin note: text overlap with arXiv:1902.00726

arXiv:2003.08500 [pdf, ps, other]

The Cost of Privacy in Asynchronous Differentially-Private Machine Learning

Authors: Farhad Farokhi, Nan Wu, David Smith, Mohamed Ali Kaafar

Abstract: We consider training machine learning models using training data located on multiple private and geographically-scattered servers with different privacy settings. Due to the distributed nature of the data, communicating with all collaborating private data owners simultaneously may prove challenging or altogether impossible. In this paper, we develop differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets. The asynchronous nature of the algorithms implies that a central learner interacts with the private data owners one-on-one whenever they are available for communication without needing to aggregate query responses to construct gradients of the entire fitness function. Therefore, the algorithm efficiently scales to many data owners. We define the cost of privacy as the difference between the fitness of a privacy-preserving machine-learning model and the fitness of a machine-learning model trained in the absence of privacy concerns. We prove that we can forecast the performance of the proposed privacy-preserving asynchronous algorithms. We demonstrate that the cost of privacy has an upper bound that is inversely proportional to the combined size of the training datasets squared and the sum of the privacy budgets squared. We validate the theoretical results with experiments on financial and medical datasets.
The experiments illustrate that collaboration among more than 10 data owners with at least 10,000 records with privacy budgets greater than or equal to 1 results in a superior machine-learning model in comparison to a model trained in isolation on only one of the datasets, illustrating the value of collaboration and the cost of privacy. The number of the collaborating datasets can be lowered if the privacy budget is higher.

Submitted 29 June, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

arXiv:2002.06856 [pdf, other]

Data and Model Dependencies of Membership Inference Attack

Authors: Shakila Mahjabin Tonni, Dinusha Vatsalan, Farhad Farokhi, Dali Kaafar, Zhigang Lu, Gioacchino Tangari

Abstract: Machine learning (ML) models have been shown to be vulnerable to Membership Inference Attacks (MIA), which infer the membership of a given data point in the target dataset by observing the prediction output of the ML model. While the key factors for the success of MIA have not yet been fully understood, existing defense mechanisms such as using L2 regularization \cite{10shokri2017membership} and dropout layers \cite{salem2018ml} take only the model's overfitting property into consideration. In this paper, we provide an empirical analysis of the impact of both the data and ML model properties on the vulnerability of ML techniques to MIA. Our results reveal the relationship between MIA accuracy and properties of the dataset and training model in use. In particular, we show that the size of shadow dataset, the class and feature balance and the entropy of the target dataset, the configurations and fairness of the training model are the most influential factors. Based on those experimental findings, we conclude that along with model overfitting, multiple properties jointly contribute to MIA success instead of any single property. Building on our experimental findings, we propose using those data and model properties as regularizers to protect ML models against MIA. Our results show that the proposed defense mechanisms can reduce the MIA accuracy by up to 25% without sacrificing the ML model prediction utility.

Submitted 25 July, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

arXiv:2002.03294 [pdf, other]

Uniformly Bounded State Estimation over Multiple Access Channels

Authors: Ghassen Zafzouf, Girish N. Nair, Farhad Farokhi

Abstract: This paper addresses the problem of distributed state estimation via multiple access channels (MACs). We consider a scenario where two encoders are simultaneously communicating their measurements through a noisy channel. Firstly, the zero-error capacity region of the general M-input, single-output MAC is characterized using tools from nonstochastic information theory. Next, we show that a tight condition to be able to achieve uniformly bounded state estimation errors can be given in terms of the channel zero-error capacity region. This criterion relates the channel properties to the plant dynamics. These results pave the way towards understanding information flows in networked control systems with multiple transmitters.

Submitted 22 December, 2022; v1 submitted 9 February, 2020; originally announced February 2020.

arXiv:2001.10655 [pdf, ps, other]

Regularization Helps with Mitigating Poisoning Attacks: Distributionally-Robust Machine Learning Using the Wasserstein Distance

Authors: Farhad Farokhi

Abstract: We use distributionally-robust optimization for machine learning to mitigate the effect of data poisoning attacks. We provide performance guarantees for the trained model on the original data (not including the poison records) by training the model for the worst-case distribution on a neighbourhood around the empirical distribution (extracted from the training dataset corrupted by a poisoning attack) defined using the Wasserstein distance. We relax the distributionally-robust machine learning problem by finding an upper bound for the worst-case fitness based on the empirical sample-averaged fitness and the Lipschitz constant of the fitness function (on the data for given model parameters) as a regularizer. For regression models, we prove that this regularizer is equal to the dual norm of the model parameters. We use the Wine Quality dataset, the Boston Housing Market dataset, and the Adult dataset for demonstrating the results of this paper.

Submitted 28 January, 2020; originally announced January 2020.
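
Illustrative sketch (not taken from the paper): the abstract above states that, for regression models, the Wasserstein-based regularizer equals the dual norm of the model parameters. The Python below shows what such an objective might look like. The function names, the absolute-error loss, the weight `rho`, and the choice of which parameters enter the norm are all assumptions made for illustration; the paper's exact formulation may differ.

```python
import math


def dual_norm(theta, p):
    """Return the l_q norm of theta, where 1/p + 1/q = 1 (the dual of l_p)."""
    if p == 1:
        return max(abs(t) for t in theta)          # dual of l_1 is l_inf
    if math.isinf(p):
        return sum(abs(t) for t in theta)          # dual of l_inf is l_1
    q = p / (p - 1.0)
    return sum(abs(t) ** q for t in theta) ** (1.0 / q)


def regularized_loss(theta, X, y, rho, p=2.0):
    """Empirical mean absolute error plus rho times the dual norm of theta.

    rho is a hypothetical stand-in for the radius of the Wasserstein ball;
    the absolute-error loss is chosen here only because it is Lipschitz.
    """
    errors = [
        abs(yi - sum(t * xi for t, xi in zip(theta, row)))
        for row, yi in zip(X, y)
    ]
    return sum(errors) / len(errors) + rho * dual_norm(theta, p)
```

For example, `regularized_loss([1.0], [[1.0], [2.0]], [1.0, 2.0], rho=0.5)` fits the data exactly, so the value is just the penalty term `0.5 * 1.0`.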

arXiv:2001.10648 [pdf, ps, other]

Modelling and Quantifying Membership Information Leakage in Machine Learning

Authors: Farhad Farokhi, Mohamed Ali Kaafar

Abstract: Machine learning models have been shown to be vulnerable to membership inference attacks, i.e., inferring whether individuals' data have been used for training models. The lack of understanding about factors contributing to the success of these attacks motivates the need for modelling membership information leakage using information theory and for investigating properties of machine learning models and training algorithms that can reduce membership information leakage. We use conditional mutual information leakage to measure the amount of information leakage from the trained machine learning model about the presence of an individual in the training dataset. We devise an upper bound for this measure of information leakage using Kullback--Leibler divergence that is more amenable to numerical computation. We prove a direct relationship between the Kullback--Leibler membership information leakage and the probability of success for a hypothesis-testing adversary examining whether a particular data record belongs to the training dataset of a machine learning model. We show that the mutual information leakage is a decreasing function of the training dataset size and the regularization weight. We also prove that, if the sensitivity of the machine learning model (defined in terms of the derivatives of the fitness with respect to model parameters) is high, more membership information is potentially leaked.
This illustrates that complex models, such as deep neural networks, are more susceptible to membership inference attacks in comparison to simpler models with fewer degrees of freedom. We show that the amount of the membership information leakage is reduced by $\mathcal{O}(\log^{1/2}(\delta^{-1})\epsilon^{-1})$ when using Gaussian $(\epsilon,\delta)$-differentially-private additive noises.

Submitted 27 April, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

arXiv:1912.12576 [pdf, ps, other]

Privacy-Preserving Public Release of Datasets for Support Vector Machine Classification

Authors: Farhad Farokhi

Abstract: We consider the problem of publicly releasing a dataset for support vector machine classification while not infringing on the privacy of data subjects (i.e., individuals whose private information is stored in the dataset). The dataset is systematically obfuscated using an additive noise for privacy protection. Motivated by the Cramer-Rao bound, the inverse of the trace of the Fisher information matrix is used as a measure of privacy. Conditions are established for ensuring that the classifier extracted from the original dataset and the obfuscated one are close to each other (capturing the utility). The optimal noise distribution is determined by maximizing a weighted sum of the measures of privacy and utility. The optimal privacy-preserving noise is proved to achieve local differential privacy. The results are generalized to a broader class of optimization-based supervised machine learning algorithms. Applicability of the methodology is demonstrated on multiple datasets.

Submitted 28 December, 2019; originally announced December 2019.

Journal ref: IEEE Transactions on Big Data, 2020

arXiv:1911.04842 [pdf, other]

Developing Non-Stochastic Privacy-Preserving Policies Using Agglomerative Clustering

Authors: Ni Ding, Farhad Farokhi

Abstract: We consider a non-stochastic privacy-preserving problem in which an adversary aims to infer sensitive information $S$ from publicly accessible data $X$ without using statistics. We consider the problem of generating and releasing a quantization $\hat{X}$ of $X$ to minimize the privacy leakage of $S$ to $\hat{X}$ while maintaining a certain level of utility (or, inversely, the quantization loss). The variables $S$ and $X$ are treated as bounded and non-probabilistic, but are otherwise general. We consider two existing non-stochastic privacy measures, namely the maximum uncertainty reduction $L_0(S \rightarrow \hat{X})$ and the refined information $I_*(S; \hat{X})$ (also called the maximin information) of $S$. For each privacy measure, we propose a corresponding agglomerative clustering algorithm that converges to a locally optimal quantization solution $\hat{X}$ by iteratively merging elements in the alphabet of $X$. To instantiate the solution to this problem, we consider two specific utility measures, the worst-case resolution of $X$ by observing $\hat{X}$ and the maximal distortion of the released data $\hat{X}$. We show that the value of the maximin information $I_*(S; \hat{X})$ can be determined by dividing the confusability graph into connected subgraphs. Hence, $I_*(S; \hat{X})$ can be reduced by merging nodes connecting subgraphs.
The relation to the probabilistic information-theoretic privacy is also studied by noting that the Gács-Körner common information is the stochastic version of $I_*$ and indicates the attainability of statistical indistinguishability.

Submitted 12 July, 2020; v1 submitted 12 November, 2019; originally announced November 2019.

Comments: 14 pages, 9 figures

arXiv:1910.13027 [pdf, ps, other]

Noiseless Privacy

Authors: Farhad Farokhi

Abstract: In this paper, we define noiseless privacy, as a non-stochastic rival to differential privacy, requiring that the outputs of a mechanism (i.e., function composition of a privacy-preserving mapping and a query) can attain only a few values while varying the data of an individual (the logarithm of the number of the distinct values is bounded by the privacy budget). Therefore, the output of the mechanism is not fully informative of the data of the individuals in the dataset. We prove several guarantees for noiselessly-private mechanisms. The information content of the output about the data of an individual, even if an adversary knows all the other entries of the private dataset, is bounded by the privacy budget. The zero-error capacity of memory-less channels using noiselessly private mechanisms for transmission is upper bounded by the privacy budget. The performance of a non-stochastic hypothesis-testing adversary is bounded again by the privacy budget. Finally, assuming that an adversary has access to a stochastic prior on the dataset, we prove that the estimation error of the adversary for individual entries of the dataset is lower bounded by a decreasing function of the privacy budget. In this case, we also show that the maximal information leakage is bounded by the privacy budget. In addition to privacy guarantees, we prove that noiselessly-private mechanisms admit a composition theorem and that post-processing does not weaken their privacy guarantees.
We prove that quantization operators can ensure noiseless privacy if the number of quantization levels is appropriately selected based on the sensitivity of the query and the privacy budget. Finally, we illustrate the privacy merits of noiseless privacy using multiple datasets in energy and transport.

Submitted 28 October, 2019; originally announced October 2019.

arXiv:1909.11812 [pdf, ps, other]

Differential Privacy for Evolving Almost-Periodic Datasets with Continual Linear Queries: Application to Energy Data Privacy

Authors: Farhad Farokhi

Abstract: For evolving datasets with continual reports, the composition rule for differential privacy (DP) dictates that the scale of DP noise must grow linearly with the number of the queries, or that the privacy budget must be split equally between all the queries, so that the privacy budget across all the queries remains bounded and consistent with the privacy guarantees. To avoid this drawback of DP, we consider datasets containing almost periodic time series, composed of periodic components and noisy variations on top that are independent across periods. Our interest in these datasets is motivated by the fact that, for reporting on private periodic time series, we do not need to divide the privacy budget across the entire, possibly infinite, horizon. Instead, for periodic time series, we generate DP reports for the first period and report the same DP reports periodically. In practice, however, exactly periodic time series do not exist as the data always contains small variations due to random or uncertain events. For instance, the energy consumption of a household may repeat the same daily pattern with slight variations due to minor changes to the habits of the individuals. The underlying periodic pattern is a function of the private information of the households. It might be desired to protect the privacy of households by not leaking information about the recurring patterns while the individual daily variations are almost noise-like with little to no privacy concerns (depending on the situation).
Motivated by this, we define DP for almost periodic datasets and develop a Laplace mechanism for responding to linear queries. We provide statistical tools for testing the validity of the almost periodicity assumption. We use multiple energy datasets containing smart-meter measurements of households to validate the almost periodicity assumption. We generate DP aggregate reports and investigate their utility.

Submitted 25 September, 2019; originally announced September 2019.
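
Illustrative sketch (not taken from the paper): the abstract above describes answering linear queries on an exactly periodic time series by generating DP reports for the first period and then repeating them. The Python below pairs the textbook Laplace mechanism (noise scale equal to sensitivity divided by privacy budget) with that replay step. All function names and parameters are hypothetical, and the paper's mechanism for almost periodic data is not reproduced here.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_linear_query(data, weights, sensitivity, epsilon, rng):
    """Answer sum_i weights[i] * data[i] with standard Laplace noise."""
    exact = sum(w * x for w, x in zip(weights, data))
    return exact + laplace_noise(sensitivity / epsilon, rng)


def replay_first_period(first_period_reports, horizon):
    """Repeat the noisy reports of the first period over a longer horizon.

    No fresh noise is drawn after the first period, so the privacy budget
    is not split across the (possibly unbounded) horizon.
    """
    period = len(first_period_reports)
    return [first_period_reports[t % period] for t in range(horizon)]


# Usage: a period of three time steps, each holding two households' readings.
rng = random.Random(0)
first_period = [[1.0, 2.0], [1.5, 2.5], [0.5, 1.0]]
reports = [
    private_linear_query(step, [0.5, 0.5], sensitivity=1.0, epsilon=1.0, rng=rng)
    for step in first_period
]
released = replay_first_period(reports, horizon=7)
```

The replay step is only justified when the underlying series is exactly periodic, which the abstract notes does not hold in practice.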

arXiv:1908.04954 [pdf, ps, other]

Taking a Lesson from Quantum Particles for Statistical Data Privacy

Authors: Farhad Farokhi

Abstract: Privacy is under threat from artificial intelligence revolution fueled by unprecedented abundance of data. Differential privacy, an established candidate for privacy protection, is susceptible to adversarial attacks, acts conservatively, and leads to mis-implementations because of lacking systematic methods for setting its parameters (known as the privacy budget). An alternative is information-theoretic privacy using entropy with the drawback of requiring prior distribution of the private data. Here, by using the Fisher information, information-theoretic privacy framework is extended to avoid unnecessary assumptions on the private data. The optimal privacy-preserving additive noise, extracted by minimizing the Fisher information, must follow the time-independent Schrödinger equation. A fundamental trade-off between privacy and utility is also proved, reminiscent of the Heisenberg uncertainty principle.

Submitted 14 August, 2019; originally announced August 2019.

arXiv:1908.03995 [pdf, ps, other]

Temporally Discounted Differential Privacy for Evolving Datasets on an Infinite Horizon

Authors: Farhad Farokhi

Abstract: We define discounted differential privacy, as an alternative to (conventional) differential privacy, to investigate privacy of evolving datasets, containing time series over an unbounded horizon. We use privacy loss as a measure of the amount of information leaked by the reports at a certain fixed time. We observe that privacy losses are weighted equally across time in the definition of differential privacy, and therefore the magnitude of privacy-preserving additive noise must grow without bound to ensure differential privacy over an infinite horizon. Motivated by the discounted utility theory within the economics literature, we use exponential and hyperbolic discounting of privacy losses across time to relax the definition of differential privacy under continual observations. This implies that privacy losses in distant past are less important than the current ones to an individual. We use discounted differential privacy to investigate privacy of evolving datasets using additive Laplace noise and show that the magnitude of the additive noise can remain bounded under discounted differential privacy. We illustrate the quality of privacy-preserving mechanisms satisfying discounted differential privacy on smart-meter measurement time-series of real households, made publicly available by Ausgrid (an Australian electricity distribution company).

Submitted 27 January, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

Journal ref: 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)
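
The abstract above contrasts equally weighted privacy losses, which accumulate without bound, with exponentially discounted ones. The sketch below illustrates that contrast numerically under stated assumptions: each report incurs the same per-report loss `eps_per_report`, losses add up across reports, and the discount weight at age `t` is `gamma**t`. These modelling choices and all names are illustrative and may differ from the paper's formal definition.

```python
def undiscounted_privacy_loss(eps_per_report, horizon):
    # Equal weights across time: the total grows linearly with the horizon.
    return eps_per_report * horizon

def discounted_privacy_loss(eps_per_report, gamma, horizon):
    # Exponential discounting: a loss incurred t steps in the past
    # is weighted by gamma**t, with 0 < gamma < 1.
    return sum(eps_per_report * gamma**t for t in range(horizon))

eps = 0.1     # hypothetical per-report privacy loss
gamma = 0.9   # hypothetical discount factor

for horizon in (10, 100, 1000):
    print(horizon,
          undiscounted_privacy_loss(eps, horizon),
          discounted_privacy_loss(eps, gamma, horizon))

# Geometric-series limit of the discounted total as the horizon grows.
print(eps / (1.0 - gamma))
```

With a fixed per-report loss, and hence a fixed Laplace noise scale, the discounted total stays below `eps / (1 - gamma)` for every horizon, whereas the undiscounted total keeps growing.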

arXiv:1906.09721 [pdf, ps, other]

A Game-Theoretic Approach to Adversarial Linear Support Vector Classification

Authors: Farhad Farokhi

Abstract: In this paper, we employ a game-theoretic model to analyze the interaction between an adversary and a classifier. There are two classes (i.e., positive and negative classes) to which data points can belong. The adversary is interested in maximizing the probability of miss-detection for the positive class (i.e., false negative probability). The adversary however does not want to significantly modify the data point so that it still maintains favourable traits of the original class. The classifier, on the other hand, is interested in maximizing the probability of correct detection for the positive class (i.e., true positive probability) subject to a lower-bound on the probability of correct detection for the negative class (i.e., true negative probability). For conditionally Gaussian data points (conditioned on the class) and linear support vector machine classifiers, we rewrite the optimization problems of the adversary and the classifier as convex optimization problems and use best response dynamics to learn an equilibrium of the game. This results in computing a linear support vector machine classifier that is robust against adversarial input manipulations. We illustrate the framework on a synthetic dataset and a public Cardiovascular Disease dataset.

Submitted 24 June, 2019; originally announced June 2019.

arXiv:1906.09679 [pdf, ps, other]

The Value of Collaboration in Convex Machine Learning with Differential Privacy

Authors: Nan Wu, Farhad Farokhi, David Smith, Mohamed Ali Kaafar

Abstract: In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budget and size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. Particularly, we show that the difference between the fitness of the trained machine learning model using differentially-private gradient queries and the fitness of the trained machine model in the absence of any privacy concerns is inversely proportional to the size of the training datasets squared and the privacy budget squared. We successfully validate the performance prediction with the actual performance of the proposed privacy-aware learning algorithms, applied to: financial datasets for determining interest rates of loans using regression; and detecting credit card frauds using support vector machines.

Submitted 23 June, 2019; originally announced June 2019.

Comments: Accepted in IEEE S&P 2020

Journal ref: IEEE Symposium on Security and Privacy 2020 (IEEE SP 2020)

arXiv:1904.07377 [pdf, other]

Non-Stochastic Hypothesis Testing with Application to Privacy Against Hypothesis-Testing Adversary

Authors: Farhad Farokhi

Abstract: In this paper, we consider privacy against hypothesis testing adversaries within a non-stochastic framework. We develop a theory of non-stochastic hypothesis testing by borrowing the notion of uncertain variables from non-stochastic information theory. We define tests as binary-valued mappings on uncertain variables and prove a fundamental bound on the best performance of tests in non-stochastic hypothesis testing. We use this bound to develop a measure of privacy. We then construct reporting policies with prescribed privacy and utility guarantees. The utility of a reporting policy is measured by the distance between the reported and original values. We illustrate the effects of using such privacy-preserving reporting policies on a publicly-available practical dataset of preferences and demographics of young individuals, aged between 15-30, with Slovakian nationality.

Submitted 15 April, 2019; originally announced April 2019.

arXiv:1902.06899 [pdf, ps, other]

doi 10.1016/j.conengprac.2020.104350

Implementing Homomorphic Encryption Based Secure Feedback Control for Physical Systems

Authors: Julian Tran, Farhad Farokhi, Michael Cantoni, Iman Shames

Abstract: This paper is about an encryption based approach to the secure implementation of feedback controllers for physical systems. Specifically, Paillier's homomorphic encryption is used to digitally implement a class of linear dynamic controllers, which includes the commonplace static gain and PID type feedback control laws as special cases. The developed implementation is amenable to Field Programmable Gate Array (FPGA) realization. Experimental results, including timing analysis and resource usage characteristics for different encryption key lengths, are presented for the realization of an inverted pendulum controller; as this is an unstable plant, the control is necessarily fast.

Submitted 27 March, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

Journal ref: Control Engineering Practice, Volume 97, April 2020, 104350
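
The abstract above uses Paillier's homomorphic encryption to implement linear controllers, including static gains. The following toy Python sketch shows only the underlying arithmetic: Paillier ciphertexts can be multiplied to add plaintexts and exponentiated to scale them, so a static gain applied to encrypted measurements can be evaluated without decrypting. The tiny primes, integer-valued gains and measurements, and all function names are assumptions for illustration; this is neither secure nor the paper's FPGA implementation, and real-valued signals would additionally need fixed-point scaling.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier keys from two small primes (NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid for the generator g = n + 1 (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m, rng):
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g**m mod n**2 equals 1 + m*n mod n**2.
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    m = ((u - 1) // n) * mu % n
    return m if m <= n // 2 else m - n  # map back to signed integers

def encrypted_gain(n, gains, enc_measurements):
    """Return Enc(sum_i K_i * y_i) computed from Enc(y_i) only."""
    n2 = n * n
    acc = 1  # 1 is an encryption of 0 (with r = 1)
    for k, c in zip(gains, enc_measurements):
        # Exponentiation scales the plaintext; multiplication adds plaintexts.
        acc = (acc * pow(c, k % n, n2)) % n2
    return acc

rng = random.Random(0)
n, priv = keygen(1009, 1013)
enc_y = [encrypt(n, y, rng) for y in (5, 7)]   # sensor side
enc_u = encrypted_gain(n, (3, -4), enc_y)      # controller side, no secret key
print(decrypt(n, priv, enc_u))                 # actuator side: 3*5 - 4*7 = -13
```

In this sketch only the party holding `priv` can read the control input; the controller sees ciphertexts and the plaintext gains.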

arXiv:1902.00726 [pdf, other]

State Estimation over Worst-Case Erasure and Symmetric Channels with Memory

Authors: Amir Saberi, Farhad Farokhi, Girish N. Nair

Abstract: Worst-case models of erasure and symmetric channels are investigated, in which the number of channel errors occurring in each sliding window of a given length is bounded. Upper and lower bounds on their zero-error capacities are derived, with the lower bounds revealing a connection with the topological entropy of the channel dynamics. Necessary and sufficient conditions for linear state estimation with bounded estimation errors via such channels are then obtained, by extending previous results for non-stochastic memoryless channels to those with finite memory. These estimation conditions involve the topological entropies of the linear system and the channel.

Submitted 2 February, 2019; originally announced February 2019.

arXiv:1812.04168 [pdf, ps, other]

Secure and Private Implementation of Dynamic Controllers Using Semi-Homomorphic Encryption

Authors: Carlos Murguia, Farhad Farokhi, Iman Shames

Abstract: This paper presents a secure and private implementation of linear time-invariant dynamic controllers using Paillier's encryption, a semi-homomorphic encryption method. To avoid overflow or underflow within the encryption domain, the state of the controller is reset periodically. A control design approach is presented to ensure stability and optimize performance of the closed-loop system with encrypted controller.

Submitted 20 June, 2019; v1 submitted 10 December, 2018; originally announced December 2018.

Comments: Improved numerical example

arXiv:1810.11153 [pdf, ps, other]

doi 10.1109/TIFS.2019.2903660

Development and Analysis of Deterministic Privacy-Preserving Policies Using Non-Stochastic Information Theory

Authors: Farhad Farokhi

Abstract: A deterministic privacy metric using non-stochastic information theory is developed. Particularly, minimax information is used to construct a measure of information leakage, which is inversely proportional to the measure of privacy. Anyone can submit a query to a trusted agent with access to a non-stochastic uncertain private dataset. Optimal deterministic privacy-preserving policies for responding to the submitted query are computed by maximizing the measure of privacy subject to a constraint on the worst-case quality of the response (i.e., the worst-case difference between the response by the agent and the output of the query computed on the private dataset). The optimal privacy-preserving policy is proved to be a piecewise constant function in the form of a quantization operator applied on the output of the submitted query. The measure of privacy is also used to analyze the performance of $k$-anonymity methodology (a popular deterministic mechanism for privacy-preserving release of datasets using suppression and generalization techniques), proving that it is in fact not privacy-preserving.

Submitted 22 January, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

Comments: improved introduction and numerical example

Journal ref: IEEE Transactions on Information Forensics and Security, 2019
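
The abstract above states that the optimal policy is a piecewise constant quantization of the query output, subject to a bound on the worst-case response error. A minimal Python sketch of such a quantizer follows; the uniform bin width of twice the error bound, the midpoint placement, and the function name are illustrative assumptions and are not claimed to be the paper's optimal design.

```python
import math

def quantized_response(query_output, max_error):
    """Report the midpoint of the bin containing the true query output.

    Bins have width 2 * max_error, so the reported value differs from
    the true output by at most max_error (up to floating-point rounding).
    """
    step = 2.0 * max_error
    return (math.floor(query_output / step) + 0.5) * step

# Outputs falling in the same bin produce the same deterministic response.
print(quantized_response(3.2, 0.5), quantized_response(3.7, 0.5))
print(quantized_response(-0.1, 0.5))
```

Every output in a bin maps to one reported value, so an observer cannot distinguish them from the response alone, while the error constraint holds by construction.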

arXiv:1808.09565 [pdf, other]

Ensuring Privacy with Constrained Additive Noise by Minimizing Fisher Information

Authors: Farhad Farokhi, Henrik Sandberg

Abstract: The problem of preserving the privacy of individual entries of a database when responding to linear or nonlinear queries with constrained additive noise is considered. For privacy protection, the response to the query is systematically corrupted with an additive random noise whose support is a subset or equal to a pre-defined constraint set. A measure of privacy using the inverse of the trace of the Fisher information matrix is developed. The Cramer-Rao bound relates the variance of any estimator of the database entries to the introduced privacy measure. The probability density that minimizes the trace of the Fisher information (as a proxy for maximizing the measure of privacy) is computed. An extension to dynamic problems is also presented. Finally, the results are compared to the differential privacy methodology.

Submitted 28 August, 2018; originally announced August 2018.

arXiv:1702.08582 [pdf, other]

Private and Secure Coordination of Match-Making for Heavy-Duty Vehicle Platooning

Authors: Farhad Farokhi, Iman Shames, Karl H. Johansson

Abstract: A secure and private framework for inter-agent communication and coordination is developed. This allows an agent, in our case a fleet owner, to ask questions or submit queries in an encrypted fashion using semi-homomorphic encryption. The submitted query can be about the interest of the other fleet owners for using a road at a specific time of the day, for instance, for the purpose of collaborative vehicle platooning. The other agents can then provide appropriate responses without knowing the content of the questions or the queries. Strong privacy and security guarantees are provided for the agent who is submitting the queries. It is also shown that the amount of the information that this agent can extract from the other agent is bounded. In fact, with submitting one query, a sophisticated agent can at most extract the answer to two queries. This secure communication platform is used subsequently to develop distributed coordination mechanisms among fleet owners.

Submitted 27 February, 2017; originally announced February 2017.

arXiv:1509.08193 [pdf, other]

Budget-Constrained Contract Design for Effort-Averse Sensors in Averaging Based Estimation

Authors: Farhad Farokhi, Iman Shames, Michael Cantoni

Abstract: Consider a group of effort-averse, or lazy, sensors that seek to minimize the effort invested to collect measurements of a variable. Increasing the effort invested by the sensors improves the quality of the measurements provided to the central planner but this incurs increased costs to the sensors. The central planner, which processes the sensor measurements, employs an averaging estimator. It also determines contracts for rewarding sensors based on the measurements obtained. The problem of designing a contract that yields an estimation-error based quality-of-service level in return for the reward extended to sensors is investigated in this paper. To this end, a game is formulated between the central planner and the sensors. Conditions for the existence and uniqueness of an equilibrium are identified. The equilibrium is constructed explicitly and its properties in response to a reward based contract are studied. It turns out that the central planner, while not being able to directly measure the effort invested by the sensors, can enhance the estimation quality by rewarding each sensor based on the distance of its measurements from the output of the averaging estimator. Ultimately, optimal contracts are designed from the perspective of the budget required for achieving a specified level of estimation error.

Submitted 14 February, 2016; v1 submitted 28 September, 2015; originally announced September 2015.

Comments: Improved literature review

arXiv:1509.05502 [pdf, other]

Mutual Information as Privacy-Loss Measure in Strategic Communication

Authors: Farhad Farokhi, Girish Nair

Abstract: A game is introduced to study the effect of privacy in strategic communication between well-informed senders and a receiver. The receiver wants to accurately estimate a random variable. The sender, however, wants to communicate a message that balances a trade-off between providing an accurate measurement and minimizing the amount of leaked private information, which is assumed to be correlated with the to-be-estimated variable. The mutual information between the transmitted message and the private information is used as a measure of the amount of leaked information. An equilibrium is constructed and its properties are investigated.

Submitted 18 September, 2015; originally announced September 2015.

arXiv:1509.05500 [pdf, ps, other]

On Reconstructability of Quadratic Utility Functions from the Iterations in Gradient Methods

Authors: Farhad Farokhi, Iman Shames, Michael G. Rabbat, Mikael Johansson

Abstract: In this paper, we consider a scenario where an eavesdropper can read the content of messages transmitted over a network. The nodes in the network are running a gradient algorithm to optimize a quadratic utility function where such a utility optimization is a part of a decision making process by an administrator. We are interested in understanding the conditions under which the eavesdropper can reconstruct the utility function or a scaled version of it and, as a result, gain insight into the decision-making process. We establish that if the parameter of the gradient algorithm, i.e., the step size, is chosen appropriately, the task of reconstruction becomes practically impossible for a class of Bayesian filters with uniform priors. We establish what step-size rules should be employed to ensure this.

Submitted 17 September, 2015; originally announced September 2015.

arXiv:1509.05497 [pdf, other]

doi 10.1109/CDC.2015.7402923

Quadratic Gaussian Privacy Games

Authors: Farhad Farokhi, Henrik Sandberg, Iman Shames, Michael Cantoni

Abstract: A game-theoretic model for analysing the effects of privacy on strategic communication between agents is devised. In the model, a sender wishes to provide an accurate measurement of the state to a receiver while also protecting its private information (which is correlated with the state) private from a malicious agent that may eavesdrop on its communications with the receiver. A family of nontrivial equilibria, in which the communicated messages carry information, is constructed and its properties are studied.

Submitted 17 September, 2015; originally announced September 2015.

Comments: Accepted for Presentation at the 54th IEEE Conference on Decision and Control (CDC 2015)

arXiv:1503.02784 [pdf, other]

doi 10.1109/LSP.2015.2412122

Promoting Truthful Behaviour in Participatory-Sensing Mechanisms

Authors: Farhad Farokhi, Iman Shames, Michael Cantoni

Abstract: In this paper, the interplay between a class of nonlinear estimators and strategic sensors is studied in several participatory-sensing scenarios. It is shown that for the class of estimators, if the strategic sensors have access to noiseless measurements of the to-be-estimated-variable, truth-telling is an equilibrium of the game that models the interplay between the sensors and the estimator. Furthermore, performance of the proposed estimators is examined in the case that the strategic sensors form coalitions and in the presence of noise.

Submitted 10 March, 2015; originally announced March 2015.

Comments: IEEE Signal Processing Letters, In Press

Showing 1–50 of 55 results for author: Farokhi, F