Differentially Private $K$-means Clustering Applied to Meter Data Analysis and Synthesis
Authors:
Nikhil Ravi,
Anna Scaglione,
Sachin Kadam,
Reinhard Gentz,
Sean Peisert,
Brent Lunghino,
Emmanuel Levijarvi,
Aram Shumavon
Abstract:
The proliferation of smart meters has resulted in a large amount of data being generated. It is increasingly apparent that methods are required for allowing a variety of stakeholders to leverage the data in a manner that preserves the privacy of the consumers. The sector is scrambling to define policies, such as the so called `15/15 rule', to respond to the need. However, the current policies fail…
▽ More
The proliferation of smart meters has resulted in a large amount of data being generated. It is increasingly apparent that methods are required for allowing a variety of stakeholders to leverage the data in a manner that preserves the privacy of the consumers. The sector is scrambling to define policies, such as the so called `15/15 rule', to respond to the need. However, the current policies fail to adequately guarantee privacy. In this paper, we address the problem of allowing third parties to apply $K$-means clustering, obtaining customer labels and centroids for a set of load time series by applying the framework of differential privacy. We leverage the method to design an algorithm that generates differentially private synthetic load data consistent with the labeled data. We test our algorithm's utility by answering summary statistics such as average daily load profiles for a 2-dimensional synthetic dataset and a real-world power load dataset.
△ Less
Submitted 22 April, 2022; v1 submitted 7 December, 2021;
originally announced December 2021.
Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets
Authors:
Sachin Kadam,
Anna Scaglione,
Nikhil Ravi,
Sean Peisert,
Brent Lunghino,
Aram Shumavon
Abstract:
The Differential Privacy (DP) literature often centers on meeting privacy constraints by introducing noise to the query, typically using a pre-specified parametric distribution model with one or two degrees of freedom. However, this emphasis tends to neglect the crucial considerations of response accuracy and utility, especially in the context of categorical or discrete numerical database queries,…
▽ More
The Differential Privacy (DP) literature often centers on meeting privacy constraints by introducing noise to the query, typically using a pre-specified parametric distribution model with one or two degrees of freedom. However, this emphasis tends to neglect the crucial considerations of response accuracy and utility, especially in the context of categorical or discrete numerical database queries, where the parameters defining the noise distribution are finite and could be chosen optimally. This paper addresses this gap by introducing a novel framework for designing an optimal noise Probability Mass Function (PMF) tailored to discrete and finite query sets. Our approach considers the modulo summation of random noise as the DP mechanism, aiming to present a tractable solution that not only satisfies privacy constraints but also minimizes query distortion. Unlike existing approaches focused solely on meeting privacy constraints, our framework seeks to optimize the noise distribution under an arbitrary $(ε, δ)$ constraint, thereby enhancing the accuracy and utility of the response. We demonstrate that the optimal PMF can be obtained through solving a Mixed-Integer Linear Program (MILP). Additionally, closed-form solutions for the optimal PMF are provided, minimizing the probability of error for two specific cases. Numerical experiments highlight the superior performance of our proposed optimal mechanisms compared to state-of-the-art methods. This paper contributes to the DP literature by presenting a clear and systematic approach to designing noise mechanisms that not only satisfy privacy requirements but also optimize query distortion. The framework introduced here opens avenues for improved privacy-preserving database queries, offering significant enhancements in response accuracy and utility.
△ Less
Submitted 8 April, 2024; v1 submitted 23 November, 2021;
originally announced November 2021.