-
What Are the Chances? Explaining the Epsilon Parameter in Differential Privacy
Authors:
Priyanka Nanayakkara,
Mary Anne Smart,
Rachel Cummings,
Gabriel Kaptchuk,
Elissa Redmiles
Abstract:
Differential privacy (DP) is a mathematical privacy notion increasingly deployed across government and industry. With DP, privacy protections are probabilistic: they are bounded by the privacy budget parameter, $ε$. Prior work in health and computational science finds that people struggle to reason about probabilistic risks. Yet, communicating the implications of $ε$ to people contributing their d…
▽ More
Differential privacy (DP) is a mathematical privacy notion increasingly deployed across government and industry. With DP, privacy protections are probabilistic: they are bounded by the privacy budget parameter, $ε$. Prior work in health and computational science finds that people struggle to reason about probabilistic risks. Yet, communicating the implications of $ε$ to people contributing their data is vital to avoiding privacy theater -- presenting meaningless privacy protection as meaningful -- and empowering more informed data-sharing decisions. Drawing on best practices in risk communication and usability, we develop three methods to convey probabilistic DP guarantees to end users: two that communicate odds and one offering concrete examples of DP outputs.
We quantitatively evaluate these explanation methods in a vignette survey study ($n=963$) via three metrics: objective risk comprehension, subjective privacy understanding of DP guarantees, and self-efficacy. We find that odds-based explanation methods are more effective than (1) output-based methods and (2) state-of-the-art approaches that gloss over information about $ε$. Further, when offered information about $ε$, respondents are more willing to share their data than when presented with a state-of-the-art DP explanation; this willingness to share is sensitive to $ε$ values: as privacy protections weaken, respondents are less likely to share data.
△ Less
Submitted 1 March, 2023;
originally announced March 2023.
-
Addressing Privacy Threats from Machine Learning
Authors:
Mary Anne Smart
Abstract:
Every year at NeurIPS, machine learning researchers gather and discuss exciting applications of machine learning in areas such as public health, disaster response, climate change, education, and more. However, many of these same researchers are expressing growing concern about applications of machine learning for surveillance (Nanayakkara et al., 2021). This paper presents a brief overview of stra…
▽ More
Every year at NeurIPS, machine learning researchers gather and discuss exciting applications of machine learning in areas such as public health, disaster response, climate change, education, and more. However, many of these same researchers are expressing growing concern about applications of machine learning for surveillance (Nanayakkara et al., 2021). This paper presents a brief overview of strategies for resisting these surveillance technologies and calls for greater collaboration between machine learning and human-computer interaction researchers to address the threats that these technologies pose.
△ Less
Submitted 24 October, 2021;
originally announced November 2021.
-
Approximate Data Deletion from Machine Learning Models
Authors:
Zachary Izzo,
Mary Anne Smart,
Kamalika Chaudhuri,
James Zou
Abstract:
Deleting data from a trained machine learning (ML) model is a critical task in many applications. For example, we may want to remove the influence of training points that might be out of date or outliers. Regulations such as EU's General Data Protection Regulation also stipulate that individuals can request to have their data deleted. The naive approach to data deletion is to retrain the ML model…
▽ More
Deleting data from a trained machine learning (ML) model is a critical task in many applications. For example, we may want to remove the influence of training points that might be out of date or outliers. Regulations such as EU's General Data Protection Regulation also stipulate that individuals can request to have their data deleted. The naive approach to data deletion is to retrain the ML model on the remaining data, but this is too time consuming. In this work, we propose a new approximate deletion method for linear and logistic models whose computational cost is linear in the the feature dimension $d$ and independent of the number of training data $n$. This is a significant gain over all existing methods, which all have superlinear time dependence on the dimension. We also develop a new feature-injection test to evaluate the thoroughness of data deletion from ML models.
△ Less
Submitted 23 February, 2021; v1 submitted 24 February, 2020;
originally announced February 2020.