-
Fast Private Kernel Density Estimation via Locality Sensitive Quantization
Authors:
Tal Wagner,
Yonatan Naamad,
Nina Mishra
Abstract:
We study efficient mechanisms for differentially private kernel density estimation (DP-KDE). Prior work for the Gaussian kernel described algorithms that run in time exponential in the number of dimensions $d$. This paper breaks the exponential barrier, and shows how the KDE can privately be approximated in time linear in $d$, making it feasible for high-dimensional data. We also present improved bounds for low-dimensional data.
Our results are obtained through a general framework, which we term Locality Sensitive Quantization (LSQ), for constructing private KDE mechanisms where existing KDE approximation techniques can be applied. It lets us leverage several efficient non-private KDE methods -- like Random Fourier Features, the Fast Gauss Transform, and Locality Sensitive Hashing -- and ``privatize'' them in a black-box manner. Our experiments demonstrate that our resulting DP-KDE mechanisms are fast and accurate on large datasets in both high and low dimensions.
Submitted 4 July, 2023;
originally announced July 2023.
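As a rough illustration of one of the non-private building blocks named in the abstract above, the following sketch compares an exact Gaussian KDE with a Random Fourier Features approximation. It is not the paper's LSQ mechanism and includes no privacy noise; the kernel normalization $e^{-\|x-y\|^2}$, the function names, and all parameter values are assumptions chosen for the example.

```python
import math
import random


def exact_kde(data, query):
    """Exact KDE with the (assumed) kernel exp(-||x - y||^2)."""
    total = 0.0
    for x in data:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        total += math.exp(-sq_dist)
    return total / len(data)


def make_rff(dim, num_features, rng):
    """Return a feature map z with E[z(x) . z(y)] = exp(-||x - y||^2)."""
    # For exp(-||delta||^2), frequencies are Gaussian with variance 2 per coordinate.
    freqs = [[rng.gauss(0.0, math.sqrt(2.0)) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def featurize(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return featurize


def rff_sketch(data, featurize):
    """Summarize the dataset once as the mean of its feature vectors."""
    feats = [featurize(x) for x in data]
    n = len(feats)
    return [sum(col) / n for col in zip(*feats)]


def rff_kde(sketch, featurize, query):
    """Approximate KDE: inner product of the sketch with the query's features."""
    return sum(s * q for s, q in zip(sketch, featurize(query)))


rng = random.Random(0)
points = [[rng.gauss(0.0, 0.5) for _ in range(3)] for _ in range(40)]
featurize = make_rff(dim=3, num_features=2000, rng=rng)
sketch = rff_sketch(points, featurize)
query = [0.1, 0.0, -0.1]
print(exact_kde(points, query), rff_kde(sketch, featurize, query))
```

Each feature costs time linear in the dimension, which is the property that makes this family of approximations attractive in high dimensions; how the paper privatizes such a summary is not shown here.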
-
Explaining a machine learning decision to physicians via counterfactuals
Authors:
Supriya Nagesh,
Nina Mishra,
Yonatan Naamad,
James M. Rehg,
Mehul A. Shah,
Alexei Wagner
Abstract:
Machine learning models perform well on several healthcare tasks and can help reduce the burden on the healthcare system. However, the lack of explainability is a major roadblock to their adoption in hospitals. \textit{How can the decision of an ML model be explained to a physician?} The explanations considered in this paper are counterfactuals (CFs), hypothetical scenarios that would have resulted in the opposite outcome. Specifically, time-series CFs are investigated, inspired by the way physicians converse and reason out decisions: `I would have given the patient a vasopressor if their blood pressure was lower and falling'. Key properties of CFs that are particularly meaningful in clinical settings are outlined: physiological plausibility, relevance to the task, and sparse perturbations. Past work on CF generation does not satisfy these properties, specifically plausibility, in that realistic time-series CFs are not generated. A variational autoencoder (VAE)-based approach is proposed that captures these desired properties. The method produces CFs that improve on prior approaches quantitatively (more plausible CFs as evaluated by their likelihood w.r.t. the original data distribution, and 100$\times$ faster at generating CFs) and qualitatively (2$\times$ more plausible and relevant) as evaluated by three physicians.
Submitted 9 June, 2023;
originally announced June 2023.
-
Multi-Commodity Flow with In-Network Processing
Authors:
Moses Charikar,
Yonatan Naamad,
Jennifer Rexford,
X. Kelvin Zou
Abstract:
Modern networks run "middleboxes" that offer services ranging from network address translation and server load balancing to firewalls, encryption, and compression. In an industry trend known as Network Functions Virtualization (NFV), these middleboxes run as virtual machines on any commodity server, and the switches steer traffic through the relevant chain of services. Network administrators must decide how many middleboxes to run, where to place them, and how to direct traffic through them, based on the traffic load and the server and network capacity. Rather than placing specific kinds of middleboxes on each processing node, we argue that server virtualization allows each server node to host all middlebox functions, and simply vary the fraction of resources devoted to each one. This extra flexibility fundamentally changes the optimization problem the network administrators must solve to a new kind of multi-commodity flow problem, where the traffic flows consume bandwidth on the links as well as processing resources on the nodes. We show that allocating resources to maximize the processed flow can be optimized exactly via a linear programming formulation, and to arbitrary accuracy via an efficient combinatorial algorithm. Our experiments with real traffic and topologies show that a joint optimization of node and link resources leads to an efficient use of bandwidth and processing capacity. We also study a class of design problems that decide where to provide node capacity to best process and route a given set of demands, and demonstrate both approximation algorithms and hardness results for these problems.
Submitted 25 February, 2018;
originally announced February 2018.
-
On Finding Dense Common Subgraphs
Authors:
Moses Charikar,
Yonatan Naamad,
Jimmy Wu
Abstract:
We study the recently introduced problem of finding dense common subgraphs: Given a sequence of graphs that share the same vertex set, the goal is to find a subset of vertices $S$ that maximizes some aggregate measure of the density of the subgraphs induced by $S$ in each of the given graphs. Different choices for the aggregation function give rise to variants of the problem that were studied recently. We settle many of the questions left open by previous works, showing NP-hardness, hardness of approximation, non-trivial approximation algorithms, and an integrality gap for a natural relaxation.
Submitted 18 February, 2018;
originally announced February 2018.
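To make the problem statement in the abstract above concrete, here is a minimal brute-force sketch of one variant. It assumes density means edges divided by $|S|$ and uses the minimum over graphs as the aggregation function; both choices, and all function names, are assumptions for illustration rather than the paper's definitions or algorithms. The search is exponential in the number of vertices, so it is only suitable for tiny inputs.

```python
from itertools import combinations


def induced_density(edges, subset):
    """Edges with both endpoints in `subset`, divided by |subset| (assumed density)."""
    members = set(subset)
    inside = sum(1 for u, v in edges if u in members and v in members)
    return inside / len(members)


def best_common_subgraph(vertices, graphs, aggregate=min):
    """Exhaustively find the vertex subset maximizing the aggregated density."""
    best_set, best_score = None, -1.0
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            score = aggregate(induced_density(edges, subset) for edges in graphs)
            if score > best_score:
                best_set, best_score = subset, score
    return best_set, best_score


# Two graphs on the same vertex set {0, 1, 2, 3}, each given as an edge list.
triangle = [(0, 1), (1, 2), (0, 2)]
graph_a = triangle + [(2, 3)]
graph_b = triangle
result = best_common_subgraph([0, 1, 2, 3], [graph_a, graph_b])
print(result)  # ((0, 1, 2), 1.0)
```

Passing a different `aggregate`, such as a function that averages the densities, gives a different variant of the problem.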