-
FastLloyd: Federated, Accurate, Secure, and Tunable $k$-Means Clustering with Differential Privacy
Authors:
Abdulrahman Diaa,
Thomas Humphries,
Florian Kerschbaum
Abstract:
We study the problem of privacy-preserving $k$-means clustering in the horizontally federated setting. Existing federated approaches using secure computation suffer from substantial overheads and do not offer output privacy. At the same time, differentially private (DP) $k$-means algorithms assume a trusted central curator and do not extend to federated settings. Naively combining the secure and DP solutions results in a protocol with impractical overhead. Instead, our work provides enhancements to both the DP and secure computation components, resulting in a design that is faster, more private, and more accurate than previous work. By utilizing the computational DP model, we design a lightweight, secure aggregation-based approach that achieves four orders of magnitude speed-up over state-of-the-art related work. Furthermore, we not only maintain the utility of the state-of-the-art in the central model of DP, but we improve the utility further by taking advantage of constrained clustering techniques.
Submitted 3 May, 2024;
originally announced May 2024.
-
PEPSI: Practically Efficient Private Set Intersection in the Unbalanced Setting
Authors:
Rasoul Akhavan Mahdavi,
Nils Lukas,
Faezeh Ebrahimianghazani,
Thomas Humphries,
Bailey Kacsmar,
John Premkumar,
Xinda Li,
Simon Oya,
Ehsan Amjadian,
Florian Kerschbaum
Abstract:
Two parties with private data sets can find shared elements using a Private Set Intersection (PSI) protocol without revealing any information beyond the intersection. Circuit PSI protocols privately compute an arbitrary function of the intersection, such as its cardinality, and are often employed in an unbalanced setting where one party has more data than the other. Existing protocols are either computationally inefficient or require extensive server-client communication on the order of the larger set. We introduce Practically Efficient PSI or PEPSI, a non-interactive solution where only the client sends its encrypted data. PEPSI can process an intersection of 1024 client items with a million server items in under a second, using less than 5 MB of communication. Our work is over 4 orders of magnitude faster than an existing non-interactive circuit PSI protocol and requires only 10% of the communication. It is also up to 20 times faster than the work of Ion et al., which computes a limited set of functions and has communication costs proportional to the larger set. Our work is the first to demonstrate that non-interactive circuit PSI can be practically applied in an unbalanced setting.
Submitted 23 October, 2023;
originally announced October 2023.
-
A New Spectral Conjugate Subgradient Method with Application in Computed Tomography Image Reconstruction
Authors:
Milagros Loreto,
Thomas Humphries,
Chella Raghavan,
Kenneth Wu,
Sam Kwak
Abstract:
A new spectral conjugate subgradient method is presented to solve nonsmooth unconstrained optimization problems. The method combines the spectral conjugate gradient method for smooth problems with the spectral subgradient method for nonsmooth problems. We study the effect of two different choices of line search, as well as three formulas for determining the conjugate directions. In addition to numerical experiments with standard nonsmooth test problems, we also apply the method to several image reconstruction problems in computed tomography, using total variation regularization. Performance profiles are used to compare the performance of the algorithm using different line search strategies and conjugate directions to that of the original spectral subgradient method. Our results show that the spectral conjugate subgradient algorithm outperforms the original spectral subgradient method, and that the use of the Polak-Ribiere formula for conjugate directions provides the best and most robust performance.
Submitted 5 June, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions
Authors:
Abdulrahman Diaa,
Lucas Fenaux,
Thomas Humphries,
Marian Dietz,
Faezeh Ebrahimianghazani,
Bailey Kacsmar,
Xinda Li,
Nils Lukas,
Rasoul Akhavan Mahdavi,
Simon Oya,
Ehsan Amjadian,
Florian Kerschbaum
Abstract:
Machine Learning as a Service (MLaaS) is an increasingly popular design where a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this design is that MLaaS requires the client to reveal their potentially sensitive queries to the company hosting the model. Multi-party computation (MPC) protects the client's data by allowing encrypted inferences. However, current approaches suffer from prohibitively large inference times. The inference time bottleneck in MPC is the evaluation of non-linear layers such as ReLU activation functions. Motivated by the success of previous work co-designing machine learning and MPC, we develop an activation function co-design. We replace all ReLUs with a polynomial approximation and evaluate them with single-round MPC protocols, which give state-of-the-art inference times in wide-area networks. Furthermore, to address the accuracy issues previously encountered with polynomial activations, we propose a novel training algorithm that gives accuracy competitive with plaintext models. Our evaluation shows between $3$ and $110\times$ speedups in inference time on large models with up to $23$ million parameters while maintaining competitive inference accuracy.
Submitted 16 April, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
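The abstract above describes replacing ReLU activations with a polynomial approximation so that they can be evaluated cheaply under MPC. The sketch below only illustrates that general idea with an ordinary least-squares fit; the fitting interval `[-5, 5]`, the degrees 2 and 4, and the helper names `fit_poly_activation` and `mse` are all assumptions made for illustration. It does not reproduce the paper's training algorithm or its MPC protocols.

```python
import numpy as np


def relu(x):
    """Standard ReLU activation."""
    return np.maximum(x, 0.0)


def fit_poly_activation(degree, lo=-5.0, hi=5.0, samples=1001):
    """Least-squares polynomial fit to ReLU on [lo, hi].

    The interval and sample count are illustrative assumptions,
    not values taken from the paper.
    """
    xs = np.linspace(lo, hi, samples)
    return np.polyfit(xs, relu(xs), degree)


def mse(coeffs, lo=-5.0, hi=5.0, samples=1001):
    """Mean squared error between the polynomial and ReLU on [lo, hi]."""
    xs = np.linspace(lo, hi, samples)
    return float(np.mean((np.polyval(coeffs, xs) - relu(xs)) ** 2))


# Compare a quadratic and a quartic stand-in for ReLU.
coeffs2 = fit_poly_activation(2)
coeffs4 = fit_poly_activation(4)
mse2, mse4 = mse(coeffs2), mse(coeffs4)
print(f"degree 2 MSE: {mse2:.5f}, degree 4 MSE: {mse4:.5f}")
```

A polynomial needs only additions and multiplications, which is what makes it attractive for secure evaluation; the trade-off is that the fit degrades outside the chosen interval, which is one plausible reason a dedicated training procedure is needed.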
-
Cache Me If You Can: Accuracy-Aware Inference Engine for Differentially Private Data Exploration
Authors:
Miti Mazmudar,
Thomas Humphries,
Jiaxiang Liu,
Matthew Rafuse,
Xi He
Abstract:
Differential privacy (DP) allows data analysts to query databases that contain users' sensitive information while providing a quantifiable privacy guarantee to users. Recent interactive DP systems such as APEx provide accuracy guarantees over the query responses, but fail to support a large number of queries with a limited total privacy budget, as they process incoming queries independently from past queries. We present an interactive, accuracy-aware DP query engine, CacheDP, which uses a differentially private cache of past responses to answer the current workload at a lower privacy budget while meeting strict accuracy guarantees. We integrate complex DP mechanisms with our structured cache through novel cache-aware DP cost optimization. Our thorough evaluation illustrates that CacheDP can accurately answer various workload sequences while lowering the privacy loss compared to related work.
Submitted 28 November, 2022;
originally announced November 2022.
-
Self-Attention Generative Adversarial Network for Iterative Reconstruction of CT Images
Authors:
Ruiwen Xing,
Thomas Humphries,
Dong Si
Abstract:
Computed tomography (CT) uses X-ray measurements taken from sensors around the body to generate tomographic images of the human body. Conventional reconstruction algorithms can be used if the X-ray data are adequately sampled and of high quality; however, concerns such as reducing dose to the patient, or geometric limitations on data acquisition, may result in low quality or incomplete data. Images reconstructed from these data using conventional methods are of poor quality, due to noise and other artifacts. The aim of this study is to train a single neural network to reconstruct high-quality CT images from noisy or incomplete CT scan data, including low-dose, sparse-view, and limited-angle scenarios. To accomplish this task, we train a generative adversarial network (GAN) as a signal prior, to be used in conjunction with the iterative simultaneous algebraic reconstruction technique (SART) for CT data. The network includes a self-attention block to model long-range dependencies in the data. We compare our Self-Attention GAN for CT image reconstruction with several state-of-the-art approaches, including denoising cycle GAN, CIRCLE GAN, and a total variation superiorized algorithm. Our approach is shown to have comparable overall performance to CIRCLE GAN, while outperforming the other two approaches.
Submitted 23 December, 2021;
originally announced December 2021.
-
Selective MPC: Distributed Computation of Differentially Private Key-Value Statistics
Authors:
Thomas Humphries,
Rasoul Akhavan Mahdavi,
Shannon Veitch,
Florian Kerschbaum
Abstract:
Key-value data is a naturally occurring data type that has not been thoroughly investigated in the local trust model. Existing local differentially private (LDP) solutions for computing statistics over key-value data suffer from the inherent accuracy limitations of each user adding their own noise. Multi-party computation (MPC) maintains better accuracy than LDP and similarly does not require a trusted central party. However, naively applying MPC to key-value data results in prohibitively expensive computation costs. In this work, we present selective multi-party computation, a novel approach to distributed computation that leverages DP leakage to efficiently and accurately compute statistics over key-value data. By providing each party with a view of a random subset of the data, we can capture subtractive noise. We prove that our protocol satisfies pure DP and is provably secure in the combined DP/MPC model. Our empirical evaluation demonstrates that we can compute statistics over 10,000 keys in 20 seconds and can scale up to 30 servers while obtaining results for a single key in under a second.
Submitted 30 August, 2022; v1 submitted 26 July, 2021;
originally announced July 2021.
-
Investigating Membership Inference Attacks under Data Dependencies
Authors:
Thomas Humphries,
Simon Oya,
Lindsey Tulloch,
Matthew Rafuse,
Ian Goldberg,
Urs Hengartner,
Florian Kerschbaum
Abstract:
Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $ε$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of MIAs.
Submitted 14 June, 2023; v1 submitted 22 October, 2020;
originally announced October 2020.
-
Materials for hydrogen-based energy storage: Past, recent progress and future outlook
Authors:
Volodymyr A. Yartys,
Marcello Baricco,
Jose Bellosta von Colbe,
Didier Blanchard,
Robert C. Bowman Jr.,
Darren P. Broom,
Craig E. Buckley,
Fei Chang,
** Chen,
Young Whan Cho,
Jean-Claude Crivello,
Fermin Cuevas,
William I. F. David,
Petra E. de Jongh,
Roman V. Denys,
Martin Dornheim,
Michael Felderhoff,
Yaroslav Filinchuk,
George E. Froudakis,
David M. Grant,
Bjørn C. Hauback,
Ladislav Havela,
Teng He,
Michael Hirscher,
Terry D. Humphries
, et al. (23 additional authors not shown)
Abstract:
Magnesium hydride owns the largest share of publications on solid materials for hydrogen storage. The Magnesium group of international experts contributing to IEA Task 32 Hydrogen Based Energy Storage recently published two review papers presenting the activities of the group focused on magnesium hydride based materials and on Mg based compounds for hydrogen and energy storage. This review article not only overviews the latest activities on both fundamental aspects of Mg-based hydrides and their applications, but also presents a historic overview on the topic and outlines projected future developments. Particular attention is paid to the theoretical and experimental studies of the Mg-H system at extreme pressures, kinetics and thermodynamics of the systems based on MgH2, nanostructuring, new Mg-based compounds and novel composites, and catalysis in the Mg based H storage systems. Finally, thermal energy storage and upscaled H storage systems accommodating MgH2 are presented.
Submitted 7 May, 2020;
originally announced May 2020.
-
Superiorized method for metal artifact reduction
Authors:
T. Humphries,
J. Wang
Abstract:
Metal artifact reduction (MAR) is a challenging problem in computed tomography (CT) imaging. A popular class of MAR methods replaces sinogram measurements that are corrupted by metal with artificial data. While these "projection completion" approaches are successful in eliminating severe artifacts, secondary artifacts may be introduced by the artificial data. In this paper, we propose an approach which uses projection completion to generate a prior image, which is then incorporated into an iterative reconstruction algorithm based on the superiorization framework. The prior image is reconstructed using normalized metal artifact reduction (NMAR), a popular projection completion approach. The iterative algorithm is a modified version of the simultaneous algebraic reconstruction technique (SART), which reduces artifacts by incorporating a polyenergetic forward model, least-squares weighting, and superiorization. The penalty function used for superiorization is a weighted average between a total variation (TV) term and a term promoting similarity with the prior image, similar to penalty functions used in prior image constrained compressive sensing. Because the prior is largely free of severe metal artifacts, these artifacts are discouraged from arising during iterative reconstruction; additionally, because the iterative approach uses the original projection data, it is able to recover information that is lost during the NMAR process. We perform numerical experiments modeling a simple geometric object, as well as several more realistic scenarios such as metal pins, bilateral hip implants, and dental fillings placed within an anatomical phantom. The proposed iterative algorithm is largely successful at eliminating severe metal artifacts as well as secondary artifacts introduced by the NMAR process, especially lost edges of bone structures in the neighborhood of the metal regions.
Submitted 9 March, 2020;
originally announced March 2020.
-
Superiorized algorithm for reconstruction of CT images from sparse-view and limited-angle polyenergetic data
Authors:
T. Humphries,
J. Winn,
A. Faridani
Abstract:
Recent work in CT imaging has seen increased interest in the use of total variation (TV) and related penalties to regularize problems involving reconstruction from undersampled or incomplete data. Superiorization is a recently proposed heuristic which provides an automatic procedure to "superiorize" an iterative reconstruction algorithm with respect to a chosen objective function, such as TV. Under certain conditions, the superiorized algorithm is guaranteed to find a solution that is as satisfactory as any found by the original algorithm with respect to satisfying the constraints of the problem; this solution is also expected to be superior with respect to the chosen objective.
Most work on superiorization has used reconstruction algorithms which assume a linear measurement model, which in the case of CT corresponds to data generated from a monoenergetic X-ray beam. Many CT systems generate X-rays from a polyenergetic spectrum, however, in which the measured data represent an integral of object attenuation over all energies in the spectrum. This inconsistency with the linear model produces the well-known beam hardening artifacts, which impair analysis of CT images.
In this work we superiorize an iterative algorithm for reconstruction from polyenergetic data, using both TV and an anisotropic TV (ATV) penalty. We apply the superiorized algorithm in numerical phantom experiments modeling both sparse-view and limited-angle scenarios. In our experiments, the superiorized algorithm successfully finds solutions which are as constraints-compatible as those found by the original algorithm, with significantly reduced TV and ATV values. The superiorized algorithm thus produces images with greatly reduced sparse-view and limited angle artifacts, which are also largely free of the beam hardening artifacts that would be present if a superiorized version of a monoenergetic algorithm were used.
Submitted 12 July, 2017; v1 submitted 10 January, 2017;
originally announced January 2017.
-
A Closer Look At Differential Evolution For The Optimal Well Placement Problem
Authors:
Grazieli L. C. Carosio,
Thomas D. Humphries,
Ronald D. Haynes,
Colin G. Farquharson
Abstract:
Energy demand has increased considerably with the growth of world population, increasing the interest in the hydrocarbon reservoir management problem. Companies are concerned with maximizing oil recovery while minimizing capital investment and operational costs. A first step in solving this problem is to consider optimal well placement. In this work, we investigate the Differential Evolution (DE) optimization method, using distinct configurations with respect to population size, mutation factor, crossover probability, and mutation strategy, to solve the well placement problem. By assuming a bare control procedure, one optimizes the parameters representing positions of injection and production wells. The Tenth SPE Comparative Solution Project and MATLAB Reservoir Simulation Toolbox (MRST) are the benchmark dataset and simulator used, respectively. The goal is to evaluate the performance of DE in solving this important real-world problem. We show that DE can find high-quality solutions, when compared with a reference from the literature, and a preliminary analysis on the results of multiple experiments gives useful information on how DE configuration impacts its performance.
Submitted 26 April, 2015;
originally announced April 2015.
-
Convergence analysis of a polyenergetic SART algorithm
Authors:
Thomas Humphries
Abstract:
Purpose: We analyze a recently proposed polyenergetic version of the simultaneous algebraic reconstruction technique (SART). This algorithm, denoted pSART, replaces the monoenergetic forward projection operation used by SART with a post-log, polyenergetic forward projection, while leaving the rest of the algorithm unchanged. While the proposed algorithm provides good results empirically, convergence of the algorithm was not established mathematically in the original paper.
Methods: We analyze pSART as a nonlinear fixed point iteration by explicitly computing the Jacobian of the iteration. A necessary condition for convergence is that the spectral radius of the Jacobian, evaluated at the fixed point, is less than one. A short proof of convergence for SART is also provided as a basis for comparison.
Results: We show that the pSART algorithm is not guaranteed to converge, in general. The Jacobian of the iteration depends on several factors, including the system matrix and how one models the energy dependence of the linear attenuation coefficient. We provide a simple numerical example that shows that the spectral radius of the Jacobian matrix is not guaranteed to be less than one. A second set of numerical experiments using realistic CT system matrices, however, indicates that conditions for convergence are likely to be satisfied in practice.
Conclusion: Although pSART is not mathematically guaranteed to converge, our numerical experiments indicate that it will tend to converge at roughly the same rate as SART for system matrices of the type encountered in CT imaging. Thus we conclude that the algorithm is still a useful method for reconstruction of polyenergetic CT data.
Submitted 12 May, 2015; v1 submitted 6 January, 2015;
originally announced January 2015.
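The convergence criterion in the abstract above (spectral radius of the iteration's Jacobian below one at the fixed point) can be illustrated on the linear case. The sketch below is not the pSART Jacobian from the paper. It assumes the standard monoenergetic SART update $x_{k+1} = x_k + \lambda D A^T M (b - A x_k)$, where $M$ and $D$ are diagonal matrices of inverse row and column sums of $A$, so the Jacobian is the constant matrix $I - \lambda D A^T M A$. The small matrix `A` and the function names are made up for illustration.

```python
import numpy as np


def sart_iteration_matrix(A, relax=1.0):
    """Jacobian of the linear SART update, I - relax * D A^T M A.

    Assumes A has nonnegative entries with no zero rows or columns.
    M and D hold the inverse row sums and inverse column sums of A.
    """
    M = np.diag(1.0 / A.sum(axis=1))
    D = np.diag(1.0 / A.sum(axis=0))
    return np.eye(A.shape[1]) - relax * D @ A.T @ M @ A


def spectral_radius(J):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(J))))


# Toy 4x3 system matrix with full column rank (hypothetical values).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

J = sart_iteration_matrix(A)
rho = spectral_radius(J)
print(f"spectral radius: {rho:.4f}")
```

For a nonlinear iteration such as pSART the Jacobian depends on the point at which it is evaluated, so the same check would have to be applied to the Jacobian computed at the fixed point rather than to a single constant matrix.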
-
Joint optimization of well placement and control for nonconventional well types
Authors:
Thomas D. Humphries,
Ronald D. Haynes
Abstract:
Optimal well placement and optimal well control are two important areas of study in oilfield development. Although the two problems differ in several respects, both are important considerations in optimizing total oilfield production, and so recent work in the field has considered the problem of addressing both problems jointly. Two general approaches to addressing the joint problem are a simultaneous approach, where all parameters are optimized at the same time, or a sequential approach, where a distinction between placement and control parameters is maintained by separating the optimization problem into two (or more) stages, some of which consider only a subset of the total number of variables. This latter approach divides the problem into smaller ones which are easier to solve, but may not explore search space as fully as a simultaneous approach.
In this paper we combine a stochastic global algorithm (Particle Swarm Optimization) and a local search (Mesh Adaptive Direct Search) to compare several simultaneous and sequential approaches to the joint placement and control problem. In particular, we study how increasing the complexity of well models (requiring more variables to describe the well's location and path) affects the respective performances of the two approaches. The results of several experiments with synthetic reservoir models suggest that the sequential approaches are better able to deal with increasingly complex well parameterizations than the simultaneous approaches.
Submitted 10 October, 2014; v1 submitted 15 September, 2014;
originally announced September 2014.
-
Efficient Optimization of the Likelihood Function in Gaussian Process Modelling
Authors:
Andrew Butler,
Thomas D. Humphries,
Pritam Ranjan,
Ronald D. Haynes
Abstract:
Gaussian Process (GP) models are popular statistical surrogates used for emulating computationally expensive computer simulators. The quality of a GP model fit can be assessed by a goodness of fit measure based on optimized likelihood. Finding the global maximum of the likelihood function for a GP model is typically very challenging as the likelihood surface often has multiple local optima, and an explicit expression for the gradient of the likelihood function is typically unavailable. Previous methods for optimizing the likelihood function (e.g. MacDonald et al. (2013)) have proven to be robust and accurate, though relatively inefficient. We propose several likelihood optimization techniques, including two modified multi-start local search techniques, based on the method implemented by MacDonald et al. (2013), that are equally reliable and significantly more efficient. A hybridization of the global search algorithm Dividing Rectangles (DIRECT) with the local optimization algorithm BFGS provides a comparable GP model quality for a fraction of the computational cost, and is the preferred optimization technique when computational resources are limited. We use several test functions and a real application from an oil reservoir simulation to test and compare the performance of the proposed methods with the one implemented by MacDonald et al. (2013) in the R library GPfit. The proposed method is implemented in a Matlab package, GPMfit.
Submitted 26 September, 2013;
originally announced September 2013.