-
Hawk: Accurate and Fast Privacy-Preserving Machine Learning Using Secure Lookup Table Computation
Authors:
Hamza Saleem,
Amir Ziashahabi,
Muhammad Naveed,
Salman Avestimehr
Abstract:
Training machine learning models on data from multiple entities without direct data sharing can unlock applications otherwise hindered by business, legal, or ethical constraints. In this work, we design and implement new privacy-preserving machine learning protocols for logistic regression and neural network models. We adopt a two-server model where data owners secret-share their data between two servers that train and evaluate the model on the joint data. A significant source of inefficiency and inaccuracy in existing methods arises from using Yao's garbled circuits to compute non-linear activation functions. We propose new methods for computing non-linear functions based on secret-shared lookup tables, offering both computational efficiency and improved accuracy.
Beyond introducing leakage-free techniques, we initiate the exploration of relaxed security measures for privacy-preserving machine learning. Instead of claiming that the servers gain no knowledge during the computation, we contend that while some information is revealed about access patterns to lookup tables, it maintains epsilon-dX-privacy. Leveraging this relaxation significantly reduces the computational resources needed for training. We present new cryptographic protocols tailored to this relaxed security paradigm and define and analyze the leakage. Our evaluations show that our logistic regression protocol is up to 9x faster, and the neural network training is up to 688x faster than SecureML. Notably, our neural network achieves an accuracy of 96.6% on MNIST in 15 epochs, outperforming prior benchmarks that capped at 93.4% using the same architecture.
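The abstract does not spell out the lookup-table protocol. As a hedged illustration of the general idea of evaluating a non-linear function from a secret-shared table in a two-server setting, the sketch below uses a one-time rotated table: a dealer (an assumption; the abstract does not mention one) rotates the table by a secret random offset r and secret-shares both r and the rotated table, and the two servers open only x + r. Because r is uniform, the opened index reveals nothing about x, so this sketch is leakage-free and does not model the relaxed epsilon-dX-private variant described above. The table size, ring size, helper names, and the toy fixed-point sigmoid table are all hypothetical.

```python
import math
import secrets

# Hypothetical parameters for this sketch (not taken from the paper).
TABLE_SIZE = 16        # number of quantized input values
VALUE_MOD = 2**32      # ring in which table outputs are additively shared

# Toy fixed-point sigmoid table: index i stands for the input (i - 8) / 2,
# and outputs are scaled by 256.
SIGMOID_TABLE = [round(256 / (1 + math.exp(-(i - 8) / 2))) for i in range(TABLE_SIZE)]


def share(value: int, mod: int) -> tuple:
    """Split value into two additive shares modulo mod."""
    s0 = secrets.randbelow(mod)
    return s0, (value - s0) % mod


def dealer_setup(table: list) -> tuple:
    """Offline phase: rotate the table by a secret offset r, then share r and the table."""
    n = len(table)
    r = secrets.randbelow(n)
    # rotated[i] = table[i - r], hence rotated[x + r] = table[x].
    rotated = [table[(i - r) % n] for i in range(n)]
    shares = [share(v % VALUE_MOD, VALUE_MOD) for v in rotated]
    r0, r1 = share(r, n)
    server0 = (r0, [s[0] for s in shares])
    server1 = (r1, [s[1] for s in shares])
    return server0, server1


def secure_lookup(x0: int, x1: int, pre0: tuple, pre1: tuple, n: int) -> tuple:
    """Online phase: the servers hold shares x0, x1 of an index x modulo n."""
    r0, t0 = pre0
    r1, t1 = pre1
    # Each server publishes x_i + r_i; only c = x + r (mod n) is opened,
    # which is uniformly random because r is secret and uniform.
    c = (x0 + r0 + x1 + r1) % n
    # Each server returns its share at the public position c; the two shares
    # reconstruct to table[x]. A preprocessed table must be used only once.
    return t0[c], t1[c]
```

For example, sharing x = 5 with share(5, TABLE_SIZE) and running secure_lookup against a fresh dealer_setup(SIGMOID_TABLE) gives two output shares whose sum modulo VALUE_MOD equals SIGMOID_TABLE[5]. The per-lookup preprocessing is what makes the opened index safe to reveal; reusing a rotated table would leak the difference between indices.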
Submitted 25 March, 2024;
originally announced March 2024.
-
Towards Code Generation from BDD Test Case Specifications: A Vision
Authors:
Leon Chemnitz,
David Reichenbach,
Hani Aldebes,
Mariam Naveed,
Krishna Narasimhan,
Mira Mezini
Abstract:
Automatic code generation has recently attracted considerable attention and is becoming increasingly significant to the software development process. Solutions based on Machine Learning and Artificial Intelligence are being used to increase human and software efficiency in potent and innovative ways. In this paper, we aim to leverage these developments and introduce a novel approach to generating frontend component code for the popular Angular framework. We propose to do this using behavior-driven development test specifications as input to a transformer-based machine learning model. Our approach aims to drastically reduce the development time needed for web applications while potentially increasing software quality and introducing new research ideas toward automatic code generation.
Submitted 19 May, 2023;
originally announced May 2023.
-
Secure & Private Federated Neuroimaging
Authors:
Dimitris Stripelis,
Umang Gupta,
Hamza Saleem,
Nikhil Dhinagar,
Tanmay Ghai,
Rafael Chrysovalantis Anastasiou,
Armaghan Asghar,
Greg Ver Steeg,
Srivatsan Ravi,
Muhammad Naveed,
Paul M. Thompson,
Jose Luis Ambite
Abstract:
The amount of biomedical data continues to grow rapidly. However, collecting data from multiple sites for joint analysis remains challenging due to security, privacy, and regulatory concerns. To overcome this challenge, we use Federated Learning, which enables distributed training of neural network models over multiple data sources without sharing data. Each site trains the neural network over its private data for some time, then shares the neural network parameters (i.e., weights, gradients) with a Federation Controller, which in turn aggregates the local models, sends the resulting community model back to each site, and the process repeats. Our Federated Learning architecture, MetisFL, provides strong security and privacy. First, sample data never leaves a site. Second, neural network parameters are encrypted before transmission and the global neural model is computed under fully-homomorphic encryption. Finally, we use information-theoretic methods to limit information leakage from the neural model to prevent a curious site from performing model inversion or membership attacks. We present a thorough evaluation of the performance of secure, private federated learning in neuroimaging tasks, including for predicting Alzheimer's disease and estimating BrainAGE from magnetic resonance imaging (MRI) studies, in challenging, heterogeneous federated environments where sites have different amounts of data and statistical distributions.
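The controller's aggregation step can be sketched in plaintext as a weighted average of the sites' flattened parameter vectors. Weighting each site by its number of training samples (FedAvg-style) is an assumption made here; the abstract only says that the Federation Controller aggregates the local models, and in MetisFL that step runs under fully homomorphic encryption rather than on plaintext values as in this sketch. The function name is hypothetical.

```python
from typing import List, Sequence


def aggregate(local_models: Sequence[Sequence[float]],
              num_samples: Sequence[int]) -> List[float]:
    """Weighted average of flattened parameter vectors (sample-count weighting is assumed)."""
    if not local_models or len(local_models) != len(num_samples):
        raise ValueError("need one sample count per local model")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(local_models[0])
    community = [0.0] * dim
    for params, n in zip(local_models, num_samples):
        if len(params) != dim:
            raise ValueError("all local models must have the same shape")
        weight = n / total  # sites with more data contribute proportionally more
        for j, p in enumerate(params):
            community[j] += weight * p
    return community
```

For instance, aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3]) returns [2.5, 3.5], since the second site holds three quarters of the samples. The resulting community model would then be sent back to every site for the next round.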
Submitted 28 August, 2023; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Blizzard: a Distributed Consensus Protocol for Mobile Devices
Authors:
Mehrdad Kiamari,
Bhaskar Krishnamachari,
Muhammad Naveed,
Seokgu Yun
Abstract:
We present Blizzard, a Byzantine Fault Tolerant (BFT) distributed ledger protocol that aims to make mobile devices first-class citizens in the consensus process. Blizzard introduces a novel two-tier architecture in which mobile nodes communicate through online brokers, and it includes a decentralized matching scheme to ensure that each node connects to a certain number of random brokers. Through mathematical analysis, we derive a guaranteed safety region (i.e., the set of ratios of malicious nodes and malicious brokers for which safety is assured) for the Blizzard protocol. Liveness is shown as well. We analyze the performance of Blizzard in terms of its throughput, latency, and message complexity. Through experiments based on a software implementation, we show that Blizzard is capable of throughput on the order of several thousand transactions per second per shard, and sub-second confirmation latency.
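To build intuition for why connecting each node to several random brokers helps, the sketch below computes the probability that a strict majority of a node's k brokers are malicious, assuming each broker is independently malicious with probability beta. This binomial model is an illustrative assumption and not the paper's safety-region derivation; sampling brokers without replacement from a finite pool would call for a hypergeometric model instead. The function name is hypothetical.

```python
from math import comb


def prob_malicious_majority(k: int, beta: float) -> float:
    """P[more than half of k independently chosen brokers are malicious]."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be a probability")
    threshold = k // 2 + 1  # smallest strict majority
    return sum(
        comb(k, i) * beta**i * (1 - beta) ** (k - i)
        for i in range(threshold, k + 1)
    )
```

With beta below one half, the probability falls as k grows (for example, prob_malicious_majority(15, 0.2) is smaller than prob_malicious_majority(5, 0.2)), which is the qualitative reason a node benefits from more random broker connections.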
Submitted 6 January, 2022;
originally announced January 2022.
-
Characterizing Improper Input Validation Vulnerabilities of Mobile Crowdsourcing Services
Authors:
Sojhal Ismail Khan,
Dominika Woszczyk,
Chengzeng You,
Soteris Demetriou,
Muhammad Naveed
Abstract:
Mobile crowdsourcing services (MCS) enable fast and economical data acquisition at scale and find applications in a variety of domains. Prior work has shown that Foursquare and Waze (a location-based and a navigation MCS, respectively) are vulnerable to different kinds of data poisoning attacks. Such attacks can be upsetting and even dangerous, especially when they are used to inject improper inputs to mislead users. However, to date, there is no comprehensive study on the extent of improper input validation (IIV) vulnerabilities and the feasibility of their exploits in MCSs across domains. In this work, we leverage the fact that MCSs interface with their participants through mobile apps to design tools and new methodologies embodied in an end-to-end feedback-driven analysis framework, which we use to study 10 popular and previously unexplored services in five different domains. Using our framework, we send tens of thousands of API requests with automatically generated input values to characterize their IIV attack surface. Alarmingly, we found that most of them (8/10) suffer from grave IIV vulnerabilities that allow an adversary to launch data poisoning attacks at scale: 7400 spoofed API requests successfully faked online posts for robberies, gunshots, and other dangerous incidents, as well as fitness activities with supernatural speeds and distances, among many others. Lastly, we discuss mitigation strategies that are easy to implement and deploy, which can greatly reduce the IIV attack surface, and argue for their use as a necessary complementary measure toward trustworthy mobile crowdsourcing services.
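As one example of the kind of server-side validation the abstract argues for, the sketch below rejects fitness-activity uploads whose implied average speed is physically implausible. The record shape, field names, activity types, and speed limits are all hypothetical and are not taken from any of the studied services; the point is only that checks run on the server, where a spoofed API request cannot skip them.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ActivitySubmission:
    """Hypothetical shape of a fitness-activity upload."""
    activity_type: str
    distance_m: float
    duration_s: float


# Hypothetical upper bounds on average speed, in metres per second.
MAX_SPEED_MPS = {"walk": 3.0, "run": 12.0, "cycle": 25.0}


def validate_activity(sub: ActivitySubmission) -> List[str]:
    """Return validation errors; an empty list means the submission is accepted."""
    errors = []
    if sub.activity_type not in MAX_SPEED_MPS:
        errors.append("unknown activity type")
    if not (math.isfinite(sub.distance_m) and math.isfinite(sub.duration_s)):
        errors.append("non-finite value")  # NaN or infinity would slip past range checks
    elif sub.distance_m < 0:
        errors.append("negative distance")
    elif sub.duration_s <= 0:
        errors.append("non-positive duration")
    if not errors:
        speed = sub.distance_m / sub.duration_s
        if speed > MAX_SPEED_MPS[sub.activity_type]:
            errors.append(f"implausible speed {speed:.1f} m/s")
    return errors
```

A 5 km run in 25 minutes passes, while a "run" of 100 km in 10 minutes is rejected with an implausible-speed error.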
Submitted 18 October, 2021; v1 submitted 16 October, 2021;
originally announced October 2021.
-
Secure Neuroimaging Analysis using Federated Learning with Homomorphic Encryption
Authors:
Dimitris Stripelis,
Hamza Saleem,
Tanmay Ghai,
Nikhil Dhinagar,
Umang Gupta,
Chrysovalantis Anastasiou,
Greg Ver Steeg,
Srivatsan Ravi,
Muhammad Naveed,
Paul M. Thompson,
Jose Luis Ambite
Abstract:
Federated learning (FL) enables distributed computation of machine learning models over various disparate, remote data sources, without requiring any individual data to be transferred to a centralized location. This improves the generalizability of models and allows computation to scale efficiently as more sources and larger datasets are added to the federation. Nevertheless, recent membership attacks show that private or sensitive personal data can sometimes be leaked or inferred when model parameters or summary statistics are shared with a central site, requiring improved security solutions. In this work, we propose a framework for secure FL using fully homomorphic encryption (FHE). Specifically, we use the CKKS construction, an approximate, floating-point-compatible scheme that benefits from ciphertext packing and rescaling. In our evaluation on large-scale brain MRI datasets, we use our proposed secure FL framework to train a deep learning model to predict a person's age from distributed MRI scans, a common benchmarking task, and demonstrate that there is no degradation in learning performance between the encrypted and non-encrypted federated models.
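CKKS represents real numbers as integers scaled by a factor Delta; multiplying two encodings yields a value at scale Delta squared, and rescaling divides by roughly Delta to restore the working scale. The sketch below reproduces only this fixed-point bookkeeping on plaintext integers, with no encryption, no polynomial ring, and no slot packing, to show why a weighted average of model parameters computed under CKKS needs a rescale after multiplying by the weights. The value of DELTA and all helper names are hypothetical.

```python
from typing import Sequence

# Hypothetical scaling factor; real CKKS parameter sets choose it together
# with the modulus chain.
DELTA = 2**20


def encode(x: float) -> int:
    """Fixed-point encoding at scale DELTA (plaintext only, no encryption)."""
    return round(x * DELTA)


def decode(m: int, scale: int = DELTA) -> float:
    return m / scale


def rescale(m: int) -> int:
    """Divide by DELTA with rounding, taking a scale-DELTA**2 value back to scale DELTA."""
    return (m + DELTA // 2) // DELTA


def weighted_sum(values: Sequence[float], weights: Sequence[float]) -> float:
    """Compute sum(w * v) while tracking scales the way a CKKS aggregator would."""
    acc = 0  # accumulator at scale DELTA**2
    for v, w in zip(values, weights):
        acc += encode(v) * encode(w)  # each product has scale DELTA**2
    return decode(rescale(acc))       # a single rescale after all additions
```

Here weighted_sum([1.5, -2.0], [0.25, 0.75]) recovers -1.125. Deferring the rescale until after the additions mirrors a common CKKS practice, since additions do not change the scale and each rescale consumes a level of the modulus chain.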
Submitted 9 November, 2021; v1 submitted 7 August, 2021;
originally announced August 2021.
-
ARC: A Vision-based Automatic Retail Checkout System
Authors:
Syed Talha Bukhari,
Abdul Wahab Amin,
Muhammad Abdullah Naveed,
Muhammad Rzi Abbas
Abstract:
Retail checkout systems employed at supermarkets primarily rely on barcode scanners, with some utilizing QR codes, to identify the items being purchased. These methods are time-consuming in practice, require a certain level of human supervision, and involve waiting in long queues. In this regard, we propose a system, which we call ARC, that aims to make checkout at retail store counters faster, more autonomous, and more convenient, while reducing dependency on a human operator. The approach uses a computer vision-based system, with a Convolutional Neural Network at its core, that identifies objects placed beneath a webcam. To evaluate the proposed system, we curated an image dataset of one hundred local retail items of various categories. Within the given assumptions and considerations, the system achieves a reasonable test-time accuracy, pointing towards an ambitious future for the proposed setup. The project code and the dataset are made publicly available.
Submitted 17 May, 2021; v1 submitted 6 April, 2021;
originally announced April 2021.
-
Exacerbating Algorithmic Bias through Fairness Attacks
Authors:
Ninareh Mehrabi,
Muhammad Naveed,
Fred Morstatter,
Aram Galstyan
Abstract:
Algorithmic fairness has attracted significant attention in recent years, with many quantitative measures suggested for characterizing the fairness of different machine learning algorithms. Despite this interest, the robustness of those fairness measures with respect to an intentional adversarial attack has not been properly addressed. Indeed, most adversarial machine learning research has focused on the impact of malicious attacks on the accuracy of the system, without any regard to the system's fairness. We propose new types of data poisoning attacks where an adversary intentionally targets the fairness of a system. Specifically, we propose two families of attacks that target fairness measures. In the anchoring attack, we skew the decision boundary by placing poisoned points near specific target points to bias the outcome. In the influence attack on fairness, we aim to maximize the covariance between the sensitive attributes and the decision outcome, thereby affecting the fairness of the model. We conduct extensive experiments that indicate the effectiveness of our proposed attacks.
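The quantity the influence attack pushes up can be sketched for a linear classifier as the empirical covariance between the sensitive attribute z and the decision score theta . x. Using the raw linear score as the "decision outcome" is an assumption of this sketch (a common proxy in fairness-constraint work), and the paper's exact objective and optimization procedure are not reproduced here; the function names are hypothetical.

```python
from typing import Sequence


def decision_score(theta: Sequence[float], x: Sequence[float]) -> float:
    """Linear decision score theta . x (a bias term can be folded into x)."""
    return sum(t * xi for t, xi in zip(theta, x))


def boundary_covariance(theta: Sequence[float],
                        X: Sequence[Sequence[float]],
                        z: Sequence[float]) -> float:
    """Empirical covariance between sensitive attribute z and the decision score."""
    n = len(X)
    if n == 0 or n != len(z):
        raise ValueError("X and z must be non-empty and of equal length")
    z_mean = sum(z) / n
    return sum((zi - z_mean) * decision_score(theta, xi) for xi, zi in zip(X, z)) / n
```

A value near zero means the scores are, on average, uncorrelated with group membership; a poisoning attacker in this framing would choose poisoned points so that the retrained theta drives the magnitude of this covariance up.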
Submitted 15 December, 2020;
originally announced December 2020.
-
A Privacy-Preserving, Accountable and Spam-Resilient Geo-Marketplace
Authors:
Kien Nguyen,
Gabriel Ghinita,
Muhammad Naveed,
Cyrus Shahabi
Abstract:
Mobile devices with rich features can record videos, traffic parameters, or air quality readings along user trajectories. Although such data may be valuable, users are seldom rewarded for collecting them. Emerging digital marketplaces allow owners to advertise their data to interested buyers. We focus on geo-marketplaces, where buyers search for data based on geo-tags. Such marketplaces present significant challenges. First, if owners upload data with revealed geo-tags, they expose themselves to serious privacy risks. Second, owners must be accountable for advertised data and must not be allowed to subsequently alter geo-tags. Third, such a system may be vulnerable to intensive spam activities, where dishonest owners flood the system with fake advertisements. We propose a geo-marketplace that addresses all these concerns. We employ searchable encryption, digital commitments, and blockchain to protect the location privacy of owners while at the same time incorporating accountability and spam-resilience mechanisms. We implement a prototype with two alternative designs that obtain distinct trade-offs between trust assumptions and performance. Our experiments on real location data show that one can achieve the above design goals with practical performance and reasonable financial overhead.
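The accountability requirement (owners must not alter geo-tags after advertising) is what a digital commitment provides. The abstract does not say which commitment scheme the prototype uses, so the sketch below assumes a simple SHA-256 hash commitment: a random 32-byte nonce makes the commitment hiding, and collision resistance makes it binding. The geo-tag string format and function names are hypothetical; in a design like the one described, the commitment rather than the geo-tag would be what gets published, for example on a blockchain.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

NONCE_BYTES = 32  # fixed length, so nonce || geotag parses unambiguously


def commit(geotag: str) -> Tuple[str, bytes]:
    """Return (commitment, opening nonce) for a geo-tag string."""
    nonce = secrets.token_bytes(NONCE_BYTES)
    digest = hashlib.sha256(nonce + geotag.encode("utf-8")).hexdigest()
    return digest, nonce


def verify(commitment: str, geotag: str, nonce: bytes) -> bool:
    """Check that (geotag, nonce) opens the published commitment."""
    if len(nonce) != NONCE_BYTES:
        return False
    expected = hashlib.sha256(nonce + geotag.encode("utf-8")).hexdigest()
    return hmac.compare_digest(commitment, expected)
```

The owner keeps the nonce private until the data is sold; an attempt to open the commitment to a different geo-tag fails verification.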
Submitted 30 September, 2019; v1 submitted 31 August, 2019;
originally announced September 2019.
-
Resilience of Social Networks Under Different Attack Strategies
Authors:
Mohammad Ayub Latif,
Muhammad Naveed,
Faraz Zaidi
Abstract:
Recent years have seen the world become a closely connected society with the emergence of different types of social networks. Online social networks have provided a way to bridge long distances and establish numerous communication channels which were not possible earlier. These networks exhibit interesting behavior under intentional attacks and random failures where different structural properties influence the resilience in different ways.
In this paper, we perform two sets of experiments and draw conclusions from the results pertaining to the resilience of social networks. The first experiment compares four classes of networks, namely small-world, scale-free, small-world scale-free, and random networks, with four semantically different social networks under different attack strategies. The second experiment compares the resilience of these semantically different social networks under different attack strategies. Empirical analysis reveals interesting behavior of different classes of networks with different attack strategies.
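The contrast between intentional attacks and random failures can be sketched by removing nodes from a graph and measuring the size of the largest connected component that remains. The abstract does not name its attack strategies or resilience metric, so removing the highest-degree nodes (using degrees from the original graph) as the intentional attack, uniform random removal as the failure model, and largest-component size as the metric are all assumptions of this sketch; the function names are hypothetical.

```python
import random
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected adjacency sets


def largest_component_size(graph: Graph, removed: Set[int]) -> int:
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        size = 0
        queue = deque([start])
        while queue:  # breadth-first search over surviving nodes
            u = queue.popleft()
            size += 1
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def degree_attack(graph: Graph, k: int) -> Set[int]:
    """Intentional attack: remove the k highest-degree nodes of the original graph."""
    return set(sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:k])


def random_failure(graph: Graph, k: int, seed: int = 0) -> Set[int]:
    """Random failure: remove k nodes chosen uniformly at random."""
    return set(random.Random(seed).sample(sorted(graph), k))
```

On a star graph, removing the single hub through degree_attack shatters the network into isolated nodes, while a random failure almost always hits a leaf and leaves the rest connected, which is the classic vulnerability of hub-dominated (scale-free) structures to targeted attacks.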
Submitted 31 October, 2014;
originally announced October 2014.
-
Privacy in the Genomic Era
Authors:
Muhammad Naveed,
Erman Ayday,
Ellen W. Clayton,
Jacques Fellay,
Carl A. Gunter,
Jean-Pierre Hubaux,
Bradley A. Malin,
XiaoFeng Wang
Abstract:
Genome sequencing technology has advanced at a rapid pace, and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field, and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, and we contextualize these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.
Submitted 17 June, 2015; v1 submitted 8 May, 2014;
originally announced May 2014.