-
How to Privately Tune Hyperparameters in Federated Learning? Insights from a Benchmark Study
Authors:
Natalija Mitic,
Apostolos Pyrgelis,
Sinem Sav
Abstract:
In this paper, we address the problem of privacy-preserving hyperparameter (HP) tuning for cross-silo federated learning (FL). We first perform a comprehensive measurement study that benchmarks various HP strategies suitable for FL. Our benchmarks show that the optimal parameters of the FL server, e.g., the learning rate, can be accurately and efficiently tuned based on the HPs found by each clien…
▽ More
In this paper, we address the problem of privacy-preserving hyperparameter (HP) tuning for cross-silo federated learning (FL). We first perform a comprehensive measurement study that benchmarks various HP strategies suitable for FL. Our benchmarks show that the optimal parameters of the FL server, e.g., the learning rate, can be accurately and efficiently tuned based on the HPs found by each client on its local data. We demonstrate that HP averaging is suitable for iid settings, while density-based clustering can uncover the optimal set of parameters in non-iid ones. Then, to prevent information leakage from the exchange of the clients' local HPs, we design and implement PrivTuna, a novel framework for privacy-preserving HP tuning using multiparty homomorphic encryption. We use PrivTuna to implement privacy-preserving federated averaging and density-based clustering, and we experimentally evaluate its performance demonstrating its computation/communication efficiency and its precision in tuning hyperparameters.
△ Less
Submitted 22 May, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
P3LI5: Practical and Confidential Lawful Interception on the 5G Core
Authors:
Francesco Intoci,
Julian Sturm,
Daniel Fraunholz,
Apostolos Pyrgelis,
Colin Barschel
Abstract:
Lawful Interception (LI) is a legal obligation of Communication Service Providers (CSPs) to provide interception capabilities to Law Enforcement Agencies (LEAs) in order to gain insightful data from network communications for criminal proceedings, e.g., network identifiers for tracking suspects. With the privacy-enhancements of network identifiers in the 5th generation of mobile networks (5G), LEA…
▽ More
Lawful Interception (LI) is a legal obligation of Communication Service Providers (CSPs) to provide interception capabilities to Law Enforcement Agencies (LEAs) in order to gain insightful data from network communications for criminal proceedings, e.g., network identifiers for tracking suspects. With the privacy-enhancements of network identifiers in the 5th generation of mobile networks (5G), LEAs need to interact with CSPs for network identifier resolution. This raises new privacy issues, as untrusted CSPs are able to infer sensitive information about ongoing investigations, e.g., the identities of their subscribers under suspicion. In this work, we propose P3LI5, a novel system that enables LEAs to privately query CSPs for network identifier resolution leveraging on an information retrieval protocol, SparseWPIR, that is based on private information retrieval and its weakly private version. As such, P3LI5 can be adapted to various operational scenarios with different confidentiality or latency requirements, by selectively allowing a bounded information leakage for improved performance. We implement P3LI5 on the 5G LI infrastructure using well known open-source projects and demonstrate its scalability to large databases while retaining low latency. To the best of our knowledge, P3LI5 is the first proposal for addressing the privacy issues raised by the mandatory requirement for LI on the 5G core network.
△ Less
Submitted 27 August, 2023;
originally announced August 2023.
-
slytHErin: An Agile Framework for Encrypted Deep Neural Network Inference
Authors:
Francesco Intoci,
Sinem Sav,
Apostolos Pyrgelis,
Jean-Philippe Bossuat,
Juan Ramon Troncoso-Pastoriza,
Jean-Pierre Hubaux
Abstract:
Homomorphic encryption (HE), which allows computations on encrypted data, is an enabling technology for confidential cloud computing. One notable example is privacy-preserving Prediction-as-a-Service (PaaS), where machine-learning predictions are computed on encrypted data. However, develo** HE-based solutions for encrypted PaaS is a tedious task which requires a careful design that predominantl…
▽ More
Homomorphic encryption (HE), which allows computations on encrypted data, is an enabling technology for confidential cloud computing. One notable example is privacy-preserving Prediction-as-a-Service (PaaS), where machine-learning predictions are computed on encrypted data. However, develo** HE-based solutions for encrypted PaaS is a tedious task which requires a careful design that predominantly depends on the deployment scenario and on leveraging the characteristics of modern HE schemes. Prior works on privacy-preserving PaaS focus solely on protecting the confidentiality of the client data uploaded to a remote model provider, e.g., a cloud offering a prediction API, and assume (or take advantage of the fact) that the model is held in plaintext. Furthermore, their aim is to either minimize the latency of the service by processing one sample at a time, or to maximize the number of samples processed per second, while processing a fixed (large) number of samples. In this work, we present slytHErin, an agile framework that enables privacy-preserving PaaS beyond the application scenarios considered in prior works. Thanks to its hybrid design leveraging HE and its multiparty variant (MHE), slytHErin enables novel PaaS scenarios by encrypting the data, the model or both. Moreover, slytHErin features a flexible input data packing approach that allows processing a batch of an arbitrary number of samples, and several computation optimizations that are model-and-setting-agnostic. slytHErin is implemented in Go and it allows end-users to perform encrypted PaaS on custom deep learning models comprising fully-connected, convolutional, and pooling layers, in a few lines of code and without having to worry about the cumbersome implementation and optimization concerns inherent to HE.
△ Less
Submitted 1 May, 2023;
originally announced May 2023.
-
Scalable and Privacy-Preserving Federated Principal Component Analysis
Authors:
David Froelicher,
Hyunghoon Cho,
Manaswitha Edupalli,
Joao Sa Sousa,
Jean-Philippe Bossuat,
Apostolos Pyrgelis,
Juan R. Troncoso-Pastoriza,
Bonnie Berger,
Jean-Pierre Hubaux
Abstract:
Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermedia…
▽ More
Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermediate results in a passive-adversary model with up to all-but-one colluding parties. SF-PCA jointly leverages multiparty homomorphic encryption, interactive protocols, and edge computing to efficiently interleave computations on local cleartext data with operations on collectively encrypted data. SF-PCA obtains results as accurate as non-secure centralized solutions, independently of the data distribution among the parties. It scales linearly or better with the dataset dimensions and with the number of data providers. SF-PCA is more precise than existing approaches that approximate the solution by combining local analysis results, and between 3x and 250x faster than privacy-preserving alternatives based solely on secure multiparty computation or homomorphic encryption. Our work demonstrates the practical applicability of secure and federated PCA on private distributed datasets.
△ Less
Submitted 31 March, 2023;
originally announced April 2023.
-
Verifiable Encodings for Secure Homomorphic Analytics
Authors:
Sylvain Chatel,
Christian Knabenhans,
Apostolos Pyrgelis,
Carmela Troncoso,
Jean-Pierre Hubaux
Abstract:
Homomorphic encryption, which enables the execution of arithmetic operations directly on ciphertexts, is a promising solution for protecting privacy of cloud-delegated computations on sensitive data. However, the correctness of the computation result is not ensured. We propose two error detection encodings and build authenticators that enable practical client-verification of cloud-based homomorphi…
▽ More
Homomorphic encryption, which enables the execution of arithmetic operations directly on ciphertexts, is a promising solution for protecting privacy of cloud-delegated computations on sensitive data. However, the correctness of the computation result is not ensured. We propose two error detection encodings and build authenticators that enable practical client-verification of cloud-based homomorphic computations under different trade-offs and without compromising on the features of the encryption algorithm. Our authenticators operate on top of trending ring learning with errors based fully homomorphic encryption schemes over the integers. We implement our solution in VERITAS, a ready-to-use system for verification of outsourced computations executed over encrypted data. We show that contrary to prior work VERITAS supports verification of any homomorphic operation and we demonstrate its practicality for various applications, such as ride-hailing, genomic-data analysis, encrypted search, and machine-learning training and inference.
△ Less
Submitted 4 June, 2024; v1 submitted 28 July, 2022;
originally announced July 2022.
-
Privacy-Preserving Federated Recurrent Neural Networks
Authors:
Sinem Sav,
Abdulrahman Diaa,
Apostolos Pyrgelis,
Jean-Philippe Bossuat,
Jean-Pierre Hubaux
Abstract:
We present RHODE, a novel system that enables privacy-preserving training of and prediction on Recurrent Neural Networks (RNNs) in a cross-silo federated learning setting by relying on multiparty homomorphic encryption. RHODE preserves the confidentiality of the training data, the model, and the prediction data; and it mitigates federated learning attacks that target the gradients under a passive-…
▽ More
We present RHODE, a novel system that enables privacy-preserving training of and prediction on Recurrent Neural Networks (RNNs) in a cross-silo federated learning setting by relying on multiparty homomorphic encryption. RHODE preserves the confidentiality of the training data, the model, and the prediction data; and it mitigates federated learning attacks that target the gradients under a passive-adversary threat model. We propose a packing scheme, multi-dimensional packing, for a better utilization of Single Instruction, Multiple Data (SIMD) operations under encryption. With multi-dimensional packing, RHODE enables the efficient processing, in parallel, of a batch of samples. To avoid the exploding gradients problem, RHODE provides several clip** approximations for performing gradient clip** under encryption. We experimentally show that the model performance with RHODE remains similar to non-secure solutions both for homogeneous and heterogeneous data distribution among the data holders. Our experimental evaluation shows that RHODE scales linearly with the number of data holders and the number of timesteps, sub-linearly and sub-quadratically with the number of features and the number of hidden units of RNNs, respectively. To the best of our knowledge, RHODE is the first system that provides the building blocks for the training of RNNs and its variants, under encryption in a federated learning setting.
△ Less
Submitted 3 May, 2023; v1 submitted 28 July, 2022;
originally announced July 2022.
-
Every Byte Matters: Traffic Analysis of Bluetooth Wearable Devices
Authors:
Ludovic Barman,
Alexandre Dumur,
Apostolos Pyrgelis,
Jean-Pierre Hubaux
Abstract:
Wearable devices such as smartwatches, fitness trackers, and blood-pressure monitors process, store, and communicate sensitive and personal information related to the health, life-style, habits and interests of the wearer. This data is exchanged with a companion app running on a smartphone over a Bluetooth connection. In this work, we investigate what can be inferred from the metadata (such as the…
▽ More
Wearable devices such as smartwatches, fitness trackers, and blood-pressure monitors process, store, and communicate sensitive and personal information related to the health, life-style, habits and interests of the wearer. This data is exchanged with a companion app running on a smartphone over a Bluetooth connection. In this work, we investigate what can be inferred from the metadata (such as the packet timings and sizes) of encrypted Bluetooth communications between a wearable device and its connected smartphone. We show that a passive eavesdropper can use traffic-analysis attacks to accurately recognize (a) communicating devices, even without having access to the MAC address, (b) human actions (e.g., monitoring heart rate, exercising) performed on wearable devices ranging from fitness trackers to smartwatches, (c) the mere opening of specific applications on a Wear OS smartwatch (e.g., the opening of a medical app, which can immediately reveal a condition of the wearer), (d) fine-grained actions (e.g., recording an insulin injection) within a specific application that helps diabetic users to monitor their condition, and (e) the profile and habits of the wearer by continuously monitoring her traffic over an extended period. We run traffic-analysis attacks by collecting a dataset of Bluetooth traces of multiple wearable devices, by designing features based on packet sizes and timings, and by using machine learning to classify the encrypted traffic to actions performed by the wearer. Then, we explore standard defense strategies; we show that these defenses do not provide sufficient protection against our attacks and introduce significant costs. Our research highlights the need to rethink how applications exchange sensitive information over Bluetooth, to minimize unnecessary data exchanges, and to design new defenses against traffic-analysis tailored to the wearable setting.
△ Less
Submitted 24 May, 2021;
originally announced May 2021.
-
SoK: Privacy-Preserving Collaborative Tree-based Model Learning
Authors:
Sylvain Chatel,
Apostolos Pyrgelis,
Juan Ramon Troncoso-Pastoriza,
Jean-Pierre Hubaux
Abstract:
Tree-based models are among the most efficient machine learning techniques for data mining nowadays due to their accuracy, interpretability, and simplicity. The recent orthogonal needs for more data and privacy protection call for collaborative privacy-preserving solutions. In this work, we survey the literature on distributed and privacy-preserving training of tree-based models and we systematize…
▽ More
Tree-based models are among the most efficient machine learning techniques for data mining nowadays due to their accuracy, interpretability, and simplicity. The recent orthogonal needs for more data and privacy protection call for collaborative privacy-preserving solutions. In this work, we survey the literature on distributed and privacy-preserving training of tree-based models and we systematize its knowledge based on four axes: the learning algorithm, the collaborative model, the protection mechanism, and the threat model. We use this to identify the strengths and limitations of these works and provide for the first time a framework analyzing the information leakage occurring in distributed tree-based model learning.
△ Less
Submitted 18 June, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
-
Privacy-Preserving and Efficient Verification of the Outcome in Genome-Wide Association Studies
Authors:
Anisa Halimi,
Leonard Dervishi,
Erman Ayday,
Apostolos Pyrgelis,
Juan Ramon Troncoso-Pastoriza,
Jean-Pierre Hubaux,
Xiaoqian Jiang,
Jaideep Vaidya
Abstract:
Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. Workflow systems model and record provenance describing the steps performed to obtain the final results of a computation. In this work, we propose a framework that verifies the correctness of the statistical test results that are conducted by a researcher while protecting individuals' privacy i…
▽ More
Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. Workflow systems model and record provenance describing the steps performed to obtain the final results of a computation. In this work, we propose a framework that verifies the correctness of the statistical test results that are conducted by a researcher while protecting individuals' privacy in the researcher's dataset. The researcher publishes the workflow of the conducted study, its output, and associated metadata. They keep the research dataset private while providing, as part of the metadata, a partial noisy dataset (that achieves local differential privacy). To check the correctness of the workflow output, a verifier makes use of the workflow, its metadata, and results of another statistical study (using publicly available datasets) to distinguish between correct statistics and incorrect ones. We use case the proposed framework in the genome-wide association studies (GWAS), in which the goal is to identify highly associated point mutations (variants) with a given phenotype. For evaluation, we use real genomic data and show that the correctness of the workflow output can be verified with high accuracy even when the aggregate statistics of a small number of variants are provided. We also quantify the privacy leakage due to the provided workflow and its associated metadata in the GWAS use-case and show that the additional privacy risk due to the provided metadata does not increase the existing privacy risk due to sharing of the research results. Thus, our results show that the workflow output (i.e., research results) can be verified with high confidence in a privacy-preserving way. We believe that this work will be a valuable step towards providing provenance in a privacy-preserving way while providing guarantees to the users about the correctness of the results.
△ Less
Submitted 7 November, 2022; v1 submitted 21 January, 2021;
originally announced January 2021.
-
POSEIDON: Privacy-Preserving Federated Neural Network Learning
Authors:
Sinem Sav,
Apostolos Pyrgelis,
Juan R. Troncoso-Pastoriza,
David Froelicher,
Jean-Philippe Bossuat,
Joao Sa Sousa,
Jean-Pierre Hubaux
Abstract:
In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an $N$-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training. It employs multiparty lattice-based cryptography to preserve the confidentiality of the training data, the model, and the evaluation…
▽ More
In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an $N$-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training. It employs multiparty lattice-based cryptography to preserve the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to $N-1$ parties. To efficiently execute the secure backpropagation algorithm for training neural networks, we provide a generic packing approach that enables Single Instruction, Multiple Data (SIMD) operations on encrypted data. We also introduce arbitrary linear transformations within the cryptographic bootstrap** operation, optimizing the costly cryptographic computations over the parties, and we define a constrained optimization problem for choosing the cryptographic parameters. Our experimental results show that POSEIDON achieves accuracy similar to centralized or decentralized non-private approaches and that its computation and communication overhead scales linearly with the number of parties. POSEIDON trains a 3-layer neural network on the MNIST dataset with 784 features and 60K samples distributed among 10 parties in less than 2 hours.
△ Less
Submitted 8 January, 2021; v1 submitted 1 September, 2020;
originally announced September 2020.
-
Privacy and Integrity Preserving Computations with CRISP
Authors:
Sylvain Chatel,
Apostolos Pyrgelis,
Juan R. Troncoso-Pastoriza,
Jean-Pierre Hubaux
Abstract:
In the digital era, users share their personal data with service providers to obtain some utility, e.g., access to high-quality services. Yet, the induced information flows raise privacy and integrity concerns. Consequently, cautious users may want to protect their privacy by minimizing the amount of information they disclose to curious service providers. Service providers are interested in verify…
▽ More
In the digital era, users share their personal data with service providers to obtain some utility, e.g., access to high-quality services. Yet, the induced information flows raise privacy and integrity concerns. Consequently, cautious users may want to protect their privacy by minimizing the amount of information they disclose to curious service providers. Service providers are interested in verifying the integrity of the users' data to improve their services and obtain useful knowledge for their business. In this work, we present a generic solution to the trade-off between privacy, integrity, and utility, by achieving authenticity verification of data that has been encrypted for offloading to service providers. Based on lattice-based homomorphic encryption and commitments, as well as zero-knowledge proofs, our construction enables a service provider to process and reuse third-party signed data in a privacy-friendly manner with integrity guarantees. We evaluate our solution on different use cases such as smart-metering, disease susceptibility, and location-based activity tracking, thus showing its versatility. Our solution achieves broad generality, quantum-resistance, and relaxes some assumptions of state-of-the-art solutions without affecting performance.
△ Less
Submitted 12 January, 2021; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Decentralized Privacy-Preserving Proximity Tracing
Authors:
Carmela Troncoso,
Mathias Payer,
Jean-Pierre Hubaux,
Marcel Salathé,
James Larus,
Edouard Bugnion,
Wouter Lueks,
Theresa Stadler,
Apostolos Pyrgelis,
Daniele Antonioli,
Ludovic Barman,
Sylvain Chatel,
Kenneth Paterson,
Srdjan Čapkun,
David Basin,
Jan Beutel,
Dennis Jackson,
Marc Roeschlin,
Patrick Leu,
Bart Preneel,
Nigel Smart,
Aysajan Abidin,
Seda Gürses,
Michael Veale,
Cas Cremers
, et al. (9 additional authors not shown)
Abstract:
This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take appropriate measures to break its transmission chai…
▽ More
This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take appropriate measures to break its transmission chain. The system aims to minimise privacy and security risks for individuals and communities and guarantee the highest level of data protection. The goal of our proximity tracing system is to determine who has been in close physical proximity to a COVID-19 positive person and thus exposed to the virus, without revealing the contact's identity or where the contact occurred. To achieve this goal, users run a smartphone app that continually broadcasts an ephemeral, pseudo-random ID representing the user's phone and also records the pseudo-random IDs observed from smartphones in close proximity. When a patient is diagnosed with COVID-19, she can upload pseudo-random IDs previously broadcast from her phone to a central server. Prior to the upload, all data remains exclusively on the user's phone. Other users' apps can use data from the server to locally estimate whether the device's owner was exposed to the virus through close-range physical proximity to a COVID-19 positive person who has uploaded their data. In case the app detects a high risk, it will inform the user.
△ Less
Submitted 25 May, 2020;
originally announced May 2020.
-
Scalable Privacy-Preserving Distributed Learning
Authors:
David Froelicher,
Juan R. Troncoso-Pastoriza,
Apostolos Pyrgelis,
Sinem Sav,
Joao Sa Sousa,
Jean-Philippe Bossuat,
Jean-Pierre Hubaux
Abstract:
In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design SPINDLE (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that covers the complete ML workflow by enabling the e…
▽ More
In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design SPINDLE (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that covers the complete ML workflow by enabling the execution of a cooperative gradient-descent and the evaluation of the obtained model and by preserving data and model confidentiality in a passive-adversary model with up to N-1 colluding parties. SPINDLE uses multiparty homomorphic encryption to execute parallel high-depth computations on encrypted data without significant overhead. We instantiate SPINDLE for the training and evaluation of generalized linear models on distributed datasets and show that it is able to accurately (on par with non-secure centrally-trained models) and efficiently (due to a multi-level parallelization of the computations) train models that require a high number of iterations on large input data with thousands of features, distributed among hundreds of data providers. For instance, it trains a logistic-regression model on a dataset of one million samples with 32 features distributed among 160 data providers in less than three minutes.
△ Less
Submitted 14 July, 2021; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Measuring Membership Privacy on Aggregate Location Time-Series
Authors:
Apostolos Pyrgelis,
Carmela Troncoso,
Emiliano De Cristofaro
Abstract:
While location data is extremely valuable for various applications, disclosing it prompts serious threats to individuals' privacy. To limit such concerns, organizations often provide analysts with aggregate time-series that indicate, e.g., how many people are in a location at a time interval, rather than raw individual traces. In this paper, we perform a measurement study to understand Membership…
▽ More
While location data is extremely valuable for various applications, disclosing it prompts serious threats to individuals' privacy. To limit such concerns, organizations often provide analysts with aggregate time-series that indicate, e.g., how many people are in a location at a time interval, rather than raw individual traces. In this paper, we perform a measurement study to understand Membership Inference Attacks (MIAs) on aggregate location time-series, where an adversary tries to infer whether a specific user contributed to the aggregates.
We find that the volume of contributed data, as well as the regularity and particularity of users' mobility patterns, play a crucial role in the attack's success. We experiment with a wide range of defenses based on generalization, hiding, and perturbation, and evaluate their ability to thwart the attack vis-a-vis the utility loss they introduce for various mobility analytics tasks.
Our results show that some defenses fail across the board, while others work for specific tasks on aggregate location time-series. For instance, suppressing small counts can be used for ranking hotspots, data generalization for forecasting traffic, hotspot discovery, and map inference, while sampling is effective for location labeling and anomaly detection when the dataset is sparse. Differentially private techniques provide reasonable accuracy only in very specific settings, e.g., discovering hotspots and forecasting their traffic, and more so when using weaker privacy notions like crowd-blending privacy. Overall, our measurements show that there does not exist a unique generic defense that can preserve the utility of the analytics for arbitrary applications, and provide useful insights regarding the disclosure of sanitized aggregate location time-series.
△ Less
Submitted 27 April, 2020; v1 submitted 20 February, 2019;
originally announced February 2019.
-
On Collaborative Predictive Blacklisting
Authors:
Luca Melis,
Apostolos Pyrgelis,
Emiliano De Cristofaro
Abstract:
Collaborative predictive blacklisting (CPB) allows to forecast future attack sources based on logs and alerts contributed by multiple organizations. Unfortunately, however, research on CPB has only focused on increasing the number of predicted attacks but has not considered the impact on false positives and false negatives. Moreover, sharing alerts is often hindered by confidentiality, trust, and…
▽ More
Collaborative predictive blacklisting (CPB) allows to forecast future attack sources based on logs and alerts contributed by multiple organizations. Unfortunately, however, research on CPB has only focused on increasing the number of predicted attacks but has not considered the impact on false positives and false negatives. Moreover, sharing alerts is often hindered by confidentiality, trust, and liability issues, which motivates the need for privacy-preserving approaches to the problem. In this paper, we present a measurement study of state-of-the-art CPB techniques, aiming to shed light on the actual impact of collaboration. To this end, we reproduce and measure two systems: a non privacy-friendly one that uses a trusted coordinating party with access to all alerts (Soldo et al., 2010) and a peer-to-peer one using privacy-preserving data sharing (Freudiger et al., 2015). We show that, while collaboration boosts the number of predicted attacks, it also yields high false positives, ultimately leading to poor accuracy. This motivates us to present a hybrid approach, using a semi-trusted central entity, aiming to increase utility from collaboration while, at the same time, limiting information disclosure and false positives. This leads to a better trade-off of true and false positive rates, while at the same time addressing privacy concerns.
△ Less
Submitted 5 October, 2018;
originally announced October 2018.
-
There goes Wally: Anonymously sharing your location gives you away
Authors:
Apostolos Pyrgelis,
Nicolas Kourtellis,
Ilias Leontiadis,
Joan Serrà,
Claudio Soriente
Abstract:
With current technology, a number of entities have access to user mobility traces at different levels of spatio-temporal granularity. At the same time, users frequently reveal their location through different means, including geo-tagged social media posts and mobile app usage. Such leaks are often bound to a pseudonym or a fake identity in an attempt to preserve one's privacy. In this work, we inv…
▽ More
With current technology, a number of entities have access to user mobility traces at different levels of spatio-temporal granularity. At the same time, users frequently reveal their location through different means, including geo-tagged social media posts and mobile app usage. Such leaks are often bound to a pseudonym or a fake identity in an attempt to preserve one's privacy. In this work, we investigate how large-scale mobility traces can de-anonymize anonymous location leaks. By mining the country-wide mobility traces of tens of millions of users, we aim to understand how many location leaks are required to uniquely match a trace, how spatio-temporal obfuscation decreases the matching quality, and how the location popularity and time of the leak influence de-anonymization. We also study the mobility characteristics of those individuals whose anonymous leaks are more prone to identification. Finally, by extending our matching methodology to full traces, we show how large-scale human mobility is highly unique. Our quantitative results have implications for the privacy of users' traces, and may serve as a guideline for future policies regarding the management and publication of mobility data.
△ Less
Submitted 15 November, 2018; v1 submitted 7 June, 2018;
originally announced June 2018.
-
PriFi: Low-Latency Anonymity for Organizational Networks
Authors:
Ludovic Barman,
Italo Dacosta,
Mahdi Zamani,
Ennan Zhai,
Apostolos Pyrgelis,
Bryan Ford,
Jean-Pierre Hubaux,
Joan Feigenbaum
Abstract:
Organizational networks are vulnerable to traffic-analysis attacks that enable adversaries to infer sensitive information from the network traffic - even if encryption is used. Typical anonymous communication networks are tailored to the Internet and are poorly suited for organizational networks. We present PriFi, an anonymous communication protocol for LANs, which protects users against eavesdrop…
▽ More
Organizational networks are vulnerable to traffic-analysis attacks that enable adversaries to infer sensitive information from the network traffic - even if encryption is used. Typical anonymous communication networks are tailored to the Internet and are poorly suited for organizational networks. We present PriFi, an anonymous communication protocol for LANs, which protects users against eavesdroppers and provides high-performance traffic-analysis resistance. PriFi builds on Dining Cryptographers networks but reduces the high communication latency of prior work via a new client/relay/server architecture, in which a client's packets remain on their usual network path without additional hops, and in which a set of remote servers assist the anonymization process without adding latency. PriFi also solves the challenge of equivocation attacks, which are not addressed by related works, by encrypting the traffic based on the communication history. Our evaluation shows that PriFi introduces a small latency overhead (~100ms for 100 clients) and is compatible with delay-sensitive applications such as VoIP.
△ Less
Submitted 6 April, 2021; v1 submitted 27 October, 2017;
originally announced October 2017.
-
Knock Knock, Who's There? Membership Inference on Aggregate Location Data
Authors:
Apostolos Pyrgelis,
Carmela Troncoso,
Emiliano De Cristofaro
Abstract:
Aggregate location data is often used to support smart services and applications, e.g., generating live traffic maps or predicting visits to businesses. In this paper, we present the first study on the feasibility of membership inference attacks on aggregate location time-series. We introduce a game-based definition of the adversarial task, and cast it as a classification problem where machine lea…
▽ More
Aggregate location data is often used to support smart services and applications, e.g., generating live traffic maps or predicting visits to businesses. In this paper, we present the first study on the feasibility of membership inference attacks on aggregate location time-series. We introduce a game-based definition of the adversarial task, and cast it as a classification problem where machine learning can be used to distinguish whether or not a target user is part of the aggregates.
We empirically evaluate the power of these attacks on both raw and differentially private aggregates using two mobility datasets. We find that membership inference is a serious privacy threat, and show how its effectiveness depends on the adversary's prior knowledge, the characteristics of the underlying location data, as well as the number of users and the timeframe on which aggregation is performed. Although differentially private mechanisms can indeed reduce the extent of the attacks, they also yield a significant loss in utility. Moreover, a strategic adversary mimicking the behavior of the defense mechanism can greatly limit the protection they provide. Overall, our work presents a novel methodology geared to evaluate membership inference on aggregate location data in real-world settings and can be used by providers to assess the quality of privacy protection before data release or by regulators to detect violations.
△ Less
Submitted 29 November, 2017; v1 submitted 21 August, 2017;
originally announced August 2017.
-
What Does The Crowd Say About You? Evaluating Aggregation-based Location Privacy
Authors:
Apostolos Pyrgelis,
Carmela Troncoso,
Emiliano De Cristofaro
Abstract:
Information about people's movements and the locations they visit enables an increasing number of mobility analytics applications, e.g., in the context of urban and transportation planning, In this setting, rather than collecting or sharing raw data, entities often use aggregation as a privacy protection mechanism, aiming to hide individual users' location traces. Furthermore, to bound information…
▽ More
Information about people's movements and the locations they visit enables an increasing number of mobility analytics applications, e.g., in the context of urban and transportation planning, In this setting, rather than collecting or sharing raw data, entities often use aggregation as a privacy protection mechanism, aiming to hide individual users' location traces. Furthermore, to bound information leakage from the aggregates, they can perturb the input of the aggregation or its output to ensure that these are differentially private.
In this paper, we set to evaluate the impact of releasing aggregate location time-series on the privacy of individuals contributing to the aggregation. We introduce a framework allowing us to reason about privacy against an adversary attempting to predict users' locations or recover their mobility patterns. We formalize these attacks as inference problems, and discuss a few strategies to model the adversary's prior knowledge based on the information she may have access to. We then use the framework to quantify the privacy loss stemming from aggregate location data, with and without the protection of differential privacy, using two real-world mobility datasets. We find that aggregates do leak information about individuals' punctual locations and mobility profiles. The density of the observations, as well as timing, play important roles, e.g., regular patterns during peak hours are better protected than sporadic movements. Finally, our evaluation shows that both output and input perturbation offer little additional protection, unless they introduce large amounts of noise ultimately destroying the utility of the data.
△ Less
Submitted 10 June, 2017; v1 submitted 1 March, 2017;
originally announced March 2017.
-
Privacy-Friendly Mobility Analytics using Aggregate Location Data
Authors:
Apostolos Pyrgelis,
Emiliano De Cristofaro,
Gordon Ross
Abstract:
Location data can be extremely useful to study commuting patterns and disruptions, as well as to predict real-time traffic volumes. At the same time, however, the fine-grained collection of user locations raises serious privacy concerns, as this can reveal sensitive information about the users, such as, life style, political and religious inclinations, or even identities. In this paper, we study t…
▽ More
Location data can be extremely useful to study commuting patterns and disruptions, as well as to predict real-time traffic volumes. At the same time, however, the fine-grained collection of user locations raises serious privacy concerns, as this can reveal sensitive information about the users, such as, life style, political and religious inclinations, or even identities. In this paper, we study the feasibility of crowd-sourced mobility analytics over aggregate location information: users periodically report their location, using a privacy-preserving aggregation protocol, so that the server can only recover aggregates -- i.e., how many, but not which, users are in a region at a given time. We experiment with real-world mobility datasets obtained from the Transport For London authority and the San Francisco Cabs network, and present a novel methodology based on time series modeling that is geared to forecast traffic volumes in regions of interest and to detect mobility anomalies in them. In the presence of anomalies, we also make enhanced traffic volume predictions by feeding our model with additional information from correlated regions. Finally, we present and evaluate a mobile app prototype, called Mobility Data Donors (MDD), in terms of computation, communication, and energy overhead, demonstrating the real-world deployability of our techniques.
△ Less
Submitted 9 October, 2016; v1 submitted 21 September, 2016;
originally announced September 2016.
-
Building and Measuring Privacy-Preserving Predictive Blacklists
Authors:
Luca Melis,
Apostolos Pyrgelis,
Emiliano De Cristofaro
Abstract:
(Withdrawn) Collaborative security initiatives are increasingly often advocated to improve timeliness and effectiveness of threat mitigation. Among these, collaborative predictive blacklisting (CPB) aims to forecast attack sources based on alerts contributed by multiple organizations that might be targeted in similar ways. Alas, CPB proposals thus far have only focused on improving hit counts, but…
▽ More
(Withdrawn) Collaborative security initiatives are increasingly often advocated to improve timeliness and effectiveness of threat mitigation. Among these, collaborative predictive blacklisting (CPB) aims to forecast attack sources based on alerts contributed by multiple organizations that might be targeted in similar ways. Alas, CPB proposals thus far have only focused on improving hit counts, but overlooked the impact of collaboration on false positives and false negatives. Moreover, sharing threat intelligence often prompts important privacy, confidentiality, and liability issues. In this paper, we first provide a comprehensive measurement analysis of two state-of-the-art CPB systems: one that uses a trusted central party to collect alerts [Soldo et al., Infocom'10] and a peer-to-peer one relying on controlled data sharing [Freudiger et al., DIMVA'15], studying the impact of collaboration on both correct and incorrect predictions. Then, we present a novel privacy-friendly approach that significantly improves over previous work, achieving a better balance of true and false positive rates, while minimizing information disclosure. Finally, we present an extension that allows our system to scale to very large numbers of organizations.
△ Less
Submitted 7 October, 2018; v1 submitted 13 December, 2015;
originally announced December 2015.
-
Elliptic Curve Based Zero Knowledge Proofs and Their Applicability on Resource Constrained Devices
Authors:
Ioannis Chatzigiannakis,
Apostolos Pyrgelis,
Paul G. Spirakis,
Yannis C. Stamatiou
Abstract:
Elliptic Curve Cryptography (ECC) is an attractive alternative to conventional public key cryptography, such as RSA. ECC is an ideal candidate for implementation on constrained devices where the major computational resources i.e. speed, memory are limited and low-power wireless communication protocols are employed. That is because it attains the same security levels with traditional cryptosystems…
▽ More
Elliptic Curve Cryptography (ECC) is an attractive alternative to conventional public key cryptography, such as RSA. ECC is an ideal candidate for implementation on constrained devices where the major computational resources i.e. speed, memory are limited and low-power wireless communication protocols are employed. That is because it attains the same security levels with traditional cryptosystems using smaller parameter sizes. Moreover, in several application areas such as person identification and eVoting, it is frequently required of entities to prove knowledge of some fact without revealing this knowledge. Such proofs of knowledge are called Zero Knowledge Interactive Proofs (ZKIP) and involve interactions between two communicating parties, the Prover and the Verifier. In a ZKIP, the Prover demonstrates the possesion of some information (e.g. authentication information) to the Verifier without disclosing it. In this paper, we focus on the application of ZKIP protocols on resource constrained devices. We study well-established ZKIP protocols based on the discrete logarithm problem and we transform them under the ECC setting. Then, we implement the proposed protocols on Wiselib, a generic and open source algorithmic library. Finally, we present a thorough evaluation of the protocols on two popular hardware platforms equipped with low end microcontrollers (Jennic JN5139, TI MSP430) and 802.15.4 RF transceivers, in terms of code size, execution time, message size and energy requirements. To the best of our knowledge, this is the first attempt of implementing and evaluating ZKIP protocols with emphasis on low-end devices. This work's results can be used from developers who wish to achieve certain levels of security and privacy in their applications.
△ Less
Submitted 8 July, 2011;
originally announced July 2011.
-
Component Based Clustering in Wireless Sensor Networks
Authors:
Dimitrios Amaxilatis,
Ioannis Chatzigiannakis,
Christos Koninis,
Apostolos Pyrgelis
Abstract:
Clustering is an important research topic for wireless sensor networks (WSNs). A large variety of approaches has been presented focusing on different performance metrics. Even though all of them have many practical applications, an extremely limited number of software implementations is available to the research community. Furthermore, these very few techniques are implemented for specific WSN sys…
▽ More
Clustering is an important research topic for wireless sensor networks (WSNs). A large variety of approaches has been presented focusing on different performance metrics. Even though all of them have many practical applications, an extremely limited number of software implementations is available to the research community. Furthermore, these very few techniques are implemented for specific WSN systems or are integrated in complex applications. Thus it is very difficult to comparatively study their performance and almost impossible to reuse them in future applications under a different scope. In this work we study a large body of well established algorithms. We identify their main building blocks and propose a component-based architecture for develo** clustering algorithms that (a) promotes exchangeability of algorithms thus enabling the fast prototy** of new approaches, (b) allows cross-layer implementations to realize complex applications, (c) offers a common platform to comparatively study the performance of different approaches, (d) is hardware and OS independent. We implement 5 well known algorithms and discuss how to implement 11 more. We conduct an extended simulation study to demonstrate the faithfulness of our implementations when compared to the original implementations. Our simulations are at very large scale thus also demonstrating the scalability of the original algorithms beyond their original presentations. We also conduct experiments to assess their practicality in real WSNs. We demonstrate how the implemented clustering algorithms can be combined with routing and group key establishment algorithms to construct WSN applications. Our study clearly demonstrates the applicability of our approach and the benefits it offers to both research & development communities.
△ Less
Submitted 8 July, 2011; v1 submitted 19 May, 2011;
originally announced May 2011.
-
Wiselib: A Generic Algorithm Library for Heterogeneous Sensor Networks
Authors:
Tobias Baumgartner,
Ioannis Chatzigiannakis,
Sandor P. Fekete,
Christos Koninis,
Alexander Kroeller,
Apostolos Pyrgelis
Abstract:
One unfortunate consequence of the success story of wireless sensor networks (WSNs) in separate research communities is an ever-growing gap between theory and practice. Even though there is a increasing number of algorithmic methods for WSNs, the vast majority has never been tried in practice; conversely, many practical challenges are still awaiting efficient algorithmic solutions. The main cause…
▽ More
One unfortunate consequence of the success story of wireless sensor networks (WSNs) in separate research communities is an ever-growing gap between theory and practice. Even though there is a increasing number of algorithmic methods for WSNs, the vast majority has never been tried in practice; conversely, many practical challenges are still awaiting efficient algorithmic solutions. The main cause for this discrepancy is the fact that programming sensor nodes still happens at a very technical level. We remedy the situation by introducing Wiselib, our algorithm library that allows for simple implementations of algorithms onto a large variety of hardware and software. This is achieved by employing advanced C++ techniques such as templates and inline functions, allowing to write generic code that is resolved and bound at compile time, resulting in virtually no memory or computation overhead at run time.
The Wiselib runs on different host operating systems, such as Contiki, iSense OS, and ScatterWeb. Furthermore, it runs on virtual nodes simulated by Shawn. For any algorithm, the Wiselib provides data structures that suit the specific properties of the target platform. Algorithm code does not contain any platform-specific specializations, allowing a single implementation to run natively on heterogeneous networks.
In this paper, we describe the building blocks of the Wiselib, and analyze the overhead. We demonstrate the effectiveness of our approach by showing how routing algorithms can be implemented. We also report on results from experiments with real sensor-node hardware.
△ Less
Submitted 16 January, 2011;
originally announced January 2011.