-
slytHErin: An Agile Framework for Encrypted Deep Neural Network Inference
Authors:
Francesco Intoci,
Sinem Sav,
Apostolos Pyrgelis,
Jean-Philippe Bossuat,
Juan Ramon Troncoso-Pastoriza,
Jean-Pierre Hubaux
Abstract:
Homomorphic encryption (HE), which allows computations on encrypted data, is an enabling technology for confidential cloud computing. One notable example is privacy-preserving Prediction-as-a-Service (PaaS), where machine-learning predictions are computed on encrypted data. However, develo** HE-based solutions for encrypted PaaS is a tedious task which requires a careful design that predominantl…
▽ More
Homomorphic encryption (HE), which allows computations on encrypted data, is an enabling technology for confidential cloud computing. One notable example is privacy-preserving Prediction-as-a-Service (PaaS), where machine-learning predictions are computed on encrypted data. However, develo** HE-based solutions for encrypted PaaS is a tedious task which requires a careful design that predominantly depends on the deployment scenario and on leveraging the characteristics of modern HE schemes. Prior works on privacy-preserving PaaS focus solely on protecting the confidentiality of the client data uploaded to a remote model provider, e.g., a cloud offering a prediction API, and assume (or take advantage of the fact) that the model is held in plaintext. Furthermore, their aim is to either minimize the latency of the service by processing one sample at a time, or to maximize the number of samples processed per second, while processing a fixed (large) number of samples. In this work, we present slytHErin, an agile framework that enables privacy-preserving PaaS beyond the application scenarios considered in prior works. Thanks to its hybrid design leveraging HE and its multiparty variant (MHE), slytHErin enables novel PaaS scenarios by encrypting the data, the model or both. Moreover, slytHErin features a flexible input data packing approach that allows processing a batch of an arbitrary number of samples, and several computation optimizations that are model-and-setting-agnostic. slytHErin is implemented in Go and it allows end-users to perform encrypted PaaS on custom deep learning models comprising fully-connected, convolutional, and pooling layers, in a few lines of code and without having to worry about the cumbersome implementation and optimization concerns inherent to HE.
△ Less
Submitted 1 May, 2023;
originally announced May 2023.
-
Scalable and Privacy-Preserving Federated Principal Component Analysis
Authors:
David Froelicher,
Hyunghoon Cho,
Manaswitha Edupalli,
Joao Sa Sousa,
Jean-Philippe Bossuat,
Apostolos Pyrgelis,
Juan R. Troncoso-Pastoriza,
Bonnie Berger,
Jean-Pierre Hubaux
Abstract:
Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermedia…
▽ More
Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermediate results in a passive-adversary model with up to all-but-one colluding parties. SF-PCA jointly leverages multiparty homomorphic encryption, interactive protocols, and edge computing to efficiently interleave computations on local cleartext data with operations on collectively encrypted data. SF-PCA obtains results as accurate as non-secure centralized solutions, independently of the data distribution among the parties. It scales linearly or better with the dataset dimensions and with the number of data providers. SF-PCA is more precise than existing approaches that approximate the solution by combining local analysis results, and between 3x and 250x faster than privacy-preserving alternatives based solely on secure multiparty computation or homomorphic encryption. Our work demonstrates the practical applicability of secure and federated PCA on private distributed datasets.
△ Less
Submitted 31 March, 2023;
originally announced April 2023.
-
Orchestrating Collaborative Cybersecurity: A Secure Framework for Distributed Privacy-Preserving Threat Intelligence Sharing
Authors:
Juan R. Trocoso-Pastoriza,
Alain Mermoud,
Romain Bouyé,
Francesco Marino,
Jean-Philippe Bossuat,
Vincent Lenders,
Jean-Pierre Hubaux
Abstract:
Cyber Threat Intelligence (CTI) sharing is an important activity to reduce information asymmetries between attackers and defenders. However, this activity presents challenges due to the tension between data sharing and confidentiality, that result in information retention often leading to a free-rider problem. Therefore, the information that is shared represents only the tip of the iceberg. Curren…
▽ More
Cyber Threat Intelligence (CTI) sharing is an important activity to reduce information asymmetries between attackers and defenders. However, this activity presents challenges due to the tension between data sharing and confidentiality, that result in information retention often leading to a free-rider problem. Therefore, the information that is shared represents only the tip of the iceberg. Current literature assumes access to centralized databases containing all the information, but this is not always feasible, due to the aforementioned tension. This results in unbalanced or incomplete datasets, requiring the use of techniques to expand them; we show how these techniques lead to biased results and misleading performance expectations. We propose a novel framework for extracting CTI from distributed data on incidents, vulnerabilities and indicators of compromise, and demonstrate its use in several practical scenarios, in conjunction with the Malware Information Sharing Platforms (MISP). Policy implications for CTI sharing are presented and discussed. The proposed system relies on an efficient combination of privacy enhancing technologies and federated processing. This lets organizations stay in control of their CTI and minimize the risks of exposure or leakage, while enabling the benefits of sharing, more accurate and representative results, and more effective predictive and preventive defenses.
△ Less
Submitted 6 September, 2022;
originally announced September 2022.
-
Verifiable Encodings for Secure Homomorphic Analytics
Authors:
Sylvain Chatel,
Christian Knabenhans,
Apostolos Pyrgelis,
Carmela Troncoso,
Jean-Pierre Hubaux
Abstract:
Homomorphic encryption, which enables the execution of arithmetic operations directly on ciphertexts, is a promising solution for protecting privacy of cloud-delegated computations on sensitive data. However, the correctness of the computation result is not ensured. We propose two error detection encodings and build authenticators that enable practical client-verification of cloud-based homomorphi…
▽ More
Homomorphic encryption, which enables the execution of arithmetic operations directly on ciphertexts, is a promising solution for protecting privacy of cloud-delegated computations on sensitive data. However, the correctness of the computation result is not ensured. We propose two error detection encodings and build authenticators that enable practical client-verification of cloud-based homomorphic computations under different trade-offs and without compromising on the features of the encryption algorithm. Our authenticators operate on top of trending ring learning with errors based fully homomorphic encryption schemes over the integers. We implement our solution in VERITAS, a ready-to-use system for verification of outsourced computations executed over encrypted data. We show that contrary to prior work VERITAS supports verification of any homomorphic operation and we demonstrate its practicality for various applications, such as ride-hailing, genomic-data analysis, encrypted search, and machine-learning training and inference.
△ Less
Submitted 4 June, 2024; v1 submitted 28 July, 2022;
originally announced July 2022.
-
Privacy-Preserving Federated Recurrent Neural Networks
Authors:
Sinem Sav,
Abdulrahman Diaa,
Apostolos Pyrgelis,
Jean-Philippe Bossuat,
Jean-Pierre Hubaux
Abstract:
We present RHODE, a novel system that enables privacy-preserving training of and prediction on Recurrent Neural Networks (RNNs) in a cross-silo federated learning setting by relying on multiparty homomorphic encryption. RHODE preserves the confidentiality of the training data, the model, and the prediction data; and it mitigates federated learning attacks that target the gradients under a passive-…
▽ More
We present RHODE, a novel system that enables privacy-preserving training of and prediction on Recurrent Neural Networks (RNNs) in a cross-silo federated learning setting by relying on multiparty homomorphic encryption. RHODE preserves the confidentiality of the training data, the model, and the prediction data; and it mitigates federated learning attacks that target the gradients under a passive-adversary threat model. We propose a packing scheme, multi-dimensional packing, for a better utilization of Single Instruction, Multiple Data (SIMD) operations under encryption. With multi-dimensional packing, RHODE enables the efficient processing, in parallel, of a batch of samples. To avoid the exploding gradients problem, RHODE provides several clip** approximations for performing gradient clip** under encryption. We experimentally show that the model performance with RHODE remains similar to non-secure solutions both for homogeneous and heterogeneous data distribution among the data holders. Our experimental evaluation shows that RHODE scales linearly with the number of data holders and the number of timesteps, sub-linearly and sub-quadratically with the number of features and the number of hidden units of RNNs, respectively. To the best of our knowledge, RHODE is the first system that provides the building blocks for the training of RNNs and its variants, under encryption in a federated learning setting.
△ Less
Submitted 3 May, 2023; v1 submitted 28 July, 2022;
originally announced July 2022.
-
Every Byte Matters: Traffic Analysis of Bluetooth Wearable Devices
Authors:
Ludovic Barman,
Alexandre Dumur,
Apostolos Pyrgelis,
Jean-Pierre Hubaux
Abstract:
Wearable devices such as smartwatches, fitness trackers, and blood-pressure monitors process, store, and communicate sensitive and personal information related to the health, life-style, habits and interests of the wearer. This data is exchanged with a companion app running on a smartphone over a Bluetooth connection. In this work, we investigate what can be inferred from the metadata (such as the…
▽ More
Wearable devices such as smartwatches, fitness trackers, and blood-pressure monitors process, store, and communicate sensitive and personal information related to the health, life-style, habits and interests of the wearer. This data is exchanged with a companion app running on a smartphone over a Bluetooth connection. In this work, we investigate what can be inferred from the metadata (such as the packet timings and sizes) of encrypted Bluetooth communications between a wearable device and its connected smartphone. We show that a passive eavesdropper can use traffic-analysis attacks to accurately recognize (a) communicating devices, even without having access to the MAC address, (b) human actions (e.g., monitoring heart rate, exercising) performed on wearable devices ranging from fitness trackers to smartwatches, (c) the mere opening of specific applications on a Wear OS smartwatch (e.g., the opening of a medical app, which can immediately reveal a condition of the wearer), (d) fine-grained actions (e.g., recording an insulin injection) within a specific application that helps diabetic users to monitor their condition, and (e) the profile and habits of the wearer by continuously monitoring her traffic over an extended period. We run traffic-analysis attacks by collecting a dataset of Bluetooth traces of multiple wearable devices, by designing features based on packet sizes and timings, and by using machine learning to classify the encrypted traffic to actions performed by the wearer. Then, we explore standard defense strategies; we show that these defenses do not provide sufficient protection against our attacks and introduce significant costs. Our research highlights the need to rethink how applications exchange sensitive information over Bluetooth, to minimize unnecessary data exchanges, and to design new defenses against traffic-analysis tailored to the wearable setting.
△ Less
Submitted 24 May, 2021;
originally announced May 2021.
-
SoK: Privacy-Preserving Collaborative Tree-based Model Learning
Authors:
Sylvain Chatel,
Apostolos Pyrgelis,
Juan Ramon Troncoso-Pastoriza,
Jean-Pierre Hubaux
Abstract:
Tree-based models are among the most efficient machine learning techniques for data mining nowadays due to their accuracy, interpretability, and simplicity. The recent orthogonal needs for more data and privacy protection call for collaborative privacy-preserving solutions. In this work, we survey the literature on distributed and privacy-preserving training of tree-based models and we systematize…
▽ More
Tree-based models are among the most efficient machine learning techniques for data mining nowadays due to their accuracy, interpretability, and simplicity. The recent orthogonal needs for more data and privacy protection call for collaborative privacy-preserving solutions. In this work, we survey the literature on distributed and privacy-preserving training of tree-based models and we systematize its knowledge based on four axes: the learning algorithm, the collaborative model, the protection mechanism, and the threat model. We use this to identify the strengths and limitations of these works and provide for the first time a framework analyzing the information leakage occurring in distributed tree-based model learning.
△ Less
Submitted 18 June, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
-
Privacy-Preserving and Efficient Verification of the Outcome in Genome-Wide Association Studies
Authors:
Anisa Halimi,
Leonard Dervishi,
Erman Ayday,
Apostolos Pyrgelis,
Juan Ramon Troncoso-Pastoriza,
Jean-Pierre Hubaux,
Xiaoqian Jiang,
Jaideep Vaidya
Abstract:
Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. Workflow systems model and record provenance describing the steps performed to obtain the final results of a computation. In this work, we propose a framework that verifies the correctness of the statistical test results that are conducted by a researcher while protecting individuals' privacy i…
▽ More
Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. Workflow systems model and record provenance describing the steps performed to obtain the final results of a computation. In this work, we propose a framework that verifies the correctness of the statistical test results that are conducted by a researcher while protecting individuals' privacy in the researcher's dataset. The researcher publishes the workflow of the conducted study, its output, and associated metadata. They keep the research dataset private while providing, as part of the metadata, a partial noisy dataset (that achieves local differential privacy). To check the correctness of the workflow output, a verifier makes use of the workflow, its metadata, and results of another statistical study (using publicly available datasets) to distinguish between correct statistics and incorrect ones. We use case the proposed framework in the genome-wide association studies (GWAS), in which the goal is to identify highly associated point mutations (variants) with a given phenotype. For evaluation, we use real genomic data and show that the correctness of the workflow output can be verified with high accuracy even when the aggregate statistics of a small number of variants are provided. We also quantify the privacy leakage due to the provided workflow and its associated metadata in the GWAS use-case and show that the additional privacy risk due to the provided metadata does not increase the existing privacy risk due to sharing of the research results. Thus, our results show that the workflow output (i.e., research results) can be verified with high confidence in a privacy-preserving way. We believe that this work will be a valuable step towards providing provenance in a privacy-preserving way while providing guarantees to the users about the correctness of the results.
△ Less
Submitted 7 November, 2022; v1 submitted 21 January, 2021;
originally announced January 2021.
-
Revolutionizing Medical Data Sharing Using Advanced Privacy Enhancing Technologies: Technical, Legal and Ethical Synthesis
Authors:
James Scheibner,
Jean Louis Raisaro,
Juan Ramón Troncoso-Pastoriza,
Marcello Ienca,
Jacques Fellay,
Effy Vayena,
Jean-Pierre Hubaux
Abstract:
Multisite medical data sharing is critical in modern clinical practice and medical research. The challenge is to conduct data sharing that preserves individual privacy and data usability. The shortcomings of traditional privacy-enhancing technologies mean that institutions rely on bespoke data sharing contracts. These contracts increase the inefficiency of data sharing and may disincentivize impor…
▽ More
Multisite medical data sharing is critical in modern clinical practice and medical research. The challenge is to conduct data sharing that preserves individual privacy and data usability. The shortcomings of traditional privacy-enhancing technologies mean that institutions rely on bespoke data sharing contracts. These contracts increase the inefficiency of data sharing and may disincentivize important clinical treatment and medical research. This paper provides a synthesis between two novel advanced privacy enhancing technologies (PETs): Homomorphic Encryption and Secure Multiparty Computation (defined together as Multiparty Homomorphic Encryption or MHE). These PETs provide a mathematical guarantee of privacy, with MHE providing a performance advantage over separately using HE or SMC. We argue MHE fulfills legal requirements for medical data sharing under the General Data Protection Regulation (GDPR) which has set a global benchmark for data protection. Specifically, the data processed and shared using MHE can be considered anonymized data. We explain how MHE can reduce the reliance on customized contractual measures between institutions. The proposed approach can accelerate the pace of medical research whilst offering additional incentives for healthcare and research institutes to employ common data interoperability standards.
△ Less
Submitted 27 October, 2020;
originally announced October 2020.
-
POSEIDON: Privacy-Preserving Federated Neural Network Learning
Authors:
Sinem Sav,
Apostolos Pyrgelis,
Juan R. Troncoso-Pastoriza,
David Froelicher,
Jean-Philippe Bossuat,
Joao Sa Sousa,
Jean-Pierre Hubaux
Abstract:
In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an $N$-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training. It employs multiparty lattice-based cryptography to preserve the confidentiality of the training data, the model, and the evaluation…
▽ More
In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an $N$-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training. It employs multiparty lattice-based cryptography to preserve the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to $N-1$ parties. To efficiently execute the secure backpropagation algorithm for training neural networks, we provide a generic packing approach that enables Single Instruction, Multiple Data (SIMD) operations on encrypted data. We also introduce arbitrary linear transformations within the cryptographic bootstrap** operation, optimizing the costly cryptographic computations over the parties, and we define a constrained optimization problem for choosing the cryptographic parameters. Our experimental results show that POSEIDON achieves accuracy similar to centralized or decentralized non-private approaches and that its computation and communication overhead scales linearly with the number of parties. POSEIDON trains a 3-layer neural network on the MNIST dataset with 784 features and 60K samples distributed among 10 parties in less than 2 hours.
△ Less
Submitted 8 January, 2021; v1 submitted 1 September, 2020;
originally announced September 2020.
-
Privacy and Integrity Preserving Computations with CRISP
Authors:
Sylvain Chatel,
Apostolos Pyrgelis,
Juan R. Troncoso-Pastoriza,
Jean-Pierre Hubaux
Abstract:
In the digital era, users share their personal data with service providers to obtain some utility, e.g., access to high-quality services. Yet, the induced information flows raise privacy and integrity concerns. Consequently, cautious users may want to protect their privacy by minimizing the amount of information they disclose to curious service providers. Service providers are interested in verify…
▽ More
In the digital era, users share their personal data with service providers to obtain some utility, e.g., access to high-quality services. Yet, the induced information flows raise privacy and integrity concerns. Consequently, cautious users may want to protect their privacy by minimizing the amount of information they disclose to curious service providers. Service providers are interested in verifying the integrity of the users' data to improve their services and obtain useful knowledge for their business. In this work, we present a generic solution to the trade-off between privacy, integrity, and utility, by achieving authenticity verification of data that has been encrypted for offloading to service providers. Based on lattice-based homomorphic encryption and commitments, as well as zero-knowledge proofs, our construction enables a service provider to process and reuse third-party signed data in a privacy-friendly manner with integrity guarantees. We evaluate our solution on different use cases such as smart-metering, disease susceptibility, and location-based activity tracking, thus showing its versatility. Our solution achieves broad generality, quantum-resistance, and relaxes some assumptions of state-of-the-art solutions without affecting performance.
△ Less
Submitted 12 January, 2021; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Decentralized Privacy-Preserving Proximity Tracing
Authors:
Carmela Troncoso,
Mathias Payer,
Jean-Pierre Hubaux,
Marcel Salathé,
James Larus,
Edouard Bugnion,
Wouter Lueks,
Theresa Stadler,
Apostolos Pyrgelis,
Daniele Antonioli,
Ludovic Barman,
Sylvain Chatel,
Kenneth Paterson,
Srdjan Čapkun,
David Basin,
Jan Beutel,
Dennis Jackson,
Marc Roeschlin,
Patrick Leu,
Bart Preneel,
Nigel Smart,
Aysajan Abidin,
Seda Gürses,
Michael Veale,
Cas Cremers
, et al. (9 additional authors not shown)
Abstract:
This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take appropriate measures to break its transmission chai…
▽ More
This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take appropriate measures to break its transmission chain. The system aims to minimise privacy and security risks for individuals and communities and guarantee the highest level of data protection. The goal of our proximity tracing system is to determine who has been in close physical proximity to a COVID-19 positive person and thus exposed to the virus, without revealing the contact's identity or where the contact occurred. To achieve this goal, users run a smartphone app that continually broadcasts an ephemeral, pseudo-random ID representing the user's phone and also records the pseudo-random IDs observed from smartphones in close proximity. When a patient is diagnosed with COVID-19, she can upload pseudo-random IDs previously broadcast from her phone to a central server. Prior to the upload, all data remains exclusively on the user's phone. Other users' apps can use data from the server to locally estimate whether the device's owner was exposed to the virus through close-range physical proximity to a COVID-19 positive person who has uploaded their data. In case the app detects a high risk, it will inform the user.
△ Less
Submitted 25 May, 2020;
originally announced May 2020.
-
Scalable Privacy-Preserving Distributed Learning
Authors:
David Froelicher,
Juan R. Troncoso-Pastoriza,
Apostolos Pyrgelis,
Sinem Sav,
Joao Sa Sousa,
Jean-Philippe Bossuat,
Jean-Pierre Hubaux
Abstract:
In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design SPINDLE (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that covers the complete ML workflow by enabling the e…
▽ More
In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design SPINDLE (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that covers the complete ML workflow by enabling the execution of a cooperative gradient-descent and the evaluation of the obtained model and by preserving data and model confidentiality in a passive-adversary model with up to N-1 colluding parties. SPINDLE uses multiparty homomorphic encryption to execute parallel high-depth computations on encrypted data without significant overhead. We instantiate SPINDLE for the training and evaluation of generalized linear models on distributed datasets and show that it is able to accurately (on par with non-secure centrally-trained models) and efficiently (due to a multi-level parallelization of the computations) train models that require a high number of iterations on large input data with thousands of features, distributed among hundreds of data providers. For instance, it trains a logistic-regression model on a dataset of one million samples with 32 features distributed among 160 data providers in less than three minutes.
△ Less
Submitted 14 July, 2021; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Drynx: Decentralized, Secure, Verifiable System for Statistical Queries and Machine Learning on Distributed Datasets
Authors:
David Froelicher,
Juan R. Troncoso-Pastoriza,
Joao Sa Sousa,
Jean-Pierre Hubaux
Abstract:
Data sharing has become of primary importance in many domains such as big-data analytics, economics and medical research, but remains difficult to achieve when the data are sensitive. In fact, sharing personal information requires individuals' unconditional consent or is often simply forbidden for privacy and security reasons. In this paper, we propose Drynx, a decentralized system for privacy-con…
▽ More
Data sharing has become of primary importance in many domains such as big-data analytics, economics and medical research, but remains difficult to achieve when the data are sensitive. In fact, sharing personal information requires individuals' unconditional consent or is often simply forbidden for privacy and security reasons. In this paper, we propose Drynx, a decentralized system for privacy-conscious statistical analysis on distributed datasets. Drynx relies on a set of computing nodes to enable the computation of statistics such as standard deviation or extrema, and the training and evaluation of machine-learning models on sensitive and distributed data. To ensure data confidentiality and the privacy of the data providers, Drynx combines interactive protocols, homomorphic encryption, zero-knowledge proofs of correctness, and differential privacy. It enables an efficient and decentralized verification of the input data and of all the system's computations thus provides auditability in a strong adversarial model in which no entity has to be individually trusted. Drynx is highly modular, dynamic and parallelizable. Our evaluation shows that it enables the training of a logistic regression model on a dataset (12 features and 600,000 records) distributed among 12 data providers in less than 2 seconds. The computations are distributed among 6 computing nodes, and Drynx enables the verification of the query execution's correctness in less than 22 seconds.
△ Less
Submitted 27 February, 2020; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Reducing Metadata Leakage from Encrypted Files and Communication with PURBs
Authors:
Kirill Nikitin,
Ludovic Barman,
Wouter Lueks,
Matthew Underwood,
Jean-Pierre Hubaux,
Bryan Ford
Abstract:
Most encrypted data formats leak metadata via their plaintext headers, such as format version, encryption schemes used, number of recipients who can decrypt the data, and even the recipients' identities. This leakage can pose security and privacy risks to users, e.g., by revealing the full membership of a group of collaborators from a single encrypted e-mail, or by enabling an eavesdropper to fing…
▽ More
Most encrypted data formats leak metadata via their plaintext headers, such as format version, encryption schemes used, number of recipients who can decrypt the data, and even the recipients' identities. This leakage can pose security and privacy risks to users, e.g., by revealing the full membership of a group of collaborators from a single encrypted e-mail, or by enabling an eavesdropper to fingerprint the precise encryption software version and configuration the sender used. We propose that future encrypted data formats improve security and privacy hygiene by producing $\textit{Padded Uniform Random Blobs}$ or PURBs: ciphertexts indistinguishable from random bit strings to anyone without a decryption key. A PURB's content leaks $\textit{nothing at all}$, even the application that created it, and is padded such that even its length leaks as little as possible. Encoding and decoding ciphertexts with $\textit{no}$ cleartext markers presents efficiency challenges, however. We present cryptographically agile encodings enabling legitimate recipients to decrypt a PURB efficiently, even when encrypted for any number of recipients' public keys and/or passwords, and when these public keys are from different cryptographic suites. PURBs employ Padmé, a~novel padding scheme that limits information leakage via ciphertexts of maximum length $M$ to a practical optimum of $O(\log \log M)$ bits, comparable to padding to a power of two, but with lower overhead of at most $12\%$ and decreasing with larger payloads.
△ Less
Submitted 25 July, 2019; v1 submitted 8 June, 2018;
originally announced June 2018.
-
PriFi: Low-Latency Anonymity for Organizational Networks
Authors:
Ludovic Barman,
Italo Dacosta,
Mahdi Zamani,
Ennan Zhai,
Apostolos Pyrgelis,
Bryan Ford,
Jean-Pierre Hubaux,
Joan Feigenbaum
Abstract:
Organizational networks are vulnerable to traffic-analysis attacks that enable adversaries to infer sensitive information from the network traffic - even if encryption is used. Typical anonymous communication networks are tailored to the Internet and are poorly suited for organizational networks. We present PriFi, an anonymous communication protocol for LANs, which protects users against eavesdrop…
▽ More
Organizational networks are vulnerable to traffic-analysis attacks that enable adversaries to infer sensitive information from the network traffic - even if encryption is used. Typical anonymous communication networks are tailored to the Internet and are poorly suited for organizational networks. We present PriFi, an anonymous communication protocol for LANs, which protects users against eavesdroppers and provides high-performance traffic-analysis resistance. PriFi builds on Dining Cryptographers networks but reduces the high communication latency of prior work via a new client/relay/server architecture, in which a client's packets remain on their usual network path without additional hops, and in which a set of remote servers assist the anonymization process without adding latency. PriFi also solves the challenge of equivocation attacks, which are not addressed by related works, by encrypting the traffic based on the communication history. Our evaluation shows that PriFi introduces a small latency overhead (~100ms for 100 clients) and is compatible with delay-sensitive applications such as VoIP.
△ Less
Submitted 6 April, 2021; v1 submitted 27 October, 2017;
originally announced October 2017.
-
FairTest: Discovering Unwarranted Associations in Data-Driven Applications
Authors:
Florian Tramèr,
Vaggelis Atlidakis,
Roxana Geambasu,
Daniel Hsu,
Jean-Pierre Hubaux,
Mathias Humbert,
Ari Juels,
Huang Lin
Abstract:
In a world where traditional notions of privacy are increasingly challenged by the myriad companies that collect and analyze our data, it is important that decision-making entities are held accountable for unfair treatments arising from irresponsible data usage. Unfortunately, a lack of appropriate methodologies and tools means that even identifying unfair or discriminatory effects can be a challe…
▽ More
In a world where traditional notions of privacy are increasingly challenged by the myriad companies that collect and analyze our data, it is important that decision-making entities are held accountable for unfair treatments arising from irresponsible data usage. Unfortunately, a lack of appropriate methodologies and tools means that even identifying unfair or discriminatory effects can be a challenge in practice. We introduce the unwarranted associations (UA) framework, a principled methodology for the discovery of unfair, discriminatory, or offensive user treatment in data-driven applications. The UA framework unifies and rationalizes a number of prior attempts at formalizing algorithmic fairness. It uniquely combines multiple investigative primitives and fairness metrics with broad applicability, granular exploration of unfair treatment in user subgroups, and incorporation of natural notions of utility that may account for observed disparities. We instantiate the UA framework in FairTest, the first comprehensive tool that helps developers check data-driven applications for unfair user treatment. It enables scalable and statistically rigorous investigation of associations between application outcomes (such as prices or premiums) and sensitive user attributes (such as race or gender). Furthermore, FairTest provides debugging capabilities that let programmers rule out potential confounders for observed unfair effects. We report on use of FairTest to investigate and in some cases address disparate impact, offensive labeling, and uneven rates of algorithmic error in four data-driven applications. As examples, our results reveal subtle biases against older populations in the distribution of error in a predictive health application and offensive racial labeling in an image tagger.
△ Less
Submitted 16 August, 2016; v1 submitted 8 October, 2015;
originally announced October 2015.
-
Prolonging the Hide-and-Seek Game: Optimal Trajectory Privacy for Location-Based Services
Authors:
George Theodorakopoulos,
Reza Shokri,
Carmela Troncoso,
Jean-Pierre Hubaux,
Jean-Yves Le Boudec
Abstract:
Human mobility is highly predictable. Individuals tend to only visit a few locations with high frequency, and to move among them in a certain sequence reflecting their habits and daily routine. This predictability has to be taken into account in the design of location privacy preserving mechanisms (LPPMs) in order to effectively protect users when they continuously expose their position to locatio…
▽ More
Human mobility is highly predictable. Individuals tend to only visit a few locations with high frequency, and to move among them in a certain sequence reflecting their habits and daily routine. This predictability has to be taken into account in the design of location privacy preserving mechanisms (LPPMs) in order to effectively protect users when they continuously expose their position to location-based services (LBSs). In this paper, we describe a method for creating LPPMs that are customized for a user's mobility profile taking into account privacy and quality of service requirements. By construction, our LPPMs take into account the sequential correlation across the user's exposed locations, providing the maximum possible trajectory privacy, i.e., privacy for the user's present location, as well as past and expected future locations. Moreover, our LPPMs are optimal against a strategic adversary, i.e., an attacker that implements the strongest inference attack knowing both the LPPM operation and the user's mobility profile. The optimality of the LPPMs in the context of trajectory privacy is a novel contribution, and it is achieved by formulating the LPPM design problem as a Bayesian Stackelberg game between the user and the adversary. An additional benefit of our formal approach is that the design parameters of the LPPM are chosen by the optimization algorithm.
△ Less
Submitted 5 September, 2014;
originally announced September 2014.
-
Privacy in the Genomic Era
Authors:
Muhammad Naveed,
Erman Ayday,
Ellen W. Clayton,
Jacques Fellay,
Carl A. Gunter,
Jean-Pierre Hubaux,
Bradley A. Malin,
XiaoFeng Wang
Abstract:
Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly-detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has…
▽ More
Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly-detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy; notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While the computer scientists have addressed data privacy for various data types, there has been less attention dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state-of-the-art regarding privacy attacks on genomic data and strategies for mitigating such attacks, as well as contextualizing these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.
△ Less
Submitted 17 June, 2015; v1 submitted 8 May, 2014;
originally announced May 2014.
-
The Chills and Thrills of Whole Genome Sequencing
Authors:
Erman Ayday,
Emiliano De Cristofaro,
Jean-Pierre Hubaux,
Gene Tsudik
Abstract:
In recent years, Whole Genome Sequencing (WGS) evolved from a futuristic-sounding research project to an increasingly affordable technology for determining complete genome sequences of complex organisms, including humans. This prompts a wide range of revolutionary applications, as WGS promises to improve modern healthcare and provide a better understanding of the human genome -- in particular, its…
▽ More
In recent years, Whole Genome Sequencing (WGS) evolved from a futuristic-sounding research project to an increasingly affordable technology for determining complete genome sequences of complex organisms, including humans. This prompts a wide range of revolutionary applications, as WGS promises to improve modern healthcare and provide a better understanding of the human genome -- in particular, its relation to diseases and response to treatments. However, this progress raises worrisome privacy and ethical issues, since, besides uniquely identifying its owner, the genome contains a treasure trove of highly personal and sensitive information. In this article, after summarizing recent advances in genomics, we discuss some important privacy issues associated with human genomic information and identify a number of particularly relevant research challenges.
△ Less
Submitted 16 February, 2015; v1 submitted 5 June, 2013;
originally announced June 2013.
-
VANET Connectivity Analysis
Authors:
M. Kafsi,
P. Papadimitratos,
O. Dousse,
T. Alpcan,
J. -P. Hubaux
Abstract:
Vehicular Ad Hoc Networks (VANETs) are a peculiar subclass of mobile ad hoc networks that raise a number of technical challenges, notably from the point of view of their mobility models. In this paper, we provide a thorough analysis of the connectivity of such networks by leveraging on well-known results of percolation theory. By means of simulations, we study the influence of a number of parame…
▽ More
Vehicular Ad Hoc Networks (VANETs) are a peculiar subclass of mobile ad hoc networks that raise a number of technical challenges, notably from the point of view of their mobility models. In this paper, we provide a thorough analysis of the connectivity of such networks by leveraging on well-known results of percolation theory. By means of simulations, we study the influence of a number of parameters, including vehicle density, proportion of equipped vehicles, and radio communication range. We also study the influence of traffic lights and roadside units. Our results provide insights on the behavior of connectivity. We believe this paper to be a valuable framework to assess the feasibility and performance of future applications relying on vehicular connectivity in urban scenarios.
△ Less
Submitted 30 December, 2009;
originally announced December 2009.
-
Wireless Sensor Networking for Rain-fed Farming Decision Support
Authors:
J. Panchard,
P. R. S. Rao,
M. S. Sheshshayee,
P. Papadimitratos,
S. Kumar,
J-P. Hubaux
Abstract:
Wireless sensor networks (WSNs) can be a valuable decision-support tool for farmers. This motivated our deployment of a WSN system to support rain-fed agriculture in India. We defined promising use cases and resolved technical challenges throughout a two-year deployment of our COMMON-Sense Net system, which provided farmers with environment data. However, the direct use of this technology in the…
▽ More
Wireless sensor networks (WSNs) can be a valuable decision-support tool for farmers. This motivated our deployment of a WSN system to support rain-fed agriculture in India. We defined promising use cases and resolved technical challenges throughout a two-year deployment of our COMMON-Sense Net system, which provided farmers with environment data. However, the direct use of this technology in the field did not foster the expected participation of the population. This made it difficult to develop the intended decision-support system. Based on this experience, we take the following position in this paper: currently, the deployment of WSN technology in develo** regions is more likely to be effective if it targets scientists and technical personnel as users, rather than the farmers themselves. We base this claim on the lessons learned from the COMMON-Sense system deployment and the results of an extensive user experiment with agriculture scientists, which we describe in this paper.
△ Less
Submitted 30 December, 2009;
originally announced December 2009.
-
How to Specify and How to Prove Correctness of Secure Routing Protocols for MANET
Authors:
P. Papadimitratos,
Z. J. Haas,
J. -P. Hubaux
Abstract:
Secure routing protocols for mobile ad hoc networks have been developed recently, yet, it has been unclear what are the properties they achieve, as a formal analysis of these protocols is mostly lacking. In this paper, we are concerned with this problem, how to specify and how to prove the correctness of a secure routing protocol. We provide a definition of what a protocol is expected to achieve…
▽ More
Secure routing protocols for mobile ad hoc networks have been developed recently, yet, it has been unclear what are the properties they achieve, as a formal analysis of these protocols is mostly lacking. In this paper, we are concerned with this problem, how to specify and how to prove the correctness of a secure routing protocol. We provide a definition of what a protocol is expected to achieve independently of its functionality, as well as communication and adversary models. This way, we enable formal reasoning on the correctness of secure routing protocols. We demonstrate this by analyzing two protocols from the literature.
△ Less
Submitted 30 December, 2009;
originally announced December 2009.
-
Secure Vehicular Communication Systems: Implementation, Performance, and Research Challenges
Authors:
F. Kargl,
P. Papadimitratos,
L. Buttyan,
M. Muter,
B. Wiedersheim,
E. Schoch,
T. -V. Thong,
G. Calandriello,
A. Held,
A. Kung,
J. -P. Hubaux
Abstract:
Vehicular Communication (VC) systems are on the verge of practical deployment. Nonetheless, their security and privacy protection is one of the problems that have been addressed only recently. In order to show the feasibility of secure VC, certain implementations are required. In [1] we discuss the design of a VC security system that has emerged as a result of the European SeVeCom project. In th…
▽ More
Vehicular Communication (VC) systems are on the verge of practical deployment. Nonetheless, their security and privacy protection is one of the problems that have been addressed only recently. In order to show the feasibility of secure VC, certain implementations are required. In [1] we discuss the design of a VC security system that has emerged as a result of the European SeVeCom project. In this second paper, we discuss various issues related to the implementation and deployment aspects of secure VC systems. Moreover, we provide an outlook on open security research issues that will arise as VC systems develop from today's simple prototypes to full-fledged systems.
△ Less
Submitted 30 December, 2009;
originally announced December 2009.
-
Secure Vehicular Communication Systems: Design and Architecture
Authors:
P. Papadimitratos,
L. Buttyan,
T. Holczer,
E. Schoch,
J. Freudiger,
M. Raya,
Z. Ma,
F. Kargl,
A. Kung,
J. -P. Hubaux
Abstract:
Significant developments have taken place over the past few years in the area of vehicular communication (VC) systems. Now, it is well understood in the community that security and protection of private user information are a prerequisite for the deployment of the technology. This is so, precisely because the benefits of VC systems, with the mission to enhance transportation safety and efficienc…
▽ More
Significant developments have taken place over the past few years in the area of vehicular communication (VC) systems. Now, it is well understood in the community that security and protection of private user information are a prerequisite for the deployment of the technology. This is so, precisely because the benefits of VC systems, with the mission to enhance transportation safety and efficiency, are at stake. Without the integration of strong and practical security and privacy enhancing mechanisms, VC systems could be disrupted or disabled, even by relatively unsophisticated attackers. We address this problem within the SeVeCom project, having developed a security architecture that provides a comprehensive and practical solution. We present our results in a set of two papers in this issue. In this first one, we analyze threats and types of adversaries, we identify security and privacy requirements, and we present a spectrum of mechanisms to secure VC systems. We provide a solution that can be quickly adopted and deployed. In the second paper, we present our progress towards the implementation of our architecture and results on the performance of the secure VC system, along with a discussion of upcoming research challenges and our related current results.
△ Less
Submitted 30 December, 2009;
originally announced December 2009.
-
Efficient and Robust Secure Aggregation for Sensor Networks
Authors:
P. Haghani,
P. Papadimitratos,
M. Poturalski,
K. Aberer,
J. -P. Hubaux
Abstract:
Wireless Sensor Networks (WSNs) rely on in-network aggregation for efficiency, however, this comes at a price: A single adversary can severely influence the outcome by contributing an arbitrary partial aggregate value. Secure in-network aggregation can detect such manipulation. But as long as such faults persist, no aggregation result can be obtained. In contrast, the collection of individual se…
▽ More
Wireless Sensor Networks (WSNs) rely on in-network aggregation for efficiency, however, this comes at a price: A single adversary can severely influence the outcome by contributing an arbitrary partial aggregate value. Secure in-network aggregation can detect such manipulation. But as long as such faults persist, no aggregation result can be obtained. In contrast, the collection of individual sensor node values is robust and solves the problem of availability, yet in an inefficient way. Our work seeks to bridge this gap in secure data collection: We propose a system that enhances availability with an efficiency close to that of in-network aggregation. To achieve this, our scheme relies on costly operations to localize and exclude nodes that manipulate the aggregation, but \emph{only} when a failure is detected. The detection of aggregation disruptions and the removal of faulty nodes provides robustness. At the same time, after removing faulty nodes, the WSN can enjoy low cost (secure) aggregation. Thus, the high exclusion cost is amortized, and efficiency increases.
△ Less
Submitted 19 August, 2008;
originally announced August 2008.
-
Secure Neighbor Discovery in Wireless Networks: Formal Investigation of Possibility
Authors:
Marcin Poturalski,
Panos Papadimitratos,
Jean-Pierre Hubaux
Abstract:
Wireless communication enables a broad spectrum of applications, ranging from commodity to tactical systems. Neighbor discovery (ND), that is, determining which devices are within direct radio communication, is a building block of network protocols and applications, and its vulnerability can severely compromise their functionalities. A number of proposals to secure ND have been published, but no…
▽ More
Wireless communication enables a broad spectrum of applications, ranging from commodity to tactical systems. Neighbor discovery (ND), that is, determining which devices are within direct radio communication, is a building block of network protocols and applications, and its vulnerability can severely compromise their functionalities. A number of proposals to secure ND have been published, but none have analyzed the problem formally. In this paper, we contribute such an analysis: We build a formal model capturing salient characteristics of wireless systems, most notably obstacles and interference, and we provide a specification of a basic variant of the ND problem. Then, we derive an impossibility result for a general class of protocols we term "time-based protocols," to which many of the schemes in the literature belong. We also identify the conditions under which the impossibility result is lifted. Moreover, we explore a second class of protocols we term "time- and location-based protocols," and prove they can secure ND.
△ Less
Submitted 19 August, 2008;
originally announced August 2008.
-
Impact of Vehicular Communications Security on Transportation Safety
Authors:
Panos Papadimitratos,
Giorgio Calandriello,
Jean-Pierre Hubaux,
Antonio Lioy
Abstract:
Transportation safety, one of the main driving forces of the development of vehicular communication (VC) systems, relies on high-rate safety messaging (beaconing). At the same time, there is consensus among authorities, industry, and academia on the need to secure VC systems. With specific proposals in the literature, a critical question must be answered: can secure VC systems be practical and s…
▽ More
Transportation safety, one of the main driving forces of the development of vehicular communication (VC) systems, relies on high-rate safety messaging (beaconing). At the same time, there is consensus among authorities, industry, and academia on the need to secure VC systems. With specific proposals in the literature, a critical question must be answered: can secure VC systems be practical and satisfy the requirements of safety applications, in spite of the significant communication and processing overhead and other restrictions security and privacy-enhancing mechanisms impose? To answer this question, we investigate in this paper the following three dimensions for secure and privacy-enhancing VC schemes: the reliability of communication, the processing overhead at each node, and the impact on a safety application. The results indicate that with the appropriate system design, including sufficiently high processing power, applications enabled by secure VC can be in practice as effective as those enabled by unsecured VC.
△ Less
Submitted 19 August, 2008;
originally announced August 2008.
-
Report on the "Secure Vehicular Communications: Results and Challenges Ahead" Workshop
Authors:
Panos Papadimitratos,
Jean-Pierre Hubaux
Abstract:
This is a report and a collection of abstracts from the Feb. 2008 Lausanne Workshop on Secure Vehicular Communication Systems.
This is a report and a collection of abstracts from the Feb. 2008 Lausanne Workshop on Secure Vehicular Communication Systems.
△ Less
Submitted 19 August, 2008;
originally announced August 2008.
-
GossiCrypt: Wireless Sensor Network Data Confidentiality Against Parasitic Adversaries
Authors:
Jun Luo,
Panos Papadimitratos,
Jean-Pierre Hubaux
Abstract:
Resource and cost constraints remain a challenge for wireless sensor network security. In this paper, we propose a new approach to protect confidentiality against a parasitic adversary, which seeks to exploit sensor networks by obtaining measurements in an unauthorized way. Our low-complexity solution, GossiCrypt, leverages on the large scale of sensor networks to protect confidentiality efficie…
▽ More
Resource and cost constraints remain a challenge for wireless sensor network security. In this paper, we propose a new approach to protect confidentiality against a parasitic adversary, which seeks to exploit sensor networks by obtaining measurements in an unauthorized way. Our low-complexity solution, GossiCrypt, leverages on the large scale of sensor networks to protect confidentiality efficiently and effectively. GossiCrypt protects data by symmetric key encryption at their source nodes and re-encryption at a randomly chosen subset of nodes en route to the sink. Furthermore, it employs key refreshing to mitigate the physical compromise of cryptographic keys. We validate GossiCrypt analytically and with simulations, showing it protects data confidentiality with probability almost one. Moreover, compared with a system that uses public-key data encryption, the energy consumption of GossiCrypt is one to three orders of magnitude lower.
△ Less
Submitted 19 August, 2008;
originally announced August 2008.
-
Towards Provable Secure Neighbor Discovery in Wireless Networks
Authors:
Marcin Poturalski,
Panos Papadimitratos,
Jean-Pierre Hubaux
Abstract:
In wireless systems, neighbor discovery (ND) is a fundamental building block: determining which devices are within direct radio communication is an enabler for networking protocols and a wide range of applications. To thwart abuse of ND and the resultant compromise of the dependent functionality of wireless systems, numerous works proposed solutions to secure ND. Nonetheless, until very recently…
▽ More
In wireless systems, neighbor discovery (ND) is a fundamental building block: determining which devices are within direct radio communication is an enabler for networking protocols and a wide range of applications. To thwart abuse of ND and the resultant compromise of the dependent functionality of wireless systems, numerous works proposed solutions to secure ND. Nonetheless, until very recently, there has been no formal analysis of secure ND protocols. We close this gap in \cite{asiaccs08}, but we concentrate primarily on the derivation of an impossibility result for a class of protocols. In this paper, we focus on reasoning about specific protocols. First, we contribute a number of extensions and refinements on the framework of [24]. As we are particularly concerned with the practicality of provably secure ND protocols, we investigate availability and redefine accordingly the ND specification, and also consider composability of ND with other protocols. Then, we propose and analyze two secure ND protocols: We revisit one of the protocols analyzed in [24], and introduce and prove correct a more elaborate challenge-response protocol.
△ Less
Submitted 19 August, 2008;
originally announced August 2008.