Search | arXiv e-print repository

SplitOut: Out-of-the-Box Training-Hijacking Detection in Split Learning via Outlier Detection

Authors: Ege Erdogan, Unat Teksen, Mehmet Salih Celiktenyildiz, Alptekin Kupcu, A. Ercument Cicek

Abstract: Split learning enables efficient and privacy-aware training of a deep neural network by splitting a neural network so that the clients (data holders) compute the first layers and only share the intermediate output with the central compute-heavy server. This paradigm introduces a new attack medium in which the server has full control over what the client models learn, which has already been exploit… ▽ More Split learning enables efficient and privacy-aware training of a deep neural network by splitting a neural network so that the clients (data holders) compute the first layers and only share the intermediate output with the central compute-heavy server. This paradigm introduces a new attack medium in which the server has full control over what the client models learn, which has already been exploited to infer the private data of clients and to implement backdoors in the client models. Although previous work has shown that clients can successfully detect such training-hijacking attacks, the proposed methods rely on heuristics, require tuning of many hyperparameters, and do not fully utilize the clients' capabilities. In this work, we show that given modest assumptions regarding the clients' compute capabilities, an out-of-the-box outlier detection method can be used to detect existing training-hijacking attacks with almost-zero false positive rates. We conclude through experiments on different tasks that the simplicity of our approach we name SplitOut makes it a more viable and reliable alternative compared to the earlier detection methods. △ Less

Submitted 11 December, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

arXiv:2108.09052 [pdf, other]

doi 10.1145/3559613.3563198

SplitGuard: Detecting and Mitigating Training-Hijacking Attacks in Split Learning

Authors: Ege Erdogan, Alptekin Kupcu, A. Ercument Cicek

Abstract: Distributed deep learning frameworks such as split learning provide great benefits with regards to the computational cost of training deep neural networks and the privacy-aware utilization of the collective data of a group of data-holders. Split learning, in particular, achieves this goal by dividing a neural network between a client and a server so that the client computes the initial set of laye… ▽ More Distributed deep learning frameworks such as split learning provide great benefits with regards to the computational cost of training deep neural networks and the privacy-aware utilization of the collective data of a group of data-holders. Split learning, in particular, achieves this goal by dividing a neural network between a client and a server so that the client computes the initial set of layers, and the server computes the rest. However, this method introduces a unique attack vector for a malicious server attempting to steal the client's private data: the server can direct the client model towards learning any task of its choice, e.g. towards outputting easily invertible values. With a concrete example already proposed (Pasquini et al., CCS '21), such training-hijacking attacks present a significant risk for the data privacy of split learning clients. In this paper, we propose SplitGuard, a method by which a split learning client can detect whether it is being targeted by a training-hijacking attack or not. We experimentally evaluate our method's effectiveness, compare it with potential alternatives, and discuss in detail various points related to its use. We conclude that SplitGuard can effectively detect training-hijacking attacks while minimizing the amount of information recovered by the adversaries. △ Less

Submitted 16 September, 2022; v1 submitted 20 August, 2021; originally announced August 2021.

Comments: Proceedings of the 21st Workshop on Privacy in the Electronic Society (WPES '22), November 7, 2022, Los Angeles, CA, USA

arXiv:2108.09033 [pdf, other]

doi 10.1145/3559613.3563201

UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks Against Split Learning

Authors: Ege Erdogan, Alptekin Kupcu, A. Ercument Cicek

Abstract: Training deep neural networks often forces users to work in a distributed or outsourced setting, accompanied with privacy concerns. Split learning aims to address this concern by distributing the model among a client and a server. The scheme supposedly provides privacy, since the server cannot see the clients' models and inputs. We show that this is not true via two novel attacks. (1) We show that… ▽ More Training deep neural networks often forces users to work in a distributed or outsourced setting, accompanied with privacy concerns. Split learning aims to address this concern by distributing the model among a client and a server. The scheme supposedly provides privacy, since the server cannot see the clients' models and inputs. We show that this is not true via two novel attacks. (1) We show that an honest-but-curious split learning server, equipped only with the knowledge of the client neural network architecture, can recover the input samples and obtain a functionally similar model to the client model, without being detected. (2) We show that if the client keeps hidden only the output layer of the model to "protect" the private labels, the honest-but-curious server can infer the labels with perfect accuracy. We test our attacks using various benchmark datasets and against proposed privacy-enhancing extensions to split learning. Our results show that plaintext split learning can pose serious risks, ranging from data (input) privacy to intellectual property (model parameters), and provide no more than a false sense of security. △ Less

Submitted 16 September, 2022; v1 submitted 20 August, 2021; originally announced August 2021.

Comments: Proceedings of the 21st Workshop on Privacy in the Electronic Society (WPES '22), November 7, 2022, Los Angeles, CA, USA

arXiv:2001.08852 [pdf, other]

Genome Reconstruction Attacks Against Genomic Data-Sharing Beacons

Authors: Kerem Ayoz, Erman Ayday, A. Ercument Cicek

Abstract: Sharing genome data in a privacy-preserving way stands as a major bottleneck in front of the scientific progress promised by the big data era in genomics. A community-driven protocol named genomic data-sharing beacon protocol has been widely adopted for sharing genomic data. The system aims to provide a secure, easy to implement, and standardized interface for data sharing by only allowing yes/no… ▽ More Sharing genome data in a privacy-preserving way stands as a major bottleneck in front of the scientific progress promised by the big data era in genomics. A community-driven protocol named genomic data-sharing beacon protocol has been widely adopted for sharing genomic data. The system aims to provide a secure, easy to implement, and standardized interface for data sharing by only allowing yes/no queries on the presence of specific alleles in the dataset. However, beacon protocol was recently shown to be vulnerable against membership inference attacks. In this paper, we show that privacy threats against genomic data sharing beacons are not limited to membership inference. We identify and analyze a novel vulnerability of genomic data-sharing beacons: genome reconstruction. We show that it is possible to successfully reconstruct a substantial part of the genome of a victim when the attacker knows the victim has been added to the beacon in a recent update. We also show that even if multiple individuals are added to the beacon during the same update, it is possible to identify the victim's genome with high confidence using traits that are easily accessible by the attacker (e.g., eye and hair color). Moreover, we show how the reconstructed genome using a beacon that is not associated with a sensitive phenotype can be used for membership inference attacks to beacons with sensitive phenotypes (i.e., HIV+). The outcome of this work will guide beacon operators on when and how to update the content of the beacon. Thus, this work will be an important attempt at hel** beacon operators and participants make informed decisions. △ Less

Submitted 21 August, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

arXiv:1902.04341 [pdf, other]

doi 10.1093/bioinformatics/btaa179

Apollo: A Sequencing-Technology-Independent, Scalable, and Accurate Assembly Polishing Algorithm

Authors: Can Firtina, Jeremie S. Kim, Mohammed Alser, Damla Senol Cali, A. Ercument Cicek, Can Alkan, Onur Mutlu

Abstract: Long reads produced by third-generation sequencing technologies are used to construct an assembly (i.e., the subject's genome), which is further used in downstream genome analysis. Unfortunately, long reads have high sequencing error rates and a large proportion of bps in these long reads are incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis.… ▽ More Long reads produced by third-generation sequencing technologies are used to construct an assembly (i.e., the subject's genome), which is further used in downstream genome analysis. Unfortunately, long reads have high sequencing error rates and a large proportion of bps in these long reads are incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e., read-to-assembly alignment information). However, assembly polishing algorithms can only polish an assembly using reads either from a certain sequencing technology or from a small assembly. Such technology-dependency and assembly-size dependency require researchers to 1) run multiple polishing algorithms and 2) use small chunks of a large genome to use all available read sets and polish large genomes. We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e., both large and small genomes) using reads from all sequencing technologies (i.e., second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo 1) models an assembly as a profile hidden Markov model (pHMM), 2) uses read-to-assembly alignment to train the pHMM with the Forward-Backward algorithm, and 3) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real read sets demonstrate that Apollo is the only algorithm that 1) uses reads from any sequencing technology within a single run and 2) scales well to polish large assemblies without splitting the assembly into multiple parts. △ Less

Submitted 7 March, 2020; v1 submitted 12 February, 2019; originally announced February 2019.

Comments: 9 pages, 1 figure. Accepted in Bioinformatics

Journal ref: Bioinformatics . 2020 Jun 1;36(12):3669-3679

arXiv:1812.05067 [pdf, ps, other]

doi 10.1145/3314221.3314603

Bidirectional Type Checking for Relational Properties

Authors: Ezgi Çiçek, Weihao Qu, Gilles Barthe, Marco Gaboardi, Deepak Garg

Abstract: Relational type systems have been designed for several applications including information flow, differential privacy, and cost analysis. In order to achieve the best results, these systems often use relational refinements and relational effects to maximally exploit the similarity in the structure of the two programs being compared. Relational type systems are appealing for relational properties be… ▽ More Relational type systems have been designed for several applications including information flow, differential privacy, and cost analysis. In order to achieve the best results, these systems often use relational refinements and relational effects to maximally exploit the similarity in the structure of the two programs being compared. Relational type systems are appealing for relational properties because they deliver simpler and more precise verification than what could be derived from ty** the two programs separately. However, relational type systems do not yet achieve the practical appeal of their non-relational counterpart, in part because of the lack of a general foundations for implementing them. In this paper, we take a step in this direction by develo** bidirectional relational type checking for systems with relational refinements and effects. Our approach achieves the benefits of bidirectional type checking, in a relational setting. In particular, it significantly reduces the need for ty** annotations through the combination of type checking and type inference. In order to highlight the foundational nature of our approach, we develop bidirectional versions of several relational type systems which incrementally combine many different components needed for expressive relational analysis. △ Less

Submitted 12 December, 2018; originally announced December 2018.

Comments: 14 pages

arXiv:1605.05847 [pdf, ps, other]

Privacy-Related Consequences of Turkish Citizen Database Leak

Authors: Erin Avllazagaj, Erman Ayday, A. Ercument Cicek

Abstract: Personal data is collected and stored more than ever by the governments and companies in the digital age. Even though the data is only released after anonymization, deanonymization is possible by joining different datasets. This puts the privacy of individuals in jeopardy. Furthermore, data leaks can unveil personal identifiers of individuals when security is breached. Processing the leaked datase… ▽ More Personal data is collected and stored more than ever by the governments and companies in the digital age. Even though the data is only released after anonymization, deanonymization is possible by joining different datasets. This puts the privacy of individuals in jeopardy. Furthermore, data leaks can unveil personal identifiers of individuals when security is breached. Processing the leaked dataset can provide even more information than what is visible to naked eye. In this work, we report the results of our analyses on the recent "Turkish citizen database leak", which revealed the national identifier numbers of close to fifty million voters, along with personal information such as date of birth, birth place, and full address. We show that with automated processing of the data, one can uniquely identify (i) mother's maiden name of individuals and (ii) landline numbers, for a significant portion of people. This is a serious privacy and security threat because (i) identity theft risk is now higher, and (ii) scammers are able to access more information about individuals. The only and utmost goal of this work is to point out to the security risks and suggest stricter measures to related companies and agencies to protect the security and privacy of individuals. △ Less

Submitted 19 May, 2016; originally announced May 2016.

Comments: 12 pages, 5 figures

Showing 1–7 of 7 results for author: Çiçek, E