-
On Rényi Differential Privacy in Statistics-Based Synthetic Data Generation
Authors:
Takayuki Miura,
Toshiki Shibahara,
Masanobu Kii,
Atsunori Ichikawa,
Juko Yamamoto,
Koji Chida
Abstract:
Privacy protection with synthetic data generation often uses differentially private statistics and model parameters to quantitatively express theoretical security. However, these methods do not take into account privacy protection due to the randomness of data generation. In this paper, we theoretically evaluate Rényi differential privacy of the randomness in data generation of a synthetic data generation method that uses the mean vector and the covariance matrix of an original dataset. Specifically, for a fixed $α > 1$, we show the condition on $\varepsilon$ under which the synthetic data generation satisfies $(α, \varepsilon)$-Rényi differential privacy under a bounded neighboring condition and an unbounded neighboring condition, respectively. In particular, under the unbounded condition, when the sizes of the original dataset and the synthetic dataset are both 10 million, the mechanism satisfies $(4, 0.576)$-Rényi differential privacy. We also show that when we translate it into the traditional $(\varepsilon, δ)$-differential privacy, the mechanism satisfies $(4.00, 10^{-10})$-differential privacy.
Submitted 31 March, 2023;
originally announced March 2023.
-
Do Backdoors Assist Membership Inference Attacks?
Authors:
Yumeki Goto,
Nami Ashizawa,
Toshiki Shibahara,
Naoto Yanai
Abstract:
When an adversary provides poison samples to a machine learning model, privacy leakage, such as membership inference attacks that infer whether a sample was included in the training of the model, becomes effective by moving the sample to an outlier. However, the attacks can be detected because inference accuracy deteriorates due to poison samples. In this paper, we discuss a \textit{backdoor-assisted membership inference attack}, a novel membership inference attack based on backdoors that return the adversary's expected output for a triggered sample. We found three crucial insights through experiments with an academic benchmark dataset. We first demonstrate that the backdoor-assisted membership inference attack is unsuccessful. Second, when we analyzed loss distributions to understand the reason for the unsuccessful results, we found that backdoors cannot separate loss distributions of training and non-training samples. In other words, backdoors cannot affect the distribution of clean samples. Third, we also show that poison and triggered samples activate neurons of different distributions. Specifically, backdoors make any clean sample an inlier, contrary to poisoning samples. As a result, we confirm that backdoors cannot assist membership inference.
Submitted 22 March, 2023;
originally announced March 2023.
-
Interpreting Graph-based Sybil Detection Methods as Low-Pass Filtering
Authors:
Satoshi Furutani,
Toshiki Shibahara,
Mitsuaki Akiyama,
Masaki Aida
Abstract:
Online social networks (OSNs) are threatened by Sybil attacks, which create fake accounts (also called Sybils) on OSNs and use them for various malicious activities. Therefore, Sybil detection is a fundamental task for OSN security. Most existing Sybil detection methods are based on the graph structure of OSNs, and various methods have been proposed recently. However, although almost all methods have been compared experimentally in terms of detection performance and noise robustness, theoretical understanding of them is still lacking. In this study, we show that existing graph-based Sybil detection methods can be interpreted in a unified framework of low-pass filtering. This framework enables us to theoretically compare and analyze each method from two perspectives: filter kernel properties and the spectrum of shift matrices. Our analysis reveals that the detection performance of each method depends on how well low-pass filtering can extract low frequency components and remove noisy high frequency components. Furthermore, on the basis of the analysis, we propose a novel Sybil detection method called SybilHeat. Numerical experiments on synthetic graphs and real social networks demonstrate that SybilHeat performs consistently well on graphs with various structural properties. This study lays a theoretical foundation for graph-based Sybil detection and leads to a better understanding of Sybil detection methods.
Submitted 22 June, 2022;
originally announced June 2022.
-
MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI
Authors:
Takayuki Miura,
Satoshi Hasegawa,
Toshiki Shibahara
Abstract:
The advance of explainable artificial intelligence, which provides reasons for its predictions, is expected to accelerate the use of deep neural networks in real-world settings such as Machine Learning as a Service (MLaaS), which returns predictions on queried data with the trained model. Deep neural networks deployed in MLaaS face the threat of model extraction attacks. A model extraction attack violates intellectual property and privacy: an adversary steals trained models in a cloud using only their predictions. In particular, a data-free model extraction attack has been proposed recently and is more critical. In this attack, an adversary uses a generative model instead of preparing input data. The feasibility of this attack, however, needs to be studied since it requires more queries than attacks that use surrogate datasets. In this paper, we propose MEGEX, a data-free model extraction attack against a gradient-based explainable AI. In this method, an adversary uses the explanations to train the generative model and reduces the number of queries needed to steal the model. Our experiments show that our proposed method reconstructs high-accuracy models -- 0.97$\times$ and 0.98$\times$ the victim model accuracy on SVHN and CIFAR-10 datasets given 2M and 20M queries, respectively. This implies that there is a trade-off between the interpretability of models and the difficulty of stealing them.
Submitted 19 July, 2021;
originally announced July 2021.
-
Deep learning generates custom-made logistic regression models for explaining how breast cancer subtypes are classified
Authors:
Takuma Shibahara,
Chisa Wada,
Yasuho Yamashita,
Kazuhiro Fujita,
Masamichi Sato,
Junichi Kuwata,
Atsushi Okamoto,
Yoshimasa Ono
Abstract:
Differentiating the intrinsic subtypes of breast cancer is crucial for deciding the best treatment strategy. Deep learning can predict the subtypes from genetic information more accurately than conventional statistical methods, but to date, deep learning has not been directly utilized to examine which genes are associated with which subtypes. To clarify the mechanisms embedded in the intrinsic subtypes, we developed an explainable deep learning model called a point-wise linear (PWL) model that generates a custom-made logistic regression for each patient. Logistic regression, which is familiar to both physicians and medical informatics researchers, allows us to analyze the importance of the feature variables, and the PWL model harnesses these practical abilities of logistic regression. In this study, we show that analyzing breast cancer subtypes is clinically beneficial for patients and one of the best ways to validate the capability of the PWL model. First, we trained the PWL model with RNA-seq data to predict PAM50 intrinsic subtypes and applied it to the 41/50 genes of PAM50 through the subtype prediction task. Second, we developed a deep enrichment analysis method to reveal the relationships between the PAM50 subtypes and the copy numbers of breast cancer. Our findings showed that the PWL model utilized genes relevant to the cell cycle-related pathways. These preliminary successes in breast cancer subtype analysis demonstrate the potential of our analysis strategy to clarify the mechanisms underlying breast cancer and improve overall clinical outcomes.
Submitted 18 July, 2022; v1 submitted 20 January, 2020;
originally announced January 2020.
-
A Study on the Vulnerabilities of Mobile Apps associated with Software Modules
Authors:
Takuya Watanabe,
Mitsuaki Akiyama,
Fumihiro Kanei,
Eitaro Shioji,
Yuta Takata,
Bo Sun,
Yuta Ishi,
Toshiki Shibahara,
Takeshi Yagi,
Tatsuya Mori
Abstract:
This paper reports a large-scale study that aims to understand how mobile application (app) vulnerabilities are associated with software libraries. We analyze both free and paid apps. Studying paid apps was quite meaningful because it helped us understand how differences in app development/maintenance affect the vulnerabilities associated with libraries. We analyzed 30k free and paid apps collected from the official Android marketplace. Our extensive analyses revealed that approximately 70%/50% of vulnerabilities of free/paid apps stem from software libraries, particularly from third-party libraries. Somewhat paradoxically, we found that more expensive/popular paid apps tend to have more vulnerabilities. This comes from the fact that more expensive/popular paid apps tend to have more functionality, i.e., more code and libraries, which increases the probability of vulnerabilities. Based on our findings, we provide suggestions to stakeholders of mobile app distribution ecosystems.
Submitted 27 March, 2017; v1 submitted 10 February, 2017;
originally announced February 2017.