Search | arXiv e-print repository

SoK: Leveraging Transformers for Malware Analysis

Authors: Pradip Kunwar, Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam, Elisa Bertino

Abstract: The introduction of transformers has been an important breakthrough for AI research and application as transformers are the foundation behind Generative AI. A promising application domain for transformers is cybersecurity, in particular the malware domain analysis. The reason is the flexibility of the transformer models in handling long sequential features and understanding contextual relationship… ▽ More The introduction of transformers has been an important breakthrough for AI research and application as transformers are the foundation behind Generative AI. A promising application domain for transformers is cybersecurity, in particular the malware domain analysis. The reason is the flexibility of the transformer models in handling long sequential features and understanding contextual relationships. However, as the use of transformers for malware analysis is still in the infancy stage, it is critical to evaluate, systematize, and contextualize existing literature to foster future research. This Systematization of Knowledge (SoK) paper aims to provide a comprehensive analysis of transformer-based approaches designed for malware analysis. Based on our systematic analysis of existing knowledge, we structure and propose taxonomies based on: (a) how different transformers are adapted, organized, and modified across various use cases; and (b) how diverse feature types and their representation capabilities are reflected. We also provide an inventory of datasets used to explore multiple research avenues in the use of transformers for malware analysis and discuss open challenges with future research directions. We believe that this SoK paper will assist the research community in gaining detailed insights from existing work and will serve as a foundational resource for implementing novel research using transformers for malware analysis. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2403.00935 [pdf, other]

Transfer Learning for Security: Challenges and Future Directions

Authors: Adrian Shuai Li, Arun Iyengar, Ashish Kundu, Elisa Bertino

Abstract: Many machine learning and data mining algorithms rely on the assumption that the training and testing data share the same feature space and distribution. However, this assumption may not always hold. For instance, there are situations where we need to classify data in one domain, but we only have sufficient training data available from a different domain. The latter data may follow a distinct dist… ▽ More Many machine learning and data mining algorithms rely on the assumption that the training and testing data share the same feature space and distribution. However, this assumption may not always hold. For instance, there are situations where we need to classify data in one domain, but we only have sufficient training data available from a different domain. The latter data may follow a distinct distribution. In such cases, successfully transferring knowledge across domains can significantly improve learning performance and reduce the need for extensive data labeling efforts. Transfer learning (TL) has thus emerged as a promising framework to tackle this challenge, particularly in security-related tasks. This paper aims to review the current advancements in utilizing TL techniques for security. The paper includes a discussion of the existing research gaps in applying TL in the security domain, as well as exploring potential future research directions and issues that arise in the context of TL-assisted security solutions. △ Less

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2401.04958 [pdf, other]

FBSDetector: Fake Base Station and Multi Step Attack Detection in Cellular Networks using Machine Learning

Authors: Kazi Samin Mubasshir, Imtiaz Karim, Elisa Bertino

Abstract: Fake base stations (FBSes) pose a significant security threat by impersonating legitimate base stations. Though efforts have been made to defeat this threat, up to this day, the presence of FBSes and the multi-step attacks (MSAs) stemming from them can lead to unauthorized surveillance, interception of sensitive information, and disruption of network services for legitimate users. Therefore, detec… ▽ More Fake base stations (FBSes) pose a significant security threat by impersonating legitimate base stations. Though efforts have been made to defeat this threat, up to this day, the presence of FBSes and the multi-step attacks (MSAs) stemming from them can lead to unauthorized surveillance, interception of sensitive information, and disruption of network services for legitimate users. Therefore, detecting these malicious entities is crucial to ensure the security and reliability of cellular networks. Traditional detection methods often rely on additional hardware, predefined rules, signal scanning, changing protocol specifications, or cryptographic mechanisms that have limitations and incur huge infrastructure costs in accurately identifying FBSes. In this paper, we develop FBSDetector-an effective and efficient detection solution that can reliably detect FBSes and MSAs from layer-3 network traces using machine learning (ML) at the user equipment (UE) side. To develop FBSDetector, we created FBSAD and MSAD, the first-ever high-quality and large-scale datasets for training machine learning models capable of detecting FBSes and MSAs. These datasets capture the network traces in different real-world cellular network scenarios (including mobility and different attacker capabilities) incorporating legitimate base stations and FBSes. The combined network trace has a volume of 6.6 GB containing 751963 packets. Our novel ML models, specially designed to detect FBSes and MSAs, can effectively detect FBSes with an accuracy of 92% and a false positive rate of 5.96% and recognize MSAs with an accuracy of 86% and a false positive rate of 7.82%. We deploy FBSDetector as a real-world solution to protect end-users through an Android app and validate in a controlled lab environment. Compared to the existing solutions that fail to detect FBSes, FBSDetector can detect FBSes in the wild in real time. △ Less

Submitted 10 January, 2024; originally announced January 2024.

arXiv:2312.13119 [pdf, other]

Graphene: Infrastructure Security Posture Analysis with AI-generated Attack Graphs

Authors: Xin **, Charalampos Katsis, Fan Sang, Jiahao Sun, Elisa Bertino, Ramana Rao Kompella, Ashish Kundu

Abstract: The rampant occurrence of cybersecurity breaches imposes substantial limitations on the progress of network infrastructures, leading to compromised data, financial losses, potential harm to individuals, and disruptions in essential services. The current security landscape demands the urgent development of a holistic security assessment solution that encompasses vulnerability analysis and investiga… ▽ More The rampant occurrence of cybersecurity breaches imposes substantial limitations on the progress of network infrastructures, leading to compromised data, financial losses, potential harm to individuals, and disruptions in essential services. The current security landscape demands the urgent development of a holistic security assessment solution that encompasses vulnerability analysis and investigates the potential exploitation of these vulnerabilities as attack paths. In this paper, we propose Graphene, an advanced system designed to provide a detailed analysis of the security posture of computing infrastructures. Using user-provided information, such as device details and software versions, Graphene performs a comprehensive security assessment. This assessment includes identifying associated vulnerabilities and constructing potential attack graphs that adversaries can exploit. Furthermore, Graphene evaluates the exploitability of these attack paths and quantifies the overall security posture through a scoring mechanism. The system takes a holistic approach by analyzing security layers encompassing hardware, system, network, and cryptography. Furthermore, Graphene delves into the interconnections between these layers, exploring how vulnerabilities in one layer can be leveraged to exploit vulnerabilities in others. In this paper, we present the end-to-end pipeline implemented in Graphene, showcasing the systematic approach adopted for conducting this thorough security analysis. △ Less

Submitted 30 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.09665 [pdf, other]

FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge

Authors: Jiahe Lan, Jie Wang, Baochen Yan, Zheng Yan, Elisa Bertino

Abstract: Speech recognition systems driven by DNNs have revolutionized human-computer interaction through voice interfaces, which significantly facilitate our daily lives. However, the growing popularity of these systems also raises special concerns on their security, particularly regarding backdoor attacks. A backdoor attack inserts one or more hidden backdoors into a DNN model during its training process… ▽ More Speech recognition systems driven by DNNs have revolutionized human-computer interaction through voice interfaces, which significantly facilitate our daily lives. However, the growing popularity of these systems also raises special concerns on their security, particularly regarding backdoor attacks. A backdoor attack inserts one or more hidden backdoors into a DNN model during its training process, such that it does not affect the model's performance on benign inputs, but forces the model to produce an adversary-desired output if a specific trigger is present in the model input. Despite the initial success of current audio backdoor attacks, they suffer from the following limitations: (i) Most of them require sufficient knowledge, which limits their widespread adoption. (ii) They are not stealthy enough, thus easy to be detected by humans. (iii) Most of them cannot attack live speech, reducing their practicality. To address these problems, in this paper, we propose FlowMur, a stealthy and practical audio backdoor attack that can be launched with limited knowledge. FlowMur constructs an auxiliary dataset and a surrogate model to augment adversary knowledge. To achieve dynamicity, it formulates trigger generation as an optimization problem and optimizes the trigger over different attachment positions. To enhance stealthiness, we propose an adaptive data poisoning method according to Signal-to-Noise Ratio (SNR). Furthermore, ambient noise is incorporated into the process of trigger generation and data poisoning to make FlowMur robust to ambient noise and improve its practicality. Extensive experiments conducted on two datasets demonstrate that FlowMur achieves high attack performance in both digital and physical settings while remaining resilient to state-of-the-art defenses. In particular, a human study confirms that triggers generated by FlowMur are not easily detected by participants. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: To appear at lEEE Symposium on Security & Privacy (Oakland) 2024

arXiv:2312.07877 [pdf, other]

Segment-Based Formal Verification of WiFi Fragmentation and Power Save Mode

Authors: Zilin Shen, Imtiaz Karim, Elisa Bertino

Abstract: The IEEE 802.11 family of standards, better known as WiFi, is a widely used protocol utilized by billions of users. Previous works on WiFi formal verification have mostly focused on the four-way handshake and other security aspects. However, recent works have uncovered severe vulnerabilities in functional aspects of WiFi, which can cause information leakage for billions of devices. No formal analy… ▽ More The IEEE 802.11 family of standards, better known as WiFi, is a widely used protocol utilized by billions of users. Previous works on WiFi formal verification have mostly focused on the four-way handshake and other security aspects. However, recent works have uncovered severe vulnerabilities in functional aspects of WiFi, which can cause information leakage for billions of devices. No formal analysis method exists able to reason on the functional aspects of the WiFi protocol. In this paper, we take the first steps in addressing this gap and present an extensive formal analysis of the functional aspects of the WiFi protocol, more specifically, the fragmentation and the power-save-mode process. To achieve this, we design a novel segment-based formal verification process and introduce a practical threat model (i.e. MAC spoofing) in Tamarin to reason about the various capabilities of the attacker. To this end, we verify 68 properties extracted from WiFi protocol specification, find 3 vulnerabilities from the verification, verify 3 known attacks, and discover 2 new issues. These vulnerabilities and issues affect 14 commercial devices out of 17 tested cases, showing the prevalence and impact of the issues. Apart from this, we show that the proposed countermeasures indeed are sufficient to address the issues. We hope our results and analysis will help vendors adopt the countermeasures and motivate further research into the verification of the functional aspects of the WiFi protocol. △ Less

Submitted 16 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted to AsiaCCS 2024

arXiv:2311.04326 [pdf]

Educating for AI Cybersecurity Work and Research: Ethics, Systems Thinking, and Communication Requirements

Authors: Sorin Adam Matei, Elisa Bertino

Abstract: The present study explored managerial and instructor perceptions of their freshly employed cybersecurity workers' or students' preparedness to work effectively in a changing cybersecurity environment that includes AI tools. Specifically, we related perceptions of technical preparedness to ethical, systems thinking, and communication skills. We found that managers and professors perceive preparedne… ▽ More The present study explored managerial and instructor perceptions of their freshly employed cybersecurity workers' or students' preparedness to work effectively in a changing cybersecurity environment that includes AI tools. Specifically, we related perceptions of technical preparedness to ethical, systems thinking, and communication skills. We found that managers and professors perceive preparedness to use AI tools in cybersecurity to be significantly associated with all three non-technical skill sets. Most important, ethics is a clear leader in the network of relationships. Contrary to expectations that ethical concerns are left behind in the rush to adopt the most advanced AI tools in security, both higher education instructors and managers appreciate their role and see them closely associated with technical prowess. Another significant finding is that professors over-estimate students' preparedness for ethical, system thinking, and communication abilities compared to IT managers' perceptions of their newly employed IT workers. △ Less

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.00903 [pdf]

Artificial Intelligence Ethics Education in Cybersecurity: Challenges and Opportunities: a focus group report

Authors: Diane Jackson, Sorin Adam Matei, Elisa Bertino

Abstract: The emergence of AI tools in cybersecurity creates many opportunities and uncertainties. A focus group with advanced graduate students in cybersecurity revealed the potential depth and breadth of the challenges and opportunities. The salient issues are access to open source or free tools, documentation, curricular diversity, and clear articulation of ethical principles for AI cybersecurity educati… ▽ More The emergence of AI tools in cybersecurity creates many opportunities and uncertainties. A focus group with advanced graduate students in cybersecurity revealed the potential depth and breadth of the challenges and opportunities. The salient issues are access to open source or free tools, documentation, curricular diversity, and clear articulation of ethical principles for AI cybersecurity education. Confronting the "black box" mentality in AI cybersecurity work is also of the greatest importance, doubled by deeper and prior education in foundational AI work. Systems thinking and effective communication were considered relevant areas of educational improvement. Future AI educators and practitioners need to address these issues by implementing rigorous technical training curricula, clear documentation, and frameworks for ethically monitoring AI combined with critical and system's thinking and communication skills. △ Less

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2306.13339 [pdf, other]

doi 10.1109/TDSC.2024.3353548

TrustGuard: GNN-based Robust and Explainable Trust Evaluation with Dynamicity Support

Authors: Jie Wang, Zheng Yan, Jiahe Lan, Elisa Bertino, Witold Pedrycz

Abstract: Trust evaluation assesses trust relationships between entities and facilitates decision-making. Machine Learning (ML) shows great potential for trust evaluation owing to its learning capabilities. In recent years, Graph Neural Networks (GNNs), as a new ML paradigm, have demonstrated superiority in dealing with graph data. This has motivated researchers to explore their use in trust evaluation, as… ▽ More Trust evaluation assesses trust relationships between entities and facilitates decision-making. Machine Learning (ML) shows great potential for trust evaluation owing to its learning capabilities. In recent years, Graph Neural Networks (GNNs), as a new ML paradigm, have demonstrated superiority in dealing with graph data. This has motivated researchers to explore their use in trust evaluation, as trust relationships among entities can be modeled as a graph. However, current trust evaluation methods that employ GNNs fail to fully satisfy the dynamic nature of trust, overlook the adverse effects of trust-related attacks, and cannot provide convincing explanations on evaluation results. To address these problems, we propose TrustGuard, a GNN-based accurate trust evaluation model that supports trust dynamicity, is robust against typical attacks, and provides explanations through visualization. Specifically, TrustGuard is designed with a layered architecture that contains a snapshot input layer, a spatial aggregation layer, a temporal aggregation layer, and a prediction layer. Among them, the spatial aggregation layer adopts a defense mechanism to robustly aggregate local trust, and the temporal aggregation layer applies an attention mechanism for effective learning of temporal patterns. Extensive experiments on two real-world datasets show that TrustGuard outperforms state-of-the-art GNN-based trust evaluation models with respect to trust prediction across single-timeslot and multi-timeslot, even in the presence of attacks. In addition, TrustGuard can explain its evaluation results by visualizing both spatial and temporal views. △ Less

Submitted 4 February, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

Comments: Accepted by IEEE TDSC. Code: https://github.com/Jieerbobo/TrustGuard

arXiv:2306.00262 [pdf, other]

Maximal Domain Independent Representations Improve Transfer Learning

Authors: Adrian Shuai Li, Elisa Bertino, Xuan-Hong Dang, Ankush Singla, Yuhai Tu, Mark N Wegman

Abstract: The most effective domain adaptation (DA) involves the decomposition of data representation into a domain independent representation (DIRep), and a domain dependent representation (DDRep). A classifier is trained by using the DIRep of the labeled source images. Since the DIRep is domain invariant, the classifier can be "transferred" to make predictions for the target domain with no (or few) labels… ▽ More The most effective domain adaptation (DA) involves the decomposition of data representation into a domain independent representation (DIRep), and a domain dependent representation (DDRep). A classifier is trained by using the DIRep of the labeled source images. Since the DIRep is domain invariant, the classifier can be "transferred" to make predictions for the target domain with no (or few) labels. However, information useful for classification in the target domain can "hide" in the DDRep in current DA algorithms such as Domain-Separation-Networks (DSN). DSN's weak constraint to enforce orthogonality of DIRep and DDRep, allows this hiding and can result in poor performance. To address this shortcoming, we developed a new algorithm wherein a stronger constraint is imposed to minimize the DDRep by using a KL divergent loss for the DDRep in order to create the maximal DIRep that enhances transfer learning performance. By using synthetic data sets, we show explicitly that depending on initialization DSN with its weaker constraint can lead to sub-optimal solutions with poorer DA performance whereas our algorithm with maximal DIRep is robust against such perturbations. We demonstrate the equal-or-better performance of our approach against state-of-the-art algorithms by using several standard benchmark image datasets including Office. We further highlight the compatibility of our algorithm with pretrained models, extending its applicability and versatility in real-world scenarios. △ Less

Submitted 6 June, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

arXiv:2306.00202 [pdf, other]

Building Manufacturing Deep Learning Models with Minimal and Imbalanced Training Data Using Domain Adaptation and Data Augmentation

Authors: Adrian Shuai Li, Elisa Bertino, Rih-Teng Wu, Ting-Yan Wu

Abstract: Deep learning (DL) techniques are highly effective for defect detection from images. Training DL classification models, however, requires vast amounts of labeled data which is often expensive to collect. In many cases, not only the available training data is limited but may also imbalanced. In this paper, we propose a novel domain adaptation (DA) approach to address the problem of labeled training… ▽ More Deep learning (DL) techniques are highly effective for defect detection from images. Training DL classification models, however, requires vast amounts of labeled data which is often expensive to collect. In many cases, not only the available training data is limited but may also imbalanced. In this paper, we propose a novel domain adaptation (DA) approach to address the problem of labeled training data scarcity for a target learning task by transferring knowledge gained from an existing source dataset used for a similar learning task. Our approach works for scenarios where the source dataset and the dataset available for the target learning task have same or different feature spaces. We combine our DA approach with an autoencoder-based data augmentation approach to address the problem of imbalanced target datasets. We evaluate our combined approach using image data for wafer defect prediction. The experiments show its superior performance against other algorithms when the number of labeled samples in the target dataset is significantly small and the target dataset is imbalanced. △ Less

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2304.10977 [pdf, other]

Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition

Authors: Matteo Muffo, Aldo Cocco, Enrico Bertino

Abstract: In recent years, Large Language Models such as GPT-3 showed remarkable capabilities in performing NLP tasks in the zero and few shot settings. On the other hand, the experiments highlighted the difficulty of GPT-3 in carrying out tasks that require a certain degree of reasoning, such as arithmetic operations. In this paper we evaluate the ability of Transformer Language Models to perform arithmeti… ▽ More In recent years, Large Language Models such as GPT-3 showed remarkable capabilities in performing NLP tasks in the zero and few shot settings. On the other hand, the experiments highlighted the difficulty of GPT-3 in carrying out tasks that require a certain degree of reasoning, such as arithmetic operations. In this paper we evaluate the ability of Transformer Language Models to perform arithmetic operations following a pipeline that, before performing computations, decomposes numbers in units, tens, and so on. We denote the models fine-tuned with this pipeline with the name Calculon and we test them in the task of performing additions, subtractions and multiplications on the same test sets of GPT-3. Results show an increase of accuracy of 63% in the five-digit addition task. Moreover, we demonstrate the importance of the decomposition pipeline introduced, since fine-tuning the same Language Model without decomposing numbers results in 0% accuracy in the five-digit addition task. △ Less

Submitted 21 April, 2023; originally announced April 2023.

Comments: 7 pages, 1 figure, published at LREC 2022

Journal ref: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 291-297

arXiv:2303.18121 [pdf, ps, other]

BERTino: an Italian DistilBERT model

Authors: Matteo Muffo, Enrico Bertino

Abstract: The recent introduction of Transformers language representation models allowed great improvements in many natural language processing (NLP) tasks. However, if on one hand the performances achieved by this kind of architectures are surprising, on the other their usability is limited by the high number of parameters which constitute their network, resulting in high computational and memory demands.… ▽ More The recent introduction of Transformers language representation models allowed great improvements in many natural language processing (NLP) tasks. However, if on one hand the performances achieved by this kind of architectures are surprising, on the other their usability is limited by the high number of parameters which constitute their network, resulting in high computational and memory demands. In this work we present BERTino, a DistilBERT model which proposes to be the first lightweight alternative to the BERT architecture specific for the Italian language. We evaluated BERTino on the Italian ISDT, Italian ParTUT, Italian WikiNER and multiclass classification tasks, obtaining F1 scores comparable to those obtained by a BERTBASE with a remarkable improvement in training and inference speed. △ Less

Submitted 31 March, 2023; originally announced March 2023.

Comments: 5 pages

Journal ref: Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

arXiv:2301.09201 [pdf, other]

SPEC5G: A Dataset for 5G Cellular Network Protocol Analysis

Authors: Imtiaz Karim, Kazi Samin Mubasshir, Mirza Masfiqur Rahman, Elisa Bertino

Abstract: 5G is the 5th generation cellular network protocol. It is the state-of-the-art global wireless standard that enables an advanced kind of network designed to connect virtually everyone and everything with increased speed and reduced latency. Therefore, its development, analysis, and security are critical. However, all approaches to the 5G protocol development and security analysis, e.g., property e… ▽ More 5G is the 5th generation cellular network protocol. It is the state-of-the-art global wireless standard that enables an advanced kind of network designed to connect virtually everyone and everything with increased speed and reduced latency. Therefore, its development, analysis, and security are critical. However, all approaches to the 5G protocol development and security analysis, e.g., property extraction, protocol summarization, and semantic analysis of the protocol specifications and implementations are completely manual. To reduce such manual effort, in this paper, we curate SPEC5G the first-ever public 5G dataset for NLP research. The dataset contains 3,547,586 sentences with 134M words, from 13094 cellular network specifications and 13 online websites. By leveraging large-scale pre-trained language models that have achieved state-of-the-art results on NLP tasks, we use this dataset for security-related text classification and summarization. Security-related text classification can be used to extract relevant security-related properties for protocol testing. On the other hand, summarization can help developers and practitioners understand the high level of the protocol, which is itself a daunting task. Our results show the value of our 5G-centric dataset in 5G protocol analysis automation. We believe that SPEC5G will enable a new research direction into automatic analyses for the 5G cellular network protocol and numerous related downstream tasks. Our data and code are publicly available. △ Less

Submitted 14 September, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

arXiv:2210.03014 [pdf, other]

EvilScreen Attack: Smart TV Hijacking via Multi-channel Remote Control Mimicry

Authors: Yiwei Zhang, Siqi Ma, Tiancheng Chen, Juanru Li, Robert H. Deng, Elisa Bertino

Abstract: Modern smart TVs often communicate with their remote controls (including those smart phone simulated ones) using multiple wireless channels (e.g., Infrared, Bluetooth, and Wi-Fi). However, this multi-channel remote control communication introduces a new attack surface. An inherent security flaw is that remote controls of most smart TVs are designed to work in a benign environment rather than an ad… ▽ More Modern smart TVs often communicate with their remote controls (including those smart phone simulated ones) using multiple wireless channels (e.g., Infrared, Bluetooth, and Wi-Fi). However, this multi-channel remote control communication introduces a new attack surface. An inherent security flaw is that remote controls of most smart TVs are designed to work in a benign environment rather than an adversarial one, and thus wireless communications between a smart TV and its remote controls are not strongly protected. Attackers could leverage such flaw to abuse the remote control communication and compromise smart TV systems. In this paper, we propose EvilScreen, a novel attack that exploits ill-protected remote control communications to access protected resources of a smart TV or even control the screen. EvilScreen exploits a multi-channel remote control mimicry vulnerability present in today smart TVs. Unlike other attacks, which compromise the TV system by exploiting code vulnerabilities or malicious third-party apps, EvilScreen directly reuses commands of different remote controls, combines them together to circumvent deployed authentication and isolation policies, and finally accesses or controls TV resources remotely. We evaluated eight mainstream smart TVs and found that they are all vulnerable to EvilScreen attacks, including a Samsung product adopting the ISO/IEC security specification. △ Less

Submitted 6 October, 2022; originally announced October 2022.

arXiv:2201.09370 [pdf, other]

Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models

Authors: Shagufta Mehnaz, Sayanton V. Dibbo, Ehsanul Kabir, Ninghui Li, Elisa Bertino

Abstract: Increasing use of machine learning (ML) technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand if these ML technologies are introducing leakage of sensitive and proprietary training data. In this paper, we focus on model inversion attacks where the adversary knows non-sensitive attributes about rec… ▽ More Increasing use of machine learning (ML) technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand if these ML technologies are introducing leakage of sensitive and proprietary training data. In this paper, we focus on model inversion attacks where the adversary knows non-sensitive attributes about records in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using only black-box access to the target classification model. We first devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state-of-the-art. We then introduce a label-only model inversion attack that relies only on the model's predicted labels but still matches our confidence score-based attack in terms of attack effectiveness. We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary. We evaluate our attacks on two types of machine learning models, decision tree and deep neural network, trained on three real datasets. Moreover, we empirically demonstrate the disparate vulnerability of model inversion attacks, i.e., specific groups in the training dataset (grouped by gender, race, etc.) could be more vulnerable to model inversion attacks. △ Less

Submitted 23 January, 2022; originally announced January 2022.

Comments: Conditionally accepted to USENIX Security 2022. This is not the camera-ready version. arXiv admin note: substantial text overlap with arXiv:2012.03404

arXiv:2112.03511 [pdf, other]

Control Parameters Considered Harmful: Detecting Range Specification Bugs in Drone Configuration Modules via Learning-Guided Search

Authors: Ruidong Han, Chao Yang, Siqi Ma, JiangFeng Ma, Cong Sun, Juanru Li, Elisa Bertino

Abstract: In order to support a variety of missions and deal with different flight environments, drone control programs typically provide configurable control parameters. However, such a flexibility introduces vulnerabilities. One such vulnerability, referred to as range specification bugs, has been recently identified. The vulnerability originates from the fact that even though each individual parameter re… ▽ More In order to support a variety of missions and deal with different flight environments, drone control programs typically provide configurable control parameters. However, such a flexibility introduces vulnerabilities. One such vulnerability, referred to as range specification bugs, has been recently identified. The vulnerability originates from the fact that even though each individual parameter receives a value in the recommended value range, certain combinations of parameter values may affect the drone physical stability. In this paper we develop a novel learning-guided search system to find such combinations, that we refer to as incorrect configurations. Our system applies metaheuristic search algorithms mutating configurations to detect the configuration parameters that have values driving the drone to unstable physical states. To guide the mutations, our system leverages a machine learning predictor as the fitness evaluator. Finally, by utilizing multi-objective optimization, our system returns the feasible ranges based on the mutation search results. Because in our system the mutations are guided by a predictor, evaluating the parameter configurations does not require realistic/simulation executions. Therefore, our system supports a comprehensive and yet efficient detection of incorrect configurations. We have carried out an experimental evaluation of our system. The evaluation results show that the system successfully reports potentially incorrect configurations, of which over 85% lead to actual unstable physical states. △ Less

Submitted 7 December, 2021; originally announced December 2021.

Comments: Accepted to ICSE2022 Technical Track

arXiv:2103.07704 [pdf, other]

Simeon -- Secure Federated Machine Learning Through Iterative Filtering

Authors: Nicholas Malecki, Hye-young Paik, Aleksandar Ignjatovic, Alan Blair, Elisa Bertino

Abstract: Federated learning enables a global machine learning model to be trained collaboratively by distributed, mutually non-trusting learning agents who desire to maintain the privacy of their training data and their hardware. A global model is distributed to clients, who perform training, and submit their newly-trained model to be aggregated into a superior model. However, federated learning systems ar… ▽ More Federated learning enables a global machine learning model to be trained collaboratively by distributed, mutually non-trusting learning agents who desire to maintain the privacy of their training data and their hardware. A global model is distributed to clients, who perform training, and submit their newly-trained model to be aggregated into a superior model. However, federated learning systems are vulnerable to interference from malicious learning agents who may desire to prevent training or induce targeted misclassification in the resulting global model. A class of Byzantine-tolerant aggregation algorithms has emerged, offering varying degrees of robustness against these attacks, often with the caveat that the number of attackers is bounded by some quantity known prior to training. This paper presents Simeon: a novel approach to aggregation that applies a reputation-based iterative filtering technique to achieve robustness even in the presence of attackers who can exhibit arbitrary behaviour. We compare Simeon to state-of-the-art aggregation techniques and find that Simeon achieves comparable or superior robustness to a variety of attacks. Notably, we show that Simeon is tolerant to sybil attacks, where other algorithms are not, presenting a key advantage of our approach. △ Less

Submitted 13 March, 2021; originally announced March 2021.

arXiv:2103.05758 [pdf, other]

Fine with "1234"? An Analysis of SMS One-Time Password Randomness in Android Apps

Authors: Siqi Ma, Juanru Li, Hyoungshick Kim, Elisa Bertino, Surya Nepal, Diethelm Ostry, Cong Sun

Abstract: A fundamental premise of SMS One-Time Password (OTP) is that the used pseudo-random numbers (PRNs) are uniquely unpredictable for each login session. Hence, the process of generating PRNs is the most critical step in the OTP authentication. An improper implementation of the pseudo-random number generator (PRNG) will result in predictable or even static OTP values, making them vulnerable to potenti… ▽ More A fundamental premise of SMS One-Time Password (OTP) is that the used pseudo-random numbers (PRNs) are uniquely unpredictable for each login session. Hence, the process of generating PRNs is the most critical step in the OTP authentication. An improper implementation of the pseudo-random number generator (PRNG) will result in predictable or even static OTP values, making them vulnerable to potential attacks. In this paper, we present a vulnerability study against PRNGs implemented for Android apps. A key challenge is that PRNGs are typically implemented on the server-side, and thus the source code is not accessible. To resolve this issue, we build an analysis tool, \sysname, to assess implementations of the PRNGs in an automated manner without the source code requirement. Through reverse engineering, \sysname identifies the apps using SMS OTP and triggers each app's login functionality to retrieve OTP values. It further assesses the randomness of the OTP values to identify vulnerable PRNGs. By analyzing 6,431 commercially used Android apps downloaded from \tool{Google Play} and \tool{Tencent Myapp}, \sysname identified 399 vulnerable apps that generate predictable OTP values. Even worse, 194 vulnerable apps use the OTP authentication alone without any additional security mechanisms, leading to insecure authentication against guessing attacks and replay attacks. △ Less

Submitted 6 March, 2021; originally announced March 2021.

arXiv:2101.01279 [pdf]

Computing Research Challenges in Next Generation Wireless Networking

Authors: Elisa Bertino, Daniel Bliss, Daniel Lopresti, Larry Peterson, Henning Schulzrinne

Abstract: By all measures, wireless networking has seen explosive growth over the past decade. Fourth Generation Long Term Evolution (4G LTE) cellular technology has increased the bandwidth available for smartphones, in essence, delivering broadband speeds to mobile devices. The most recent 5G technology is further enhancing the transmission speeds and cell capacity, as well as, reducing latency through the… ▽ More By all measures, wireless networking has seen explosive growth over the past decade. Fourth Generation Long Term Evolution (4G LTE) cellular technology has increased the bandwidth available for smartphones, in essence, delivering broadband speeds to mobile devices. The most recent 5G technology is further enhancing the transmission speeds and cell capacity, as well as, reducing latency through the use of different radio technologies and is expected to provide Internet connections that are an order of magnitude faster than 4G LTE. Technology continues to advance rapidly, however, and the next generation, 6G, is already being envisioned. 6G will make possible a wide range of powerful, new applications including holographic telepresence, telehealth, remote education, ubiquitous robotics and autonomous vehicles, smart cities and communities (IoT), and advanced manufacturing (Industry 4.0, sometimes referred to as the Fourth Industrial Revolution), to name but a few. The advances we will see begin at the hardware level and extend all the way to the top of the software "stack." Artificial Intelligence (AI) will also start playing a greater role in the development and management of wireless networking infrastructure by becoming embedded in applications throughout all levels of the network. The resulting benefits to society will be enormous. At the same time these exciting new wireless capabilities are appearing rapidly on the horizon, a broad range of research challenges loom ahead. These stem from the ever-increasing complexity of the hardware and software systems, along with the need to provide infrastructure that is robust and secure while simultaneously protecting the privacy of users. Here we outline some of those challenges and provide recommendations for the research that needs to be done to address them. △ Less

Submitted 4 January, 2021; originally announced January 2021.

Comments: A Computing Community Consortium (CCC) white paper, 5 pages

Report number: ccc2020whitepaper_16

arXiv:2012.09520 [pdf]

PURE: A Framework for Analyzing Proximity-based Contact Tracing Protocols

Authors: Fabrizio Cicala, Weicheng Wang, Tianhao Wang, Ninghui Li, Elisa Bertino, Faming Liang, Yang Yang

Abstract: Many proximity-based tracing (PCT) protocols have been proposed and deployed to combat the spreading of COVID-19. In this paper, we take a systematic approach to analyze PCT protocols. We identify a list of desired properties of a contact tracing design from the four aspects of Privacy, Utility, Resiliency, and Efficiency (PURE). We also identify two main design choices for PCT protocols: what inf… ▽ More Many proximity-based tracing (PCT) protocols have been proposed and deployed to combat the spreading of COVID-19. In this paper, we take a systematic approach to analyze PCT protocols. We identify a list of desired properties of a contact tracing design from the four aspects of Privacy, Utility, Resiliency, and Efficiency (PURE). We also identify two main design choices for PCT protocols: what information patients report to the server, and which party performs the matching. These two choices determine most of the PURE properties and enable us to conduct a comprehensive analysis and comparison of the existing protocols. △ Less

Submitted 17 December, 2020; originally announced December 2020.

Comments: 35 pages, 10 figures, 7 tables

ACM Class: C.3; H.4; J.3; J.7; K.4; K.6.5

arXiv:2012.06034 [pdf]

Artificial Intelligence & Cooperation

Authors: Elisa Bertino, Finale Doshi-Velez, Maria Gini, Daniel Lopresti, David Parkes

Abstract: The rise of Artificial Intelligence (AI) will bring with it an ever-increasing willingness to cede decision-making to machines. But rather than just giving machines the power to make decisions that affect us, we need ways to work cooperatively with AI systems. There is a vital need for research in "AI and Cooperation" that seeks to understand the ways in which systems of AIs and systems of AIs wit… ▽ More The rise of Artificial Intelligence (AI) will bring with it an ever-increasing willingness to cede decision-making to machines. But rather than just giving machines the power to make decisions that affect us, we need ways to work cooperatively with AI systems. There is a vital need for research in "AI and Cooperation" that seeks to understand the ways in which systems of AIs and systems of AIs with people can engender cooperative behavior. Trust in AI is also key: trust that is intrinsic and trust that can only be earned over time. Here we use the term "AI" in its broadest sense, as employed by the recent 20-Year Community Roadmap for AI Research (Gil and Selman, 2019), including but certainly not limited to, recent advances in deep learning. With success, cooperation between humans and AIs can build society just as human-human cooperation has. Whether coming from an intrinsic willingness to be helpful, or driven through self-interest, human societies have grown strong and the human species has found success through cooperation. We cooperate "in the small" -- as family units, with neighbors, with co-workers, with strangers -- and "in the large" as a global community that seeks cooperative outcomes around questions of commerce, climate change, and disarmament. Cooperation has evolved in nature also, in cells and among animals. While many cases involving cooperation between humans and AIs will be asymmetric, with the human ultimately in control, AI systems are growing so complex that, even today, it is impossible for the human to fully comprehend their reasoning, recommendations, and actions when functioning simply as passive observers. △ Less

Submitted 10 December, 2020; originally announced December 2020.

Comments: A Computing Community Consortium (CCC) white paper, 4 pages

Report number: ccc2020whitepaper_4

arXiv:2012.05410 [pdf]

Artificial Intelligence at the Edge

Authors: Elisa Bertino, Sujata Banerjee

Abstract: The Internet of Things (IoT) and edge computing applications aim to support a variety of societal needs, including the global pandemic situation that the entire world is currently experiencing and responses to natural disasters. The need for real-time interactive applications such as immersive video conferencing, augmented/virtual reality, and autonomous vehicles, in education, healthcare, disas… ▽ More The Internet of Things (IoT) and edge computing applications aim to support a variety of societal needs, including the global pandemic situation that the entire world is currently experiencing and responses to natural disasters. The need for real-time interactive applications such as immersive video conferencing, augmented/virtual reality, and autonomous vehicles, in education, healthcare, disaster recovery and other domains, has never been higher. At the same time, there have been recent technological breakthroughs in highly relevant fields such as artificial intelligence (AI)/machine learning (ML), advanced communication systems (5G and beyond), privacy-preserving computations, and hardware accelerators. 5G mobile communication networks increase communication capacity, reduce transmission latency and error, and save energy -- capabilities that are essential for new applications. The envisioned future 6G technology will integrate many more technologies, including for example visible light communication, to support groundbreaking applications, such as holographic communications and high precision manufacturing. Many of these applications require computations and analytics close to application end-points: that is, at the edge of the network, rather than in a centralized cloud. AI techniques applied at the edge have tremendous potential both to power new applications and to need more efficient operation of edge infrastructure. However, it is critical to understand where to deploy AI systems within complex ecosystems consisting of advanced applications and the specific real-time requirements towards AI systems. △ Less

Submitted 9 December, 2020; originally announced December 2020.

Comments: A Computing Community Consortium (CCC) white paper, 4 pages

Report number: ccc2020whitepaper_3

arXiv:2012.03404 [pdf, other]

Black-box Model Inversion Attribute Inference Attacks on Classification Models

Authors: Shagufta Mehnaz, Ninghui Li, Elisa Bertino

Abstract: Increasing use of ML technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand if these ML technologies are introducing leakages of sensitive and proprietary training data. In this paper, we focus on one kind of model inversion attacks, where the adversary knows non-sensitive attributes about instance… ▽ More Increasing use of ML technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand if these ML technologies are introducing leakages of sensitive and proprietary training data. In this paper, we focus on one kind of model inversion attacks, where the adversary knows non-sensitive attributes about instances in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using oracle access to the target classification model. We devise two novel model inversion attribute inference attacks -- confidence modeling-based attack and confidence score-based attack, and also extend our attack to the case where some of the other (non-sensitive) attributes are unknown to the adversary. Furthermore, while previous work uses accuracy as the metric to evaluate the effectiveness of attribute inference attacks, we find that accuracy is not informative when the sensitive attribute distribution is unbalanced. We identify two metrics that are better for evaluating attribute inference attacks, namely G-mean and Matthews correlation coefficient (MCC). We evaluate our attacks on two types of machine learning models, decision tree and deep neural network, trained with two real datasets. Experimental results show that our newly proposed attacks significantly outperform the state-of-the-art attacks. Moreover, we empirically show that specific groups in the training dataset (grouped by attributes, e.g., gender, race) could be more vulnerable to model inversion attacks. We also demonstrate that our attacks' performances are not impacted significantly when some of the other (non-sensitive) attributes are also unknown to the adversary. △ Less

Submitted 6 December, 2020; originally announced December 2020.

arXiv:2010.09767 [pdf, other]

FLAP -- A Federated Learning Framework for Attribute-based Access Control Policies

Authors: Amani Abu Jabal, Elisa Bertino, Jorge Lobo, Dinesh Verma, Seraphin Calo, Alessandra Russo

Abstract: Technology advances in areas such as sensors, IoT, and robotics, enable new collaborative applications (e.g., autonomous devices). A primary requirement for such collaborations is to have a secure system which enables information sharing and information flow protection. Policy-based management system is a key mechanism for secure selective sharing of protected resources. However, policies in each… ▽ More Technology advances in areas such as sensors, IoT, and robotics, enable new collaborative applications (e.g., autonomous devices). A primary requirement for such collaborations is to have a secure system which enables information sharing and information flow protection. Policy-based management system is a key mechanism for secure selective sharing of protected resources. However, policies in each party of such a collaborative environment cannot be static as they have to adapt to different contexts and situations. One advantage of collaborative applications is that each party in the collaboration can take advantage of knowledge of the other parties for learning or enhancing its own policies. We refer to this learning mechanism as policy transfer. The design of a policy transfer framework has challenges, including policy conflicts and privacy issues. Policy conflicts typically arise because of differences in the obligations of the parties, whereas privacy issues result because of data sharing constraints for sensitive data. Hence, the policy transfer framework should be able to tackle such challenges by considering minimal sharing of data and support policy adaptation to address conflict. In the paper we propose a framework that aims at addressing such challenges. We introduce a formal definition of the policy transfer problem for attribute-based policies. We then introduce the transfer methodology that consists of three sequential steps. Finally we report experimental results. △ Less

Submitted 2 November, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: Presented at AAAI FSS-20: Artificial Intelligence in Government and Public Sector, Washington, DC, USA

arXiv:2003.13604 [pdf]

5G Security and Privacy: A Research Roadmap

Authors: Elisa Bertino, Syed Rafiul Hussain, Omar Chowdhury

Abstract: Cellular networks represent a critical infrastructure and their security is thus crucial. 5G - the latest generation of cellular networks - combines different technologies to increase capacity, reduce latency, and save energy. Due to its complexity and scale, however, ensuring its security is extremely challenging. In this white paper, we outline recent approaches supporting systematic analyses of… ▽ More Cellular networks represent a critical infrastructure and their security is thus crucial. 5G - the latest generation of cellular networks - combines different technologies to increase capacity, reduce latency, and save energy. Due to its complexity and scale, however, ensuring its security is extremely challenging. In this white paper, we outline recent approaches supporting systematic analyses of 4G LTE and 5G protocols and their related defenses and introduce an initial security and privacy roadmap, covering different research challenges, including formal and comprehensive analyses of cellular protocols as defined by the standardization groups, verification of the software implementing the protocols, the design of robust defenses, and application and device security. △ Less

Submitted 30 March, 2020; originally announced March 2020.

Comments: A Computing Community Consortium (CCC) white paper, 8 pages

Report number: ccc2020whitepaper_1

arXiv:1910.06799 [pdf, other]

Federated Learning for Coalition Operations

Authors: D. Verma, S. Calo, S. Witherspoon, E. Bertino, A. Abu Jabal, A. Swami, G. Cirincione, S. Julier, G. White, G. de Mel, G. Pearson

Abstract: Machine Learning in coalition settings requires combining insights available from data assets and knowledge repositories distributed across multiple coalition partners. In tactical environments, this requires sharing the assets, knowledge and models in a bandwidth-constrained environment, while staying in conformance with the privacy, security and other applicable policies for each coalition membe… ▽ More Machine Learning in coalition settings requires combining insights available from data assets and knowledge repositories distributed across multiple coalition partners. In tactical environments, this requires sharing the assets, knowledge and models in a bandwidth-constrained environment, while staying in conformance with the privacy, security and other applicable policies for each coalition member. Federated Machine Learning provides an approach for such sharing. In its simplest version, federated machine learning could exchange training data available among the different coalition members, with each partner deciding which part of the training data from other partners to accept based on the quality and value of the offered data. In a more sophisticated version, coalition partners may exchange models learnt locally, which need to be transformed, accepted in entirety or in part based on the quality and value offered by each model, and fused together into an integrated model. In this paper, we examine the challenges present in creating federated learning solutions in coalition settings, and present the different flavors of federated learning that we have created as part of our research in the DAIS ITA. The challenges addressed include dealing with varying quality of data and models, determining the value offered by the data/model of each coalition partner, addressing the heterogeneity in data representation, labeling and AI model architecture selected by different coalition members, and handling the varying levels of trust present among members of the coalition. We also identify some open problems that remain to be addressed to create a viable solution for federated learning in coalition environments. △ Less

Submitted 14 October, 2019; originally announced October 2019.

Comments: Presented at AAAI FSS-19: Artificial Intelligence in Government and Public Sector, Arlington, Virginia, USA

arXiv:1807.02593 [pdf, other]

Gargoyle: A Network-based Insider Attack Resilient Framework for Organizations

Authors: Arash Shaghaghi, Salil S. Kanhere, Mohamed Ali Kaafar, Elisa Bertino, Sanjay Jha

Abstract: `Anytime, Anywhere' data access model has become a widespread IT policy in organizations making insider attacks even more complicated to model, predict and deter. Here, we propose Gargoyle, a network-based insider attack resilient framework against the most complex insider threats within a pervasive computing context. Compared to existing solutions, Gargoyle evaluates the trustworthiness of an acc… ▽ More `Anytime, Anywhere' data access model has become a widespread IT policy in organizations making insider attacks even more complicated to model, predict and deter. Here, we propose Gargoyle, a network-based insider attack resilient framework against the most complex insider threats within a pervasive computing context. Compared to existing solutions, Gargoyle evaluates the trustworthiness of an access request context through a new set of contextual attributes called Network Context Attribute (NCA). NCAs are extracted from the network traffic and include information such as the user's device capabilities, security-level, current and prior interactions with other devices, network connection status, and suspicious online activities. Retrieving such information from the user's device and its integrated sensors are challenging in terms of device performance overheads, sensor costs, availability, reliability and trustworthiness. To address these issues, Gargoyle leverages the capabilities of Software-Defined Network (SDN) for both policy enforcement and implementation. In fact, Gargoyle's SDN App can interact with the network controller to create a `defence-in-depth' protection system. For instance, Gargoyle can automatically quarantine a suspicious data requestor in the enterprise network for further investigation or filter out an access request before engaging a data provider. Finally, instead of employing simplistic binary rules in access authorizations, Gargoyle incorporates Function-based Access Control (FBAC) and supports the customization of access policies into a set of functions (e.g., disabling copy, allowing print) depending on the perceived trustworthiness of the context. △ Less

Submitted 6 July, 2018; originally announced July 2018.

Comments: Accepted to IEEE LCN 2018 as full paper, Pre-final version - slightly different than the final version published by the conference

arXiv:1504.05998 [pdf, ps, other]

Differentially Private $k$-Means Clustering

Authors: Dong Su, Jianneng Cao, Ninghui Li, Elisa Bertino, Hongxia **

Abstract: There are two broad approaches for differentially private data analysis. The interactive approach aims at develo** customized differentially private algorithms for various data mining tasks. The non-interactive approach aims at develo** differentially private algorithms that can output a synopsis of the input dataset, which can then be used to support various data mining tasks. In this paper w… ▽ More There are two broad approaches for differentially private data analysis. The interactive approach aims at develo** customized differentially private algorithms for various data mining tasks. The non-interactive approach aims at develo** differentially private algorithms that can output a synopsis of the input dataset, which can then be used to support various data mining tasks. In this paper we study the tradeoff of interactive vs. non-interactive approaches and propose a hybrid approach that combines interactive and non-interactive, using $k$-means clustering as an example. In the hybrid approach to differentially private $k$-means clustering, one first uses a non-interactive mechanism to publish a synopsis of the input dataset, then applies the standard $k$-means clustering algorithm to learn $k$ cluster centroids, and finally uses an interactive approach to further improve these cluster centroids. We analyze the error behavior of both non-interactive and interactive approaches and use such analysis to decide how to allocate privacy budget between the non-interactive step and the interactive step. Results from extensive experiments support our analysis and demonstrate the effectiveness of our approach. △ Less

Submitted 22 April, 2015; originally announced April 2015.

arXiv:1412.4378 [pdf, ps, other]

Privacy-Preserving and Outsourced Multi-User k-Means Clustering

Authors: Bharath K. Samanthula, Fang-Yu Rao, Elisa Bertino, Xun Yi, Dongxi Liu

Abstract: Many techniques for privacy-preserving data mining (PPDM) have been investigated over the past decade. Often, the entities involved in the data mining process are end-users or organizations with limited computing and storage resources. As a result, such entities may want to refrain from participating in the PPDM process. To overcome this issue and to take many other benefits of cloud computing, ou… ▽ More Many techniques for privacy-preserving data mining (PPDM) have been investigated over the past decade. Often, the entities involved in the data mining process are end-users or organizations with limited computing and storage resources. As a result, such entities may want to refrain from participating in the PPDM process. To overcome this issue and to take many other benefits of cloud computing, outsourcing PPDM tasks to the cloud environment has recently gained special attention. We consider the scenario where n entities outsource their databases (in encrypted format) to the cloud and ask the cloud to perform the clustering task on their combined data in a privacy-preserving manner. We term such a process as privacy-preserving and outsourced distributed clustering (PPODC). In this paper, we propose a novel and efficient solution to the PPODC problem based on k-means clustering algorithm. The main novelty of our solution lies in avoiding the secure division operations required in computing cluster centers altogether through an efficient transformation technique. Our solution builds the clusters securely in an iterative fashion and returns the final cluster centers to all entities when a pre-determined termination condition holds. The proposed solution protects data confidentiality of all the participating entities under the standard semi-honest model. To the best of our knowledge, ours is the first work to discuss and propose a comprehensive solution to the PPODC problem that incurs negligible cost on the participating entities. We theoretically estimate both the computation and communication costs of the proposed protocol and also demonstrate its practical value through experiments on a real dataset. △ Less

Submitted 14 December, 2014; originally announced December 2014.

Comments: 16 pages, 2 figures, 5 tables

ACM Class: D.4.6; E.3; H.3.3

arXiv:1401.3768 [pdf, ps, other]

Lightweight and Secure Two-Party Range Queries over Outsourced Encrypted Databases

Authors: Bharath K. Samanthula, Wei Jiang, Elisa Bertino

Abstract: With the many benefits of cloud computing, an entity may want to outsource its data and their related analytics tasks to a cloud. When data are sensitive, it is in the interest of the entity to outsource encrypted data to the cloud; however, this limits the types of operations that can be performed on the cloud side. Especially, evaluating queries over the encrypted data stored on the cloud withou… ▽ More With the many benefits of cloud computing, an entity may want to outsource its data and their related analytics tasks to a cloud. When data are sensitive, it is in the interest of the entity to outsource encrypted data to the cloud; however, this limits the types of operations that can be performed on the cloud side. Especially, evaluating queries over the encrypted data stored on the cloud without the entity performing any computation and without ever decrypting the data become a very challenging problem. In this paper, we propose solutions to conduct range queries over outsourced encrypted data. The existing methods leak valuable information to the cloud which can violate the security guarantee of the underlying encryption schemes. In general, the main security primitive used to evaluate range queries is secure comparison (SC) of encrypted integers. However, we observe that the existing SC protocols are not very efficient. To this end, we first propose a novel SC scheme that takes encrypted integers and outputs encrypted comparison result. We empirically show its practical advantage over the current state-of-the-art. We then utilize the proposed SC scheme to construct two new secure range query protocols. Our protocols protect data confidentiality, privacy of user's query, and also preserve the semantic security of the encrypted data; therefore, they are more secure than the existing protocols. Furthermore, our second protocol is lightweight at the user end, and it can allow an authorized user to use any device with limited storage and computing capability to perform the range queries over outsourced encrypted data. △ Less

Submitted 15 January, 2014; originally announced January 2014.

Comments: 25 pages, 1 figure

ACM Class: D.4.6; E.3; H.3.3

arXiv:1211.3200 [pdf, ps, other]

An Analytic Approach to People Evaluation in Crowdsourcing Systems

Authors: Mohammad Allahbakhsh, Aleksandar Ignjatovic, Boualem Benatallah, Seyed-Mehdi-Reza Beheshti, Norman Foo, Elisa Bertino

Abstract: Worker selection is a significant and challenging issue in crowdsourcing systems. Such selection is usually based on an assessment of the reputation of the individual workers participating in such systems. However, assessing the credibility and adequacy of such calculated reputation is a real challenge. In this paper, we propose an analytic model which leverages the values of the tasks completed,… ▽ More Worker selection is a significant and challenging issue in crowdsourcing systems. Such selection is usually based on an assessment of the reputation of the individual workers participating in such systems. However, assessing the credibility and adequacy of such calculated reputation is a real challenge. In this paper, we propose an analytic model which leverages the values of the tasks completed, the credibility of the evaluators of the results of the tasks and time of evaluation of the results of these tasks in order to calculate an accurate and credible reputation rank of participating workers and fairness rank for evaluators. The model has been implemented and experimentally validated. △ Less

Submitted 13 November, 2012; originally announced November 2012.

Comments: 23 pages

Report number: UNSW-CSE-TR-201204

arXiv:1211.0963 [pdf, ps, other]

Detecting, Representing and Querying Collusion in Online Rating Systems

Authors: Mohammad Allahbakhsh, Aleksandar Ignjatovic, Boualem Benatallah, Seyed-Mehdi-Reza Beheshti, Norman Foo, Elisa Bertino

Abstract: Online rating systems are subject to malicious behaviors mainly by posting unfair rating scores. Users may try to individually or collaboratively promote or demote a product. Collaborating unfair rating 'collusion' is more damaging than individual unfair rating. Although collusion detection in general has been widely studied, identifying collusion groups in online rating systems is less studied an… ▽ More Online rating systems are subject to malicious behaviors mainly by posting unfair rating scores. Users may try to individually or collaboratively promote or demote a product. Collaborating unfair rating 'collusion' is more damaging than individual unfair rating. Although collusion detection in general has been widely studied, identifying collusion groups in online rating systems is less studied and needs more investigation. In this paper, we study impact of collusion in online rating systems and asses their susceptibility to collusion attacks. The proposed model uses a frequent itemset mining algorithm to detect candidate collusion groups. Then, several indicators are used for identifying collusion groups and for estimating how damaging such colluding groups might be. Also, we propose an algorithm for finding possible collusive subgroup inside larger groups which are not identified as collusive. The model has been implemented and we present results of experimental evaluation of our methodology. △ Less

Submitted 2 November, 2012; originally announced November 2012.

Comments: 22 pages, 6 figures

Report number: UNSW-CSE-TR-201220

arXiv:1206.3880 [pdf]

Cryptographic Key Management for Smart Power Grids - Approaches and Issues

Authors: M. Nabeel, J. Zage, S. Kerr, E. Bertino, N. Athula. Kulatunga, U. Sudheera Navaratne, M. Duren

Abstract: The smart power grid promises to improve efficiency and reliability of power delivery. This report introduces the logical components, associated technologies, security protocols, and network designs of the system. Undermining the potential benefits are security threats, and those threats related to cyber security are described in this report. Concentrating on the design of the smart meter and its… ▽ More The smart power grid promises to improve efficiency and reliability of power delivery. This report introduces the logical components, associated technologies, security protocols, and network designs of the system. Undermining the potential benefits are security threats, and those threats related to cyber security are described in this report. Concentrating on the design of the smart meter and its communication links, this report describes the ZigBee technology and implementation, and the communication between the smart meter and the collector node, with emphasis on security attributes. It was observed that many of the secure features are based on keys that must be maintained; therefore, secure key management techniques become the basis to securing the entire grid. The descriptions of current key management techniques are delineated, highlighting their weaknesses. Finally some initial research directions are outlined. △ Less

Submitted 18 June, 2012; originally announced June 2012.

Report number: CERIAS TR 2012-02 CERIAS TR 2012-02 CERIAS TR 2012-02

arXiv:cs/0404003 [pdf, ps, other]

Enhancing the expressive power of the U-Datalog language

Authors: Elisa Bertino, Barbara Catania, Roberta Gori

Abstract: U-Datalog has been developed with the aim of providing a set-oriented logical update language, guaranteeing update parallelism in the context of a Datalog-like language. In U-Datalog, updates are expressed by introducing constraints (+p(X), to denote insertion, and [minus sign]p(X), to denote deletion) inside Datalog rules. A U-Datalog program can be interpreted as a CLP program. In this framewo… ▽ More U-Datalog has been developed with the aim of providing a set-oriented logical update language, guaranteeing update parallelism in the context of a Datalog-like language. In U-Datalog, updates are expressed by introducing constraints (+p(X), to denote insertion, and [minus sign]p(X), to denote deletion) inside Datalog rules. A U-Datalog program can be interpreted as a CLP program. In this framework, a set of updates (constraints) is satisfiable if it does not represent an inconsistent theory, that is, it does not require the insertion and the deletion of the same fact. This approach resembles a very simple form of negation. However, on the other hand, U-Datalog does not provide any mechanism to explicitly deal with negative information, resulting in a language with limited expressive power. In this paper, we provide a semantics, based on stratification, handling the use of negated atoms in U-Datalog programs, and we show which problems arise in defining a compositional semantics. △ Less

Submitted 1 April, 2004; originally announced April 2004.

Comments: Appeared in Theory and Practice of Logic Programming, vol. 1, no. 1, 2001

ACM Class: D.1.6; D.3.2

Journal ref: Theory and Practice of Logic Programming, vol. 1, no. 1, 2001

arXiv:cs/0207076 [pdf, ps, other]

Introducing Dynamic Behavior in Amalgamated Knowledge Bases

Authors: Elisa Bertino, Barbara Catania, Paolo Perlasca

Abstract: The problem of integrating knowledge from multiple and heterogeneous sources is a fundamental issue in current information systems. In order to cope with this problem, the concept of mediator has been introduced as a software component providing intermediate services, linking data resources and application programs, and making transparent the heterogeneity of the underlying systems. In designing… ▽ More The problem of integrating knowledge from multiple and heterogeneous sources is a fundamental issue in current information systems. In order to cope with this problem, the concept of mediator has been introduced as a software component providing intermediate services, linking data resources and application programs, and making transparent the heterogeneity of the underlying systems. In designing a mediator architecture, we believe that an important aspect is the definition of a formal framework by which one is able to model integration according to a declarative style. To this purpose, the use of a logical approach seems very promising. Another important aspect is the ability to model both static integration aspects, concerning query execution, and dynamic ones, concerning data updates and their propagation among the various data sources. Unfortunately, as far as we know, no formal proposals for logically modeling mediator architectures both from a static and dynamic point of view have already been developed. In this paper, we extend the framework for amalgamated knowledge bases, presented by Subrahmanian, to deal with dynamic aspects. The language we propose is based on the Active U-Datalog language, and extends it with annotated logic and amalgamation concepts. We model the sources of information and the mediator (also called supervisor) as Active U-Datalog deductive databases, thus modeling queries, transactions, and active rules, interpreted according to the PARK semantics. By using active rules, the system can efficiently perform update propagation among different databases. The result is a logical environment, integrating active and deductive rules, to perform queries and update propagation in an heterogeneous mediated framework. △ Less

Submitted 22 July, 2002; originally announced July 2002.

Comments: Other Keywords: Deductive databases; Heterogeneous databases; Active rules; Updates

ACM Class: D.1.6 Logic Programming; H.2.5 Heterogeneous Databases; F.4.1 Mathematical Logic, Logic and constraint programming

Showing 1–36 of 36 results for author: Bertino, E