-
Adversarial Attacks on Code Models with Discriminative Graph Patterns
Authors:
Thanh-Dat Nguyen,
Yang Zhou,
Xuan Bach D. Le,
Patanamon Thongtanunam,
David Lo
Abstract:
Pre-trained language models of code are now widely used in various software engineering tasks such as code generation, code completion, and vulnerability detection. This widespread use, in turn, exposes these models to security and reliability risks. One important threat is adversarial attacks, which can lead to erroneous predictions and substantially degrade model performance on downstream tasks. Current adversarial attacks on code models usually adopt fixed sets of program transformations, such as variable renaming and dead code insertion, leading to limited attack effectiveness. To address this limitation, we propose a novel adversarial attack framework, GraphCodeAttack, to better evaluate the robustness of code models. Given a target code model, GraphCodeAttack automatically mines important code patterns, which can influence the model's decisions, to perturb the structure of the model's input code. To do so, GraphCodeAttack uses a set of input source code to probe the model's outputs and identifies the discriminative AST patterns that can influence the model's decisions. GraphCodeAttack then selects appropriate AST patterns, concretizes the selected patterns as attacks, and inserts them as dead code into the model's input program. To effectively synthesize attacks from AST patterns, GraphCodeAttack uses a separate pre-trained code model to fill in the ASTs with concrete code snippets. We evaluate the robustness of two popular code models (CodeBERT and GraphCodeBERT) against our proposed approach on three tasks: Authorship Attribution, Vulnerability Prediction, and Clone Detection. The experimental results suggest that our proposed approach significantly outperforms state-of-the-art approaches such as CARROT and ALERT in attacking code models.
Submitted 21 August, 2023;
originally announced August 2023.
-
Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning
Authors:
Thanh Le-Cong,
Duc-Minh Luong,
Xuan Bach D. Le,
David Lo,
Nhat-Hoa Tran,
Bui Quang-Huy,
Quyet-Thang Huynh
Abstract:
Automated program repair (APR) faces the challenge of test overfitting, where generated patches pass validation tests but fail to generalize. Existing methods for patch assessment involve generating new tests or manual inspection, which can be time-consuming or biased. In this paper, we propose a novel technique, INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR leverages program invariants to reason about program semantics while also capturing program syntax through language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if: (1) it violates correct specifications or (2) maintains erroneous behaviors from the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a trained model from labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is threefold. First, INVALIDATOR leverages both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated, but instead only relies on the current test suite and uses invariant inference to generalize program behaviors. Third, INVALIDATOR is fully automated. Experimental results demonstrate that INVALIDATOR outperforms existing methods in terms of Accuracy and F-measure, correctly identifying 79% of overfitting patches and detecting 23% more overfitting patches than the best baseline.
Submitted 17 March, 2023; v1 submitted 3 January, 2023;
originally announced January 2023.
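
The two invariant-based conditions in the INVALIDATOR abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that likely invariants are plain strings held in sets and that "an invariant holds" means set membership; the function name `classify_patch` and the "unknown" verdict are hypothetical and do not come from the paper, which does not describe its implementation here.

```python
def classify_patch(buggy_inv, fixed_inv, patch_inv):
    """Apply the two overfitting conditions from the abstract to invariant sets.

    Assumption: invariants are strings, and a program "satisfies" an invariant
    when the string is in its set. Real invariant inference is far richer.
    """
    # Correct specifications: invariants of the developer-patched program.
    correct_specs = set(fixed_inv)
    # Erroneous behaviors: invariants of the buggy program that the
    # developer patch no longer exhibits.
    error_behaviors = set(buggy_inv) - set(fixed_inv)

    violated = correct_specs - set(patch_inv)    # condition (1)
    retained = error_behaviors & set(patch_inv)  # condition (2)

    if violated or retained:
        return "overfitting", violated, retained
    # Invariants alone are inconclusive; per the abstract, a learned
    # syntax-based classifier would take over at this point.
    return "unknown", violated, retained
```

For example, a patch that keeps the buggy invariant `ret == null` while the developer patch establishes `ret != null` trips both conditions, whereas a patch whose invariants match the developer patch is left for the syntactic stage.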
-
Privacy Concerns Raised by Pervasive User Data Collection From Cyberspace and Their Countermeasures
Authors:
Yinhao Jiang,
Ba Dung Le,
Tanveer Zia,
Praveen Gauravaram
Abstract:
The virtual dimension called `Cyberspace' built on internet technologies has served people's daily lives for decades. Now it offers advanced services and connected experiences with the developing pervasive computing technologies that digitise, collect, and analyse users' activity data. This changes how user information gets collected and impacts user privacy at traditional cyberspace gateways, including the devices carried by users for daily use. This work investigates the impacts and surveys privacy concerns caused by this data collection, namely identity tracking from browsing activities, user input data disclosure, data accessibility in mobile devices, security of delicate data transmission, privacy in participating sensing, and identity privacy in opportunistic networks. Each of the surveyed privacy concerns is discussed in a well-defined scope according to the impacts mentioned above. Existing countermeasures are also surveyed and discussed, which identifies corresponding research gaps. To complete the perspectives, three complex open problems, namely trajectory privacy, privacy in smart metering, and involuntary privacy leakage with ambient intelligence, are briefly discussed for future research directions before a succinct conclusion to our survey at the end.
Submitted 9 February, 2022;
originally announced February 2022.
-
Defining Security Requirements with the Common Criteria: Applications, Adoptions, and Challenges
Authors:
Nan Sun,
Chang-Tsun Li,
Hin Chan,
Ba Dung Le,
MD Zahidul Islam,
Leo Yu Zhang,
MD Rafiqul Islam,
Warren Armstrong
Abstract:
Advances in emerging Information and Communications Technology (ICT) push the boundaries of what is possible and open up new markets for innovative ICT products and services. The adoption of ICT products and systems with security properties depends on consumers' confidence and markets' trust in the security functionalities and whether the assurance measures applied to these products meet the inherent security requirements. Such confidence and trust are primarily gained through the rigorous development of security requirements, validation criteria, evaluation, and certification. Common Criteria for Information Technology Security Evaluation (often referred to as Common Criteria or CC) is an international standard (ISO/IEC 15408) for cyber security certification. In this paper, we conduct a systematic review of the CC standard and its adoption. Adoption barriers of the CC are also investigated based on the analysis of current trends in security evaluation. Specifically, we share the experiences and lessons gained through the recent Development of Australian Cyber Criteria Assessment (DACCA) project that promotes the CC among stakeholders in ICT security products related to specification, development, evaluation, certification and approval, procurement, and deployment. Best practices on developing Protection Profiles, recommendations, and future directions for trusted cybersecurity advancement are presented.
Submitted 2 April, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
-
Usability and Aesthetics: Better Together for Automated Repair of Web Pages
Authors:
Thanh Le-Cong,
Xuan Bach D. Le,
Quyet-Thang Huynh,
Phi-Le Nguyen
Abstract:
With the recent explosive growth of mobile devices such as smartphones or tablets, guaranteeing consistent web appearance across all environments has become a significant problem. This happens simply because it is hard to keep track of the web appearance on different sizes and types of devices that render the web pages. Therefore, fixing the inconsistent appearance of web pages can be difficult, and the cost incurred can be huge, e.g., poor user experience and financial loss due to it. Recently, automated web repair techniques have been proposed to automatically resolve inconsistent web page appearance, focusing on improving usability. However, generated patches tend to disrupt the webpage's layout, rendering the repaired webpage aesthetically unpleasing, e.g., distorted images or misalignment of components.
In this paper, we propose an automated repair approach for web pages based on meta-heuristic algorithms that can assure both usability and aesthetics. The key novelty that empowers our approach is a novel fitness function that allows us to optimistically evolve buggy web pages to find the best solution that optimizes both usability and aesthetics at the same time. Empirical evaluations show that our approach is able to successfully resolve mobile-friendly problems in 94% of the evaluation subjects, significantly outperforming state-of-the-art baseline techniques in terms of both usability and aesthetics.
Submitted 1 January, 2022;
originally announced January 2022.
-
Discrete Distribution Estimation with Local Differential Privacy: A Comparative Analysis
Authors:
Ba Dung Le,
Tanveer Zia
Abstract:
Local differential privacy is a promising privacy-preserving model for statistical aggregation of user data that prevents user privacy leakage from the data aggregator. This paper focuses on the problem of estimating the distribution of discrete user values with local differential privacy. We review and present a comparative analysis of the performance of the existing discrete distribution estimation algorithms in terms of their accuracy on benchmark datasets. Our evaluation benchmarks include real-world and synthetic datasets of categorical individual values, with the number of individuals ranging from hundreds to millions and a domain size of up to a few hundred values. The experimental results show that the Basic RAPPOR algorithm generally performs best for the benchmark datasets in the high privacy regime, while the k-RR algorithm often gives the best estimation in the low privacy regime. In the medium privacy regime, the performance of the k-RR, the k-subset, and the HR algorithms is fairly competitive with each other and generally better than the performance of the Basic RAPPOR and the CMS algorithms.
Submitted 24 February, 2021;
originally announced February 2021.
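
To make the k-RR algorithm named in the abstract above concrete, here is a minimal sketch of the standard k-ary randomized response mechanism and its unbiased frequency estimator. It is not the paper's implementation: the function names `krr_perturb` and `krr_estimate`, the list-based domain, and the privacy parameter name `epsilon` are illustrative choices.

```python
import math
import random
from collections import Counter


def krr_perturb(value, domain, epsilon, rng):
    """Client side: keep the true value with probability p, otherwise
    report one of the other k - 1 values uniformly at random."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def krr_estimate(reports, domain, epsilon):
    """Aggregator side: invert the known perturbation probabilities.

    A value v is reported with probability f_v * p + (1 - f_v) * q,
    so the true frequency is estimated as (observed - q) / (p - q).
    """
    k = len(domain)
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # probability of reporting the true value
    q = 1.0 / (e + k - 1)  # probability of reporting any specific other value
    n = len(reports)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}
```

A smaller `epsilon` (higher privacy) pushes `p` toward `1/k`, so each report carries less information and the estimates become noisier; individual estimates can also fall slightly outside [0, 1], although they always sum to 1.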
-
Gathering Cyber Threat Intelligence from Twitter Using Novelty Classification
Authors:
Ba Dung Le,
Guanhua Wang,
Mehwish Nasim,
Ali Babar
Abstract:
Protecting organizations from Cyber exploits requires timely intelligence about Cyber vulnerabilities and attacks, referred to as threats. Cyber threat intelligence can be extracted from various sources, including social media platforms where users publish threat information in real time. Gathering Cyber threat intelligence from social media sites is a time-consuming task for security analysts that can delay timely response to emerging Cyber threats. We propose a framework for automatically gathering Cyber threat intelligence from Twitter by using a novelty detection model. Our model learns the features of Cyber threat intelligence from the threat descriptions published in public repositories such as Common Vulnerabilities and Exposures (CVE) and classifies a new unseen tweet as either normal or anomalous to Cyber threat intelligence. We evaluate our framework using a purpose-built data set of tweets from 50 influential Cyber security related accounts over twelve months (in 2018). Our classifier achieves an F1-score of 0.643 for classifying Cyber threat tweets and outperforms several baselines, including binary classification models. Our analysis of the classification results suggests that Cyber threat relevant tweets on Twitter do not often include the CVE identifier of the related threats. Hence, it would be valuable to collect these tweets and associate them with the related CVE identifier for Cyber security applications.
Submitted 4 September, 2019; v1 submitted 3 July, 2019;
originally announced July 2019.
-
On Reliability of Patch Correctness Assessment
Authors:
Xuan Bach D. Le,
Lingfeng Bao,
David Lo,
Xin Xia,
Shanping Li
Abstract:
Current state-of-the-art automatic software repair (ASR) techniques rely heavily on incomplete specifications, e.g., test suites, to generate repairs. This, however, may cause ASR tools to generate incorrect repairs that do not generalize. To assess patch correctness, researchers have been following two typical approaches separately: (1) Automated annotation, wherein patches are automatically labeled by an independent test suite (ITS) - a patch passing the ITS is regarded as correct or generalizable, and incorrect otherwise, and (2) Author annotation, wherein authors of ASR techniques themselves annotate correctness labels of patches generated by their own and competing tools. While automated annotation fails to prove that a patch is actually correct, author annotation is prone to subjectivity. This concern has caused an on-going debate on appropriate ways to assess the effectiveness of numerous ASR techniques proposed recently. To address this concern, we propose to assess the reliability of author and automated annotations on patch correctness assessment. We do this by first constructing a gold set of correctness labels for 189 randomly selected patches generated by 8 state-of-the-art ASR techniques through a user study involving 35 professional developers as independent annotators. By measuring inter-rater agreement as a proxy for annotation quality - as commonly done in the literature - we demonstrate that our constructed gold set is on par with other high-quality gold sets. We then compare labels generated by author and automated annotations with this gold set to assess the reliability of the patch assessment methodologies. We subsequently report several findings and highlight implications for future studies.
Submitted 27 June, 2018; v1 submitted 15 May, 2018;
originally announced May 2018.
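
The abstract above uses inter-rater agreement as a proxy for annotation quality but does not say which statistic was computed. As one common choice, and only as an assumption for illustration, the sketch below computes Cohen's kappa for two annotators labelling the same patches; the function name `cohens_kappa` and the label strings are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators.

    kappa = (observed - expected) / (1 - expected), where `expected` is the
    agreement two independent raters with the same label frequencies would
    reach by chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

With more than two annotators, as in a study with many independent raters, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the analogous measure.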