Search | arXiv e-print repository

Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning

Authors: Nay Myat Min, Long H. Pham, Jun Sun

Abstract: The application of deep neural network models in various security-critical applications has raised significant security concerns, particularly the risk of backdoor attacks. Neural backdoors pose a serious security threat as they allow attackers to maliciously alter model behavior. While many defenses have been explored, existing approaches are often bounded by model-specific constraints, or necess… ▽ More The application of deep neural network models in various security-critical applications has raised significant security concerns, particularly the risk of backdoor attacks. Neural backdoors pose a serious security threat as they allow attackers to maliciously alter model behavior. While many defenses have been explored, existing approaches are often bounded by model-specific constraints, or necessitate complex alterations to the training process, or fall short against diverse backdoor attacks. In this work, we introduce a novel method for comprehensive and effective elimination of backdoors, called ULRL (short for UnLearn and ReLearn for backdoor removal). ULRL requires only a small set of clean samples and works effectively against all kinds of backdoors. It first applies unlearning for identifying suspicious neurons and then targeted neural weight tuning for backdoor mitigation (i.e., by promoting significant weight deviation on the suspicious neurons). Evaluated against 12 different types of backdoors, ULRL is shown to significantly outperform state-of-the-art methods in eliminating backdoors whilst preserving the model utility. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2307.02289 [pdf, other]

Fuzzing with Quantitative and Adaptive Hot-Bytes Identification

Authors: Tai D. Nguyen, Long H. Pham, Jun Sun

Abstract: Fuzzing has emerged as a powerful technique for finding security bugs in complicated real-world applications. American fuzzy lop (AFL), a leading fuzzing tool, has demonstrated its powerful bug finding ability through a vast number of reported CVEs. However, its random mutation strategy is unable to generate test inputs that satisfy complicated branching conditions (e.g., magic-byte comparisons, c… ▽ More Fuzzing has emerged as a powerful technique for finding security bugs in complicated real-world applications. American fuzzy lop (AFL), a leading fuzzing tool, has demonstrated its powerful bug finding ability through a vast number of reported CVEs. However, its random mutation strategy is unable to generate test inputs that satisfy complicated branching conditions (e.g., magic-byte comparisons, checksum tests, and nested if-statements), which are commonly used in image decoders/encoders, XML parsers, and checksum tools. Existing approaches (such as Steelix and Neuzz) on addressing this problem assume unrealistic assumptions such as we can satisfy the branch condition byte-to-byte or we can identify and focus on the important bytes in the input (called hot-bytes) once and for all. In this work, we propose an approach called \tool~which is designed based on the following principles. First, there is a complicated relation between inputs and branching conditions and thus we need not only an expressive model to capture such relationship but also an informative measure so that we can learn such relationship effectively. Second, different branching conditions demand different hot-bytes and we must adjust our fuzzing strategy adaptively depending on which branches are the current bottleneck. We implement our approach as an open source project and compare its efficiency with other state-of-the-art fuzzers. Our evaluation results on 10 real-world programs and LAVA-M dataset show that \tool~achieves sustained increases in branch coverage and discovers more bugs than other fuzzers. △ Less

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2209.09327 [pdf, ps, other]

S2TD: a Separation Logic Verifier that Supports Reasoning of the Absence and Presence of Bugs

Authors: Quang Loc Le, Jun Sun, Long H. Pham, Shengchao Qin

Abstract: Heap-manipulating programs are known to be challenging to reason about. We present a novel verifier for heap-manipulating programs called S2TD, which encodes programs systematically in the form of Constrained Horn Clauses (CHC) using a novel extension of separation logic (SL) with recursive predicates and dangling predicates. S2TD actively explores cyclic proofs to address the path explosion probl… ▽ More Heap-manipulating programs are known to be challenging to reason about. We present a novel verifier for heap-manipulating programs called S2TD, which encodes programs systematically in the form of Constrained Horn Clauses (CHC) using a novel extension of separation logic (SL) with recursive predicates and dangling predicates. S2TD actively explores cyclic proofs to address the path explosion problem. S2TD differentiates itself from existing CHC-based verifiers by focusing on heap-manipulating programs and employing cyclic proof to efficiently verify or falsify them with counterexamples. Compared with existing SL-based verifiers, S2TD precisely specifies the heaps of de-allocated pointers to avoid false positives in reasoning about the presence of bugs. S2TD has been evaluated using a comprehensive set of benchmark programs from the SV-COMP repository. The results show that S2TD is more effective than state-of-art program verifiers and is more efficient than most of them. △ Less

Submitted 19 September, 2022; originally announced September 2022.

Comments: 24 pages

MSC Class: 68N15

arXiv:2205.06992 [pdf, other]

Verifying Neural Networks Against Backdoor Attacks

Authors: Long H. Pham, Jun Sun

Abstract: Neural networks have achieved state-of-the-art performance in solving many problems, including many applications in safety/security-critical systems. Researchers also discovered multiple security issues associated with neural networks. One of them is backdoor attacks, i.e., a neural network may be embedded with a backdoor such that a target output is almost always generated in the presence of a tr… ▽ More Neural networks have achieved state-of-the-art performance in solving many problems, including many applications in safety/security-critical systems. Researchers also discovered multiple security issues associated with neural networks. One of them is backdoor attacks, i.e., a neural network may be embedded with a backdoor such that a target output is almost always generated in the presence of a trigger. Existing defense approaches mostly focus on detecting whether a neural network is 'backdoored' based on heuristics, e.g., activation patterns. To the best of our knowledge, the only line of work which certifies the absence of backdoor is based on randomized smoothing, which is known to significantly reduce neural network performance. In this work, we propose an approach to verify whether a given neural network is free of backdoor with a certain level of success rate. Our approach integrates statistical sampling as well as abstract interpretation. The experiment results show that our approach effectively verifies the absence of backdoor or generates backdoor triggers. △ Less

Submitted 14 May, 2022; originally announced May 2022.

arXiv:2101.01917 [pdf, other]

sGUARD: Towards Fixing Vulnerable Smart Contracts Automatically

Authors: Tai D. Nguyen, Long H. Pham, Jun Sun

Abstract: Smart contracts are distributed, self-enforcing programs executing on top of blockchain networks. They have the potential to revolutionize many industries such as financial institutes and supply chains. However, smart contracts are subject to code-based vulnerabilities, which casts a shadow on its applications. As smart contracts are unpatchable (due to the immutability of blockchain), it is essen… ▽ More Smart contracts are distributed, self-enforcing programs executing on top of blockchain networks. They have the potential to revolutionize many industries such as financial institutes and supply chains. However, smart contracts are subject to code-based vulnerabilities, which casts a shadow on its applications. As smart contracts are unpatchable (due to the immutability of blockchain), it is essential that smart contracts are guaranteed to be free of vulnerabilities. Unfortunately, smart contract languages such as Solidity are Turing-complete, which implies that verifying them statically is infeasible. Thus, alternative approaches must be developed to provide the guarantee. In this work, we develop an approach which automatically transforms smart contracts so that they are provably free of 4 common kinds of vulnerabilities. The key idea is to apply runtime verification in an efficient and provably correct manner. Experiment results with 5000 smart contracts show that our approach incurs minor run-time overhead in terms of time (i.e., 14.79%) and gas (i.e., 0.79%). △ Less

Submitted 6 January, 2021; originally announced January 2021.

Comments: Published in IEEE S&P 2021

arXiv:2007.11206 [pdf, other]

SOCRATES: Towards a Unified Platform for Neural Network Analysis

Authors: Long H. Pham, Jiaying Li, Jun Sun

Abstract: Studies show that neural networks, not unlike traditional programs, are subject to bugs, e.g., adversarial samples that cause classification errors and discriminatory instances that demonstrate the lack of fairness. Given that neural networks are increasingly applied in critical applications (e.g., self-driving cars, face recognition systems and personal credit rating systems), it is desirable tha… ▽ More Studies show that neural networks, not unlike traditional programs, are subject to bugs, e.g., adversarial samples that cause classification errors and discriminatory instances that demonstrate the lack of fairness. Given that neural networks are increasingly applied in critical applications (e.g., self-driving cars, face recognition systems and personal credit rating systems), it is desirable that systematic methods are developed to analyze (e.g., test or verify) neural networks against desirable properties. Recently, a number of approaches have been developed for analyzing neural networks. These efforts are however scattered (i.e., each approach tackles some restricted classes of neural networks against certain particular properties), incomparable (i.e., each approach has its own assumptions and input format) and thus hard to apply, reuse or extend. In this project, we aim to build a unified framework for develo** techniques to analyze neural networks. Towards this goal, we develop a platform called SOCRATES which supports a standardized format for a variety of neural network models, an assertion language for property specification as well as multiple neural network analysis algorithms including two novel ones for falsifying and probabilistic verification of neural network models. SOCRATES is extensible and thus existing approaches can be easily integrated. Experiment results show that our platform can handle a wide range of networks models and properties. More importantly, it provides a platform for synergistic research on neural network analysis. △ Less

Submitted 5 February, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

arXiv:2004.08563 [pdf, other]

sFuzz: An Efficient Adaptive Fuzzer for Solidity Smart Contracts

Authors: Tai D. Nguyen, Long H. Pham, Jun Sun, Yun Lin, Quang Tran Minh

Abstract: Smart contracts are Turing-complete programs that execute on the infrastructure of the blockchain, which often manage valuable digital assets. Solidity is one of the most popular programming languages for writing smart contracts on the Ethereum platform. Like traditional programs, smart contracts may contain vulnerabilities. Unlike traditional programs, smart contracts cannot be easily patched onc… ▽ More Smart contracts are Turing-complete programs that execute on the infrastructure of the blockchain, which often manage valuable digital assets. Solidity is one of the most popular programming languages for writing smart contracts on the Ethereum platform. Like traditional programs, smart contracts may contain vulnerabilities. Unlike traditional programs, smart contracts cannot be easily patched once they are deployed. It is thus important that smart contracts are tested thoroughly before deployment. In this work, we present an adaptive fuzzer for smart contracts on the Ethereum platform called sFuzz. Compared to existing Solidity fuzzers, sFuzz combines the strategy in the AFL fuzzer and an efficient lightweight multi-objective adaptive strategy targeting those hard-to-cover branches. sFuzz has been applied to more than 4 thousand smart contracts and the experimental results show that (1) sFuzz is efficient, e.g., two orders of magnitude faster than state-of-the-art tools; (2) sFuzz is effective in achieving high code coverage and discovering vulnerabilities; and (3) the different fuzzing strategies in sFuzz complement each other. △ Less

Submitted 18 April, 2020; originally announced April 2020.

Comments: ICSE2020

arXiv:1908.10051 [pdf, ps, other]

Compositional Verification of Heap-Manipulating Programs through Property-Guided Learning

Authors: Long H. Pham, Jun Sun, Quang Loc Le

Abstract: Analyzing and verifying heap-manipulating programs automatically is challenging. A key for fighting the complexity is to develop compositional methods. For instance, many existing verifiers for heap-manipulating programs require user-provided specification for each function in the program in order to decompose the verification problem. The requirement, however, often hinders the users from applyin… ▽ More Analyzing and verifying heap-manipulating programs automatically is challenging. A key for fighting the complexity is to develop compositional methods. For instance, many existing verifiers for heap-manipulating programs require user-provided specification for each function in the program in order to decompose the verification problem. The requirement, however, often hinders the users from applying such tools. To overcome the issue, we propose to automatically learn heap-related program invariants in a property-guided way for each function call. The invariants are learned based on the memory graphs observed during test execution and improved through memory graph mutation. We implemented a prototype of our approach and integrated it with two existing program verifiers. The experimental results show that our approach enhances existing verifiers effectively in automatically verifying complex heap-manipulating programs with multiple function calls. △ Less

Submitted 27 August, 2019; originally announced August 2019.

arXiv:1907.05637 [pdf, other]

Concolic Testing Heap-Manipulating Programs

Authors: Long H. Pham, Quang Loc Le, Quoc-Sang Phan, Jun Sun

Abstract: Concolic testing is a test generation technique which works effectively by integrating random testing generation and symbolic execution. Existing concolic testing engines focus on numeric programs. Heap-manipulating programs make extensive use of complex heap objects like trees and lists. Testing such programs is challenging due to multiple reasons. Firstly, test inputs for such program are requir… ▽ More Concolic testing is a test generation technique which works effectively by integrating random testing generation and symbolic execution. Existing concolic testing engines focus on numeric programs. Heap-manipulating programs make extensive use of complex heap objects like trees and lists. Testing such programs is challenging due to multiple reasons. Firstly, test inputs for such program are required to satisfy non-trivial constraints which must be specified precisely. Secondly, precisely encoding and solving path conditions in such programs are challenging and often expensive. In this work, we propose the first concolic testing engine called CSF for heap-manipulating programs based on separation logic. CSF effectively combines specification-based testing and concolic execution for test input generation. It is evaluated on a set of challenging heap-manipulating programs. The results show that CSF generates valid test inputs with high coverage efficiently. Furthermore, we show that CSF can be potentially used in combination with precondition inference tools to reduce the user effort. △ Less

Submitted 12 July, 2019; originally announced July 2019.

arXiv:1712.06025 [pdf, other]

Enhancing Symbolic Execution of Heap-based Programs with Separation Logic for Test Input Generation

Authors: Long H. Pham, Quang Loc Le, Quoc-Sang Phan, Jun Sun, Shengchao Qin

Abstract: Symbolic execution is a well established method for test input generation. Despite of having achieved tremendous success over numerical domains, existing symbolic execution techniques for heap-based programs are limited due to the lack of a succinct and precise description for symbolic values over unbounded heaps. In this work, we present a new symbolic execution method for heap-based programs bas… ▽ More Symbolic execution is a well established method for test input generation. Despite of having achieved tremendous success over numerical domains, existing symbolic execution techniques for heap-based programs are limited due to the lack of a succinct and precise description for symbolic values over unbounded heaps. In this work, we present a new symbolic execution method for heap-based programs based on separation logic. The essence of our proposal is context-sensitive lazy initialization, a novel approach for efficient test input generation. Our approach differs from existing approaches in two ways. Firstly, our approach is based on separation logic, which allows us to precisely capture preconditions of heap-based programs so that we avoid generating invalid test inputs. Secondly, we generate only fully initialized test inputs, which are more useful in practice compared to those partially initialized test inputs generated by the state-of-the-art tools. We have implemented our approach as a tool, called Java StarFinder, and evaluated it on a set of programs with complex heap inputs. The results show that our approach significantly reduces the number of invalid test inputs and improves the test coverage. △ Less

Submitted 16 September, 2019; v1 submitted 16 December, 2017; originally announced December 2017.

arXiv:1610.08607 [pdf, other]

doi 10.1109/ICECCS.2017.12

Learning Likely Invariants to Explain Why a Program Fails

Authors: Jun Sun, Long H. Pham, Lyly Tran Thi, **gyi Wang, Xin Peng

Abstract: Debugging is difficult. Recent studies show that automatic bug localization techniques have limited usefulness. One of the reasons is that programmers typically have to understand why the program fails before fixing it. In this work, we aim to help programmers understand a bug by automatically generating likely invariants which are violated in the failed tests. Given a program with an initial asse… ▽ More Debugging is difficult. Recent studies show that automatic bug localization techniques have limited usefulness. One of the reasons is that programmers typically have to understand why the program fails before fixing it. In this work, we aim to help programmers understand a bug by automatically generating likely invariants which are violated in the failed tests. Given a program with an initial assertion and at least one test case failing the assertion, we first generate random test cases, identify potential bug locations through bug localization, and then generate program state mutation based on active learning techniques to identify a predicate "explaining" the cause of the bug. The predicate is a classifier for the passed test cases and failed test cases. Our main contribution is the application of invariant learning for bug explanation, as well as a novel approach to overcome the problem of lack of test cases in practice. We apply our method to real-world bugs and show the generated invariants are often correlated to the actual bug fixes. △ Less

Submitted 27 October, 2016; originally announced October 2016.

Comments: 10 pages

Showing 1–11 of 11 results for author: Pham, L H