Interval Analysis in Industrial-Scale BMC Software Verifiers: A Case Study

Authors: Rafael Sá Menezes, Edoardo Manino, Fedor Shmarov, Mohannad Aldughaim, Rosiane de Freitas, Lucas C. Cordeiro

Abstract: Bounded Model Checking (BMC) is a widely used software verification technique. Despite its successes, the technique has several limiting factors, from state-space explosion to lack of completeness. Over the years, interval analysis has repeatedly been proposed as a partial solution to these limitations. In this work, we evaluate whether the computational cost of interval analysis yields significant enough improvements in BMC's performance to justify its use. In more detail, we quantify the benefits of interval analysis on two benchmarks: the Intel Core Power Management firmware and 9537 programs in the ReachSafety category of the International Competition on Software Verification. Our results show that interval analysis is essential in solving 203 unique benchmarks.

Submitted 21 June, 2024; originally announced June 2024.

Comments: Submitted to IFM

arXiv:2406.04375

Verifying components of Arm(R) Confidential Computing Architecture with ESBMC

Authors: Tong Wu, Shale Xiong, Edoardo Manino, Gareth Stockwell, Lucas C. Cordeiro

Abstract: Realm Management Monitor (RMM) is an essential firmware component within the recent Arm Confidential Computing Architecture (Arm CCA). Previous work applies formal techniques to verify the specification and prototype reference implementation of RMM. However, relying solely on a single verification tool may lead to the oversight of certain bugs or vulnerabilities. This paper discusses the application of ESBMC, a state-of-the-art Satisfiability Modulo Theories (SMT)-based software model checker, to further enhance RMM verification. We demonstrate ESBMC's ability to precisely parse the source code and identify specification failures within a reasonable time frame. Moreover, we propose potential improvements for ESBMC to enhance its efficiency for industry engineers. This work contributes to exploring the capabilities of formal verification techniques in real-world scenarios and suggests avenues for further improvements to better meet industrial verification needs.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.08848

Automated Repair of AI Code with Large Language Models and Formal Verification

Authors: Yiannis Charalambous, Edoardo Manino, Lucas C. Cordeiro

Abstract: The next generation of AI systems requires strong safety guarantees. This report looks at the software implementation of neural networks and related memory safety properties, including NULL pointer dereference, out-of-bound access, double-free, and memory leaks. Our goal is to detect these vulnerabilities, and automatically repair them with the help of large language models. To this end, we first expand the size of NeuroCodeBench, an existing dataset of neural network code, to about 81k programs via an automated process of program mutation. Then, we verify the memory safety of the mutated neural network implementations with ESBMC, a state-of-the-art software verifier. Whenever ESBMC spots a vulnerability, we invoke a large language model to repair the source code. For this last task, we compare the performance of various state-of-the-art prompt engineering techniques, and an iterative approach that repeatedly calls the large language model.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2312.14746

ESBMC v7.4: Harnessing the Power of Intervals

Authors: Rafael Menezes, Mohannad Aldughaim, Bruno Farias, Xianzhiyu Li, Edoardo Manino, Fedor Shmarov, Kunjian Song, Franz Brauße, Mikhail R. Gadelha, Norbert Tihanyi, Konstantin Korovin, Lucas C. Cordeiro

Abstract: ESBMC implements many state-of-the-art techniques for model checking. We report on new and improved features that allow us to obtain verification results for previously unsupported programs and properties. ESBMC employs a new static interval analysis of expressions in programs to increase verification performance. This includes interval-based reasoning over booleans and integers, forward and backward contractors, and particular optimizations related to singleton intervals because of their ubiquity. Other relevant improvements concern the verification of concurrent programs, as well as several operational models, internal ones, and also those of libraries such as pthread and the C mathematics library. An extended memory safety analysis now allows tracking of memory leaks that are considered still reachable.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2309.03617

NeuroCodeBench: a plain C neural network benchmark for software verification

Authors: Edoardo Manino, Rafael Sá Menezes, Fedor Shmarov, Lucas C. Cordeiro

Abstract: Safety-critical systems with neural network components require strong guarantees. While existing neural network verification techniques have shown great progress towards this goal, they cannot prove the absence of software faults in the network implementation. This paper presents NeuroCodeBench - a verification benchmark for neural network code written in plain C. It contains 32 neural networks with 607 safety properties divided into 6 categories: maths library, activation functions, error-correcting networks, transfer function approximation, probability density estimation and reinforcement learning. Our preliminary evaluation shows that state-of-the-art software verifiers struggle to provide correct verdicts, due to their incomplete support of the standard C mathematical library and the complexity of larger neural networks.

Submitted 7 September, 2023; originally announced September 2023.

Comments: Submitted to the 2023 AFRiTS workshop

arXiv:2301.09142

LF-checker: Machine Learning Acceleration of Bounded Model Checking for Concurrency Verification (Competition Contribution)

Authors: Tong Wu, Edoardo Manino, Fatimah Aljaafari, Pavlos Petoumenos, Lucas C. Cordeiro

Abstract: We describe and evaluate LF-checker, a metaverifier tool based on machine learning. It extracts multiple features of the program under test and predicts the optimal configuration (flags) of a bounded model checker with a decision tree. Our current work is specialised in concurrency verification and employs ESBMC as a back-end verification engine. In the paper, we demonstrate that LF-checker achieves better results than the default configuration of the underlying verification engine.

Submitted 22 January, 2023; originally announced January 2023.

arXiv:2212.04310

Montague semantics and modifier consistency measurement in neural language models

Authors: Danilo S. Carvalho, Edoardo Manino, Julia Rozanova, Lucas Cordeiro, André Freitas

Abstract: In recent years, distributional language representation models have demonstrated great practical success. At the same time, the need for interpretability has elicited questions on their intrinsic properties and capabilities. Crucially, distributional models are often inconsistent when dealing with compositional phenomena in natural language, which has significant implications for their safety and fairness. Despite this, most current research on compositionality is directed towards improving their performance on similarity tasks only. This work takes a different approach, and proposes a methodology for measuring compositional behavior in contemporary language models. Specifically, we focus on adjectival modifier phenomena in adjective-noun phrases. We introduce three novel tests of compositional behavior inspired by Montague semantics. Our experimental results indicate that current neural language models behave according to the expected linguistic theories to a limited extent only. This raises the question of whether these language models are not able to capture the semantic properties we evaluated, or whether linguistic theories from Montagovian tradition would not match the expected capabilities of distributional models.

Submitted 3 April, 2023; v1 submitted 10 October, 2022; originally announced December 2022.

arXiv:2210.12054

Towards Global Neural Network Abstractions with Locally-Exact Reconstruction

Authors: Edoardo Manino, Iury Bessa, Lucas Cordeiro

Abstract: Neural networks are a powerful class of non-linear functions. However, their black-box nature makes it difficult to explain their behaviour and certify their safety. Abstraction techniques address this challenge by transforming the neural network into a simpler, over-approximated function. Unfortunately, existing abstraction techniques are slack, which limits their applicability to small local regions of the input domain. In this paper, we propose Global Interval Neural Network Abstractions with Center-Exact Reconstruction (GINNACER). Our novel abstraction technique produces sound over-approximation bounds over the whole input domain while guaranteeing exact reconstructions for any given local input. Our experiments show that GINNACER is several orders of magnitude tighter than state-of-the-art global abstraction techniques, while being competitive with local ones.

Submitted 31 March, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: Under submission to the Neural Networks Journal (revised version). Sections 2, 4.7, 5.4, Appendix A and B have been added

arXiv:2207.04231

CEG4N: Counter-Example Guided Neural Network Quantization Refinement

Authors: João Batista P. Matos Jr., Iury Bessa, Edoardo Manino, Xidan Song, Lucas C. Cordeiro

Abstract: Neural networks are essential components of learning-based software systems. However, their high compute, memory, and power requirements make using them in low-resource domains challenging. For this reason, neural networks are often quantized before deployment. Existing quantization techniques tend to degrade the network accuracy. We propose Counter-Example Guided Neural Network Quantization Refinement (CEG4N). This technique combines search-based quantization and equivalence verification: the former minimizes the computational requirements, while the latter guarantees that the network's output does not change after quantization. We evaluate CEG4N on a diverse set of benchmarks, including large and small networks. Our technique successfully quantizes the networks in our evaluation while producing models with up to 72% better accuracy than state-of-the-art techniques.

Submitted 9 July, 2022; originally announced July 2022.

arXiv:2206.06043

Combining BMC and Fuzzing Techniques for Finding Software Vulnerabilities in Concurrent Programs

Authors: Fatimah K. Aljaafari, Rafael Menezes, Edoardo Manino, Fedor Shmarov, Mustafa A. Mustafa, Lucas C. Cordeiro

Abstract: Finding software vulnerabilities in concurrent programs is a challenging task due to the size of the state-space exploration, as the number of interleavings grows exponentially with the number of program threads and statements. We propose and evaluate EBF (Ensembles of Bounded Model Checking with Fuzzing) -- a technique that combines Bounded Model Checking (BMC) and Gray-Box Fuzzing (GBF) to find software vulnerabilities in concurrent programs. Since there are no publicly-available GBF tools for concurrent code, we first propose OpenGBF -- a new open-source concurrency-aware gray-box fuzzer that explores different thread schedules by instrumenting the code under test with random delays. Then, we build an ensemble of a BMC tool and OpenGBF in the following way. On the one hand, when the BMC tool in the ensemble returns a counterexample, we use it as a seed for OpenGBF, thus increasing the likelihood of executing paths guarded by complex mathematical expressions. On the other hand, we aggregate the outcomes of the BMC and GBF tools in the ensemble using a decision matrix, thus improving the accuracy of EBF. We evaluate EBF against state-of-the-art pure BMC tools and show that it can generate up to 14.9% more correct verification witnesses than the corresponding BMC tools alone. Furthermore, we demonstrate the efficacy of OpenGBF, by showing that it can find 24.2% of the vulnerabilities in our evaluation suite, while non-concurrency-aware GBF tools can only find 0.55%. Finally, thanks to our concurrency-aware OpenGBF, EBF detects a data race in the open-source wolfMqtt library and reproduces known bugs in several other real-world programs, which demonstrates its effectiveness in finding vulnerabilities in real-world software.

Submitted 20 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

arXiv:2204.12316

Systematicity, Compositionality and Transitivity of Deep NLP Models: a Metamorphic Testing Perspective

Authors: Edoardo Manino, Julia Rozanova, Danilo Carvalho, Andre Freitas, Lucas Cordeiro

Abstract: Metamorphic testing has recently been used to check the safety of neural NLP models. Its main advantage is that it does not rely on a ground truth to generate test cases. However, existing studies are mostly concerned with robustness-like metamorphic relations, limiting the scope of linguistic properties they can test. We propose three new classes of metamorphic relations, which address the properties of systematicity, compositionality and transitivity. Unlike robustness, our relations are defined over multiple source inputs, thus increasing the number of test cases that we can produce by a polynomial factor. With them, we test the internal consistency of state-of-the-art NLP models, and show that they do not always behave according to their expected linguistic properties. Lastly, we introduce a novel graphical notation that efficiently summarises the inner structure of metamorphic relations.

Submitted 26 April, 2022; originally announced April 2022.

Comments: Findings of the Association for Computational Linguistics 2022

arXiv:2111.13110

QNNVerifier: A Tool for Verifying Neural Networks using SMT-Based Model Checking

Authors: Xidan Song, Edoardo Manino, Luiz Sena, Erickson Alves, Eddie de Lima Filho, Iury Bessa, Mikel Lujan, Lucas Cordeiro

Abstract: QNNVerifier is the first open-source tool for verifying implementations of neural networks that takes into account the finite word-length (i.e. quantization) of their operands. The novel support for quantization is achieved by employing state-of-the-art software model checking (SMC) techniques. It translates the implementation of neural networks to a decidable fragment of first-order logic based on satisfiability modulo theories (SMT). The effects of fixed- and floating-point operations are represented through direct implementations given a hardware-determined precision. Furthermore, QNNVerifier allows users to specify bespoke safety properties and verify the resulting model with different verification strategies (incremental and k-induction) and SMT solvers. Finally, QNNVerifier is the first tool that combines invariant inference via interval analysis and discretization of non-linear activation functions to speed up the verification of neural networks by orders of magnitude. A video presentation of QNNVerifier is available at https://youtu.be/7jMgOL41zTY

Submitted 25 November, 2021; originally announced November 2021.

Comments: Submitted to the Demo track of the ICSE 2022 conference

arXiv:2106.05997

Verifying Quantized Neural Networks using SMT-Based Model Checking

Authors: Luiz Sena, Xidan Song, Erickson Alves, Iury Bessa, Edoardo Manino, Lucas Cordeiro, Eddie de Lima Filho

Abstract: Artificial Neural Networks (ANNs) are being deployed for an increasing number of safety-critical applications, including autonomous cars and medical diagnosis. However, concerns about their reliability have been raised due to their black-box nature and apparent fragility to adversarial attacks. These concerns are amplified when ANNs are deployed on restricted systems, which limit the precision of mathematical operations and thus introduce additional quantization errors. Here, we develop and evaluate a novel symbolic verification framework using software model checking (SMC) and satisfiability modulo theories (SMT) to check for vulnerabilities in ANNs. More specifically, we propose several ANN-related optimizations for SMC, including invariant inference via interval analysis, slicing, expression simplifications, and discretization of non-linear activation functions. With this verification framework, we can provide formal guarantees on the safe behavior of ANNs implemented both in floating- and fixed-point arithmetic. In this regard, our verification approach was able to verify and produce adversarial examples for 52 test cases spanning image classification and general machine learning applications. Furthermore, for small- to medium-sized ANNs, our approach completes most of its verification runs in minutes. Moreover, in contrast to most state-of-the-art methods, our approach is not restricted to specific choices regarding activation functions and non-quantized representations. Our experiments show that our approach can analyze larger ANN implementations and substantially reduce the verification time compared to state-of-the-art techniques that use SMT solving.

Submitted 16 September, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: Changes with respect to the previous version: improved explanation of our methodology in Section 3; improved and extended experimental evaluation in Section 4; added comparison with the state of the art in Section 4.5

arXiv:2008.10012

doi: 10.1007/978-3-030-40943-2_10

Zealotry and Influence Maximization in the Voter Model: When to Target Zealots?

Authors: Guillermo Romero Moreno, Edoardo Manino, Long Tran-Thanh, Markus Brede

Abstract: In this paper, we study influence maximization in the voter model in the presence of biased voters (or zealots) on complex networks. Under what conditions should an external controller with finite budget who aims at maximizing its influence over the system target zealots? Our analysis, based on both analytical and numerical results, shows a rich diagram of preferences and degree-dependencies of allocations to zealots and normal agents varying with the budget. We find that when we have a large budget or for low levels of zealotry, optimal strategies should give larger allocations to zealots and allocations are positively correlated with node degree. In contrast, for low budgets or highly-biased zealots, optimal strategies give higher allocations to normal agents, with some residual allocations to zealots, and allocations to both types of agents decrease with node degree. Our results emphasize that heterogeneity in agent properties strongly affects strategies for influence maximization on heterogeneous networks.

Submitted 23 August, 2020; originally announced August 2020.

Comments: 12 pages, 5 figures. This is a pre-print of a contribution published in Complex Networks XI, Springer Proceedings in Complexity 2020 (editors: Barbosa H., Gomez-Gardenes J., Gonçalves B., Mangioni G., Menezes R., Oliveira M.) published by Springer, Cham

arXiv:1911.05712

Streaming Bayesian Inference for Crowdsourced Classification

Authors: Edoardo Manino, Long Tran-Thanh, Nicholas R. Jennings

Abstract: A key challenge in crowdsourcing is inferring the ground truth from noisy and unreliable data. To do so, existing approaches rely on collecting redundant information from the crowd, and aggregating it with some probabilistic method. However, oftentimes such methods are computationally inefficient, are restricted to some specific settings, or lack theoretical guarantees. In this paper, we revisit the problem of binary classification from crowdsourced data. Specifically we propose Streaming Bayesian Inference for Crowdsourcing (SBIC), a new algorithm that does not suffer from any of these limitations. First, SBIC has low complexity and can be used in a real-time online setting. Second, SBIC has the same accuracy as the best state-of-the-art algorithms in all settings. Third, SBIC has provable asymptotic guarantees both in the online and offline settings.

Submitted 13 November, 2019; originally announced November 2019.

Comments: Accepted at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1610.06106

Efficiency of active learning for the allocation of workers on crowdsourced classification tasks

Authors: Edoardo Manino, Long Tran-Thanh, Nicholas R. Jennings

Abstract: Crowdsourcing has been successfully employed in the past as an effective and cheap way to execute classification tasks and has therefore attracted the attention of the research community. However, we still lack a theoretical understanding of how to collect the labels from the crowd in an optimal way. In this paper we focus on the problem of worker allocation and compare two active learning policies proposed in the empirical literature with a uniform allocation of the available budget. To this end we make a thorough mathematical analysis of the problem and derive a new bound on the performance of the system. Furthermore we run extensive simulations in a more realistic scenario and show that our theoretical results hold in practice.

Submitted 19 October, 2016; originally announced October 2016.

Comments: Paper accepted at the CrowdML workshop at NIPS 2016

Showing 1–16 of 16 results for author: Manino, E