-
Automatic ICD-10 Code Association: A Challenging Task on French Clinical Texts
Authors:
Yakini Tchouka,
Jean-François Couchot,
David Laiymani,
Philippe Selles,
Azzedine Rahmani
Abstract:
Automatically associating ICD codes with electronic health data is a well-known NLP task in medical research. NLP has evolved significantly in recent years with the emergence of pre-trained language models based on Transformers architecture, mainly in the English language. This paper adapts these models to automatically associate the ICD codes. Several neural network architectures have been experi…
▽ More
Automatically associating ICD codes with electronic health data is a well-known NLP task in medical research. NLP has evolved significantly in recent years with the emergence of pre-trained language models based on Transformers architecture, mainly in the English language. This paper adapts these models to automatically associate the ICD codes. Several neural network architectures have been experimented with to address the challenges of dealing with a large set of both input tokens and labels to be guessed. In this paper, we propose a model that combines the latest advances in NLP and multi-label classification for ICD-10 code association. Fair experiments on a Clinical dataset in the French language show that our approach increases the $F_1$-score metric by more than 55\% compared to state-of-the-art results.
△ Less
Submitted 6 April, 2023;
originally announced April 2023.
-
An Easy-to-use and Robust Approach for the Differentially Private De-Identification of Clinical Textual Documents
Authors:
Yakini Tchouka,
Jean-François Couchot,
David Laiymani
Abstract:
Unstructured textual data is at the heart of healthcare systems. For obvious privacy reasons, these documents are not accessible to researchers as long as they contain personally identifiable information. One way to share this data while respecting the legislative framework (notably GDPR or HIPAA) is, within the medical structures, to de-identify it, i.e. to detect the personal information of a pe…
▽ More
Unstructured textual data is at the heart of healthcare systems. For obvious privacy reasons, these documents are not accessible to researchers as long as they contain personally identifiable information. One way to share this data while respecting the legislative framework (notably GDPR or HIPAA) is, within the medical structures, to de-identify it, i.e. to detect the personal information of a person through a Named Entity Recognition (NER) system and then replacing it to make it very difficult to associate the document with the person. The challenge is having reliable NER and substitution tools without compromising confidentiality and consistency in the document. Most of the conducted research focuses on English medical documents with coarse substitutions by not benefiting from advances in privacy. This paper shows how an efficient and differentially private de-identification approach can be achieved by strengthening the less robust de-identification method and by adapting state-of-the-art differentially private mechanisms for substitution purposes. The result is an approach for de-identifying clinical documents in French language, but also generalizable to other languages and whose robustness is mathematically proven.
△ Less
Submitted 2 November, 2022;
originally announced November 2022.
-
In-stream Probabilistic Cardinality Estimation for Bloom Filters
Authors:
Remy Scholler,
Jean-Francois Couchot,
Oumaima Alaoui-Ismaili,
Denis Renaud,
Eric Ballot
Abstract:
The amount of data coming from different sources such as IoT-sensors, social networks, cellular networks, has increased exponentially during the last few years. Probabilistic Data Structures (PDS) are efficient alternatives to deterministic data structures suitable for large data processing and streaming applications. They are mainly used for approximate membership queries, frequency count, cardin…
▽ More
The amount of data coming from different sources such as IoT-sensors, social networks, cellular networks, has increased exponentially during the last few years. Probabilistic Data Structures (PDS) are efficient alternatives to deterministic data structures suitable for large data processing and streaming applications. They are mainly used for approximate membership queries, frequency count, cardinality estimation and similarity research. Finding the number of distinct elements in a large dataset or in streaming data is an active research area. In this work, we show that usual methods based on Bloom filters for this kind of cardinality estimation are relatively accurate on average but have a high variance. Therefore, reducing this variance is interesting to obtain accurate statistics. We propose a probabilistic approach to estimate more accurately the cardinality of a Bloom filter based on its parameters, i.e., number of hash functions $k$, size $m$, and a counter $s$ which is incremented whenever an element is not in the filter (i.e., when the result of the membership query for this element is negative). The value of the counter can never be larger than the exact cardinality due to the Bloom filter's nature, but hash collisions can cause it to underestimate it. This creates a counting error that we estimate accurately, in-stream, along with its standard deviation. We also discuss a way to optimize the parameters of a Bloom filter based on its counting error. We evaluate our approach with synthetic data created from an analysis of a real mobility dataset provided by a mobile network operator in the form of displacement matrices computed from mobile phone records. The approach proposed here performs at least as well on average and has a much lower variance (about 6 to 7 times less) than state of the art methods.
△ Less
Submitted 1 November, 2022; v1 submitted 27 October, 2022;
originally announced October 2022.
-
De-Identification of French Unstructured Clinical Notes for Machine Learning Tasks
Authors:
Yakini Tchouka,
Jean-François Couchot,
Maxime Coulmeau,
David Laiymani,
Philippe Selles,
Azzedine Rahmani
Abstract:
Unstructured textual data are at the heart of health systems: liaison letters between doctors, operating reports, coding of procedures according to the ICD-10 standard, etc. The details included in these documents make it possible to get to know the patient better, to better manage him or her, to better study the pathologies, to accurately remunerate the associated medical acts\ldots All this seem…
▽ More
Unstructured textual data are at the heart of health systems: liaison letters between doctors, operating reports, coding of procedures according to the ICD-10 standard, etc. The details included in these documents make it possible to get to know the patient better, to better manage him or her, to better study the pathologies, to accurately remunerate the associated medical acts\ldots All this seems to be (at least partially) within reach of today by artificial intelligence techniques. However, for obvious reasons of privacy protection, the designers of these AIs do not have the legal right to access these documents as long as they contain identifying data. De-identifying these documents, i.e. detecting and deleting all identifying information present in them, is a legally necessary step for sharing this data between two complementary worlds. Over the last decade, several proposals have been made to de-identify documents, mainly in English. While the detection scores are often high, the substitution methods are often not very robust to attack. In French, very few methods are based on arbitrary detection and/or substitution rules. In this paper, we propose a new comprehensive de-identification method dedicated to French-language medical documents. Both the approach for the detection of identifying elements (based on deep learning) and their substitution (based on differential privacy) are based on the most proven existing approaches. The result is an approach that effectively protects the privacy of the patients at the heart of these medical documents. The whole approach has been evaluated on a French language medical dataset of a French public hospital and the results are very encouraging.
△ Less
Submitted 6 October, 2023; v1 submitted 16 September, 2022;
originally announced September 2022.
-
On the Risks of Collecting Multidimensional Data Under Local Differential Privacy
Authors:
Héber H. Arcolezi,
Sébastien Gambs,
Jean-François Couchot,
Catuscia Palamidessi
Abstract:
The private collection of multiple statistics from a population is a fundamental statistical problem. One possible approach to realize this is to rely on the local model of differential privacy (LDP). Numerous LDP protocols have been developed for the task of frequency estimation of single and multiple attributes. These studies mainly focused on improving the utility of the algorithms to ensure th…
▽ More
The private collection of multiple statistics from a population is a fundamental statistical problem. One possible approach to realize this is to rely on the local model of differential privacy (LDP). Numerous LDP protocols have been developed for the task of frequency estimation of single and multiple attributes. These studies mainly focused on improving the utility of the algorithms to ensure the server performs the estimations accurately. In this paper, we investigate privacy threats (re-identification and attribute inference attacks) against LDP protocols for multidimensional data following two state-of-the-art solutions for frequency estimation of multiple attributes. To broaden the scope of our study, we have also experimentally assessed five widely used LDP protocols, namely, generalized randomized response, optimal local hashing, subset selection, RAPPOR and optimal unary encoding. Finally, we also proposed a countermeasure that improves both utility and robustness against the identified threats. Our contributions can help practitioners aiming to collect users' statistics privately to decide which LDP mechanism best fits their needs.
△ Less
Submitted 1 August, 2023; v1 submitted 4 September, 2022;
originally announced September 2022.
-
Multi-Freq-LDPy: Multiple Frequency Estimation Under Local Differential Privacy in Python
Authors:
Héber H. Arcolezi,
Jean-François Couchot,
Sébastien Gambs,
Catuscia Palamidessi,
Majid Zolfaghari
Abstract:
This paper introduces the multi-freq-ldpy Python package for multiple frequency estimation under Local Differential Privacy (LDP) guarantees. LDP is a gold standard for achieving local privacy with several real-world implementations by big tech companies such as Google, Apple, and Microsoft. The primary application of LDP is frequency (or histogram) estimation, in which the aggregator estimates th…
▽ More
This paper introduces the multi-freq-ldpy Python package for multiple frequency estimation under Local Differential Privacy (LDP) guarantees. LDP is a gold standard for achieving local privacy with several real-world implementations by big tech companies such as Google, Apple, and Microsoft. The primary application of LDP is frequency (or histogram) estimation, in which the aggregator estimates the number of times each value has been reported. The presented package provides an easy-to-use and fast implementation of state-of-the-art solutions and LDP protocols for frequency estimation of: single attribute (i.e., the building blocks), multiple attributes (i.e., multidimensional data), multiple collections (i.e., longitudinal data), and both multiple attributes/collections. Multi-freq-ldpy is built on the well-established Numpy package -- a de facto standard for scientific computing in Python -- and the Numba package for fast execution. These features are described and illustrated in this paper with four worked examples. This package is open-source and publicly available under an MIT license via GitHub (https://github.com/hharcolezi/multi-freq-ldpy) and can be installed via PyPI (https://pypi.org/project/multi-freq-ldpy/).
△ Less
Submitted 23 September, 2022; v1 submitted 5 May, 2022;
originally announced May 2022.
-
Differentially Private Multivariate Time Series Forecasting of Aggregated Human Mobility With Deep Learning: Input or Gradient Perturbation?
Authors:
Héber H. Arcolezi,
Jean-François Couchot,
Denis Renaud,
Bechara Al Bouna,
Xiaokui Xiao
Abstract:
This paper investigates the problem of forecasting multivariate aggregated human mobility while preserving the privacy of the individuals concerned. Differential privacy, a state-of-the-art formal notion, has been used as the privacy guarantee in two different and independent steps when training deep learning models. On one hand, we considered \textit{gradient perturbation}, which uses the differe…
▽ More
This paper investigates the problem of forecasting multivariate aggregated human mobility while preserving the privacy of the individuals concerned. Differential privacy, a state-of-the-art formal notion, has been used as the privacy guarantee in two different and independent steps when training deep learning models. On one hand, we considered \textit{gradient perturbation}, which uses the differentially private stochastic gradient descent algorithm to guarantee the privacy of each time series sample in the learning stage. On the other hand, we considered \textit{input perturbation}, which adds differential privacy guarantees in each sample of the series before applying any learning. We compared four state-of-the-art recurrent neural networks: Long Short-Term Memory, Gated Recurrent Unit, and their Bidirectional architectures, i.e., Bidirectional-LSTM and Bidirectional-GRU. Extensive experiments were conducted with a real-world multivariate mobility dataset, which we published openly along with this paper. As shown in the results, differentially private deep learning models trained under gradient or input perturbation achieve nearly the same performance as non-private deep learning models, with loss in performance varying between $0.57\%$ to $2.8\%$. The contribution of this paper is significant for those involved in urban planning and decision-making, providing a solution to the human mobility multivariate forecast problem through differentially private deep learning models.
△ Less
Submitted 3 June, 2022; v1 submitted 1 May, 2022;
originally announced May 2022.
-
Improving the utility of locally differentially private protocols for longitudinal and multidimensional frequency estimates
Authors:
Héber H. Arcolezi,
Jean-François Couchot,
Bechara Al Bouna,
Xiaokui Xiao
Abstract:
This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under Local Differential Privacy (LDP) guarantees. Contrary to frequency estimation of a single attribute, the multidimensional aspect demands particular attention to the privacy budget. Besides, when collecting user statistics longit…
▽ More
This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under Local Differential Privacy (LDP) guarantees. Contrary to frequency estimation of a single attribute, the multidimensional aspect demands particular attention to the privacy budget. Besides, when collecting user statistics longitudinally, privacy progressively degrades. Indeed, the "multiple" settings in combination (i.e., many attributes and several collections throughout time) impose several challenges, for which this paper proposes the first solution for frequency estimates under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols (Generalized Randomized Response -- GRR, Optimized Unary Encoding -- OUE, and Symmetric Unary Encoding -- SUE) for both longitudinal and multidimensional data collections. While the known literature uses OUE and SUE for two rounds of sanitization (a.k.a. memoization), i.e., L-OUE and L-SUE, respectively, we analytically and experimentally show that starting with OUE and then with SUE provides higher data utility (i.e., L-OSUE). Also, for attributes with small domain sizes, we propose Longitudinal GRR (L-GRR), which provides higher utility than the other protocols based on unary encoding. Last, we also propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates (ALLOMFREE), which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. As shown in the results, ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
△ Less
Submitted 16 July, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Random Sampling Plus Fake Data: Multidimensional Frequency Estimates With Local Differential Privacy
Authors:
Héber H. Arcolezi,
Jean-François Couchot,
Bechara Al Bouna,
Xiaokui Xiao
Abstract:
With local differential privacy (LDP), users can privatize their data and thus guarantee privacy properties before transmitting it to the server (a.k.a. the aggregator). One primary objective of LDP is frequency (or histogram) estimation, in which the aggregator estimates the number of users for each possible value. In practice, when a study with rich content on a population is desired, the intere…
▽ More
With local differential privacy (LDP), users can privatize their data and thus guarantee privacy properties before transmitting it to the server (a.k.a. the aggregator). One primary objective of LDP is frequency (or histogram) estimation, in which the aggregator estimates the number of users for each possible value. In practice, when a study with rich content on a population is desired, the interest is in the multiple attributes of the population, that is to say, in multidimensional data ($d \geq 2$). However, contrary to the problem of frequency estimation of a single attribute (the majority of the works), the multidimensional aspect imposes to pay particular attention to the privacy budget. This one can indeed grow extremely quickly due to the composition theorem. To the authors' knowledge, two solutions seem to stand out for this task: 1) splitting the privacy budget for each attribute, i.e., send each value with $\fracε{d}$-LDP (Spl), and 2) random sampling a single attribute and spend all the privacy budget to send it with $ε$-LDP (Smp). Although Smp adds additional sampling error, it has proven to provide higher data utility than the former Spl solution. However, we argue that aggregators (who are also seen as attackers) are aware of the sampled attribute and its LDP value, which is protected by a "less strict" $e^ε$ probability bound (rather than $e^{ε/d}$). This way, we propose a solution named Random Sampling plus Fake Data (RS+FD), which allows creating uncertainty over the sampled attribute by generating fake data for each non-sampled attribute; RS+FD further benefits from amplification by sampling. We theoretically and experimentally validate our proposed solution on both synthetic and real-world datasets to show that RS+FD achieves nearly the same or better utility than the state-of-the-art Smp solution.
△ Less
Submitted 15 September, 2021;
originally announced September 2021.
-
Safe Disassociation of Set-Valued Datasets
Authors:
Nancy Awad,
Bechara Al Bouna,
Jean-Francois Couchot,
Laurent Philippe
Abstract:
Disassociation introduced by Terrovitis et al. is a bucketization based anonimyzation technique that divides a set-valued dataset into several clusters to hide the link between individuals and their complete set of items. It increases the utility of the anonymized dataset, but on the other side, it raises many privacy concerns, one in particular, is when the items are tightly coupled to form what…
▽ More
Disassociation introduced by Terrovitis et al. is a bucketization based anonimyzation technique that divides a set-valued dataset into several clusters to hide the link between individuals and their complete set of items. It increases the utility of the anonymized dataset, but on the other side, it raises many privacy concerns, one in particular, is when the items are tightly coupled to form what is called, a cover problem. In this paper, we present safe disassociation, a technique that relies on partial-suppression, to overcome the aforementioned privacy breach encountered when disassociating set-valued datasets. Safe disassociation allows the $k^m$-anonymity privacy constraint to be extended to a bucketized dataset and copes with the cover problem. We describe our algorithm that achieves the safe disassociation and we provide a set of experiments to demonstrate its efficiency.
△ Less
Submitted 5 April, 2019;
originally announced April 2019.
-
Traversing a n-cube without Balanced Hamiltonian Cycle to Generate Pseudorandom Numbers
Authors:
Jean-François Couchot,
Pierre-Cyrille Heam,
Christophe Guyeux,
Qianxue Wang,
Jacques M. Bahi
Abstract:
This article presents a new class of Pseudorandom Number Generators. The generators are based on traversing a n-cube where a Balanced Hamiltonian Cycle has been removed. The construction of such generators is automatic for small number of bits, but remains an open problem when this number becomes large. A running example is used throughout the paper. Finally, first statistical experiments of these…
▽ More
This article presents a new class of Pseudorandom Number Generators. The generators are based on traversing a n-cube where a Balanced Hamiltonian Cycle has been removed. The construction of such generators is automatic for small number of bits, but remains an open problem when this number becomes large. A running example is used throughout the paper. Finally, first statistical experiments of these generators are presented, they show how efficient and promising the proposed approach seems.
△ Less
Submitted 26 June, 2017;
originally announced June 2017.
-
A Robust Data Hiding Process Contributing to the Development of a Semantic Web
Authors:
Jacques M. Bahi,
Jean-François Couchot,
Nicolas Friot,
Christophe Guyeux
Abstract:
In this paper, a novel steganographic scheme based on chaotic iterations is proposed. This research work takes place into the information hiding framework, and focus more specifically on robust steganography. Steganographic algorithms can participate in the development of a semantic web: medias being on the Internet can be enriched by information related to their contents, authors, etc., leading t…
▽ More
In this paper, a novel steganographic scheme based on chaotic iterations is proposed. This research work takes place into the information hiding framework, and focus more specifically on robust steganography. Steganographic algorithms can participate in the development of a semantic web: medias being on the Internet can be enriched by information related to their contents, authors, etc., leading to better results for the search engines that can deal with such tags. As media can be modified by users for various reasons, it is preferable that these embedding tags can resist to changes resulting from some classical transformations as for example crop**, rotation, image conversion, and so on. This is why a new robust watermarking scheme for semantic search engines is proposed in this document. For the sake of completeness, the robustness of this scheme is finally compared to existing established algorithms.
△ Less
Submitted 27 June, 2017;
originally announced June 2017.
-
Well-supported phylogenies using largest subsets of core-genes by discrete particle swarm optimization
Authors:
Reem Alsrraj,
Bassam AlKindy,
Christophe Guyeux,
Laurent Philippe,
Jean-François Couchot
Abstract:
The number of complete chloroplastic genomes increases day after day, making it possible to rethink plants phylogeny at the biomolecular era. Given a set of close plants sharing in the order of one hundred of core chloroplastic genes, this article focuses on how to extract the largest subset of sequences in order to obtain the most supported species tree. Due to computational complexity, a discret…
▽ More
The number of complete chloroplastic genomes increases day after day, making it possible to rethink plants phylogeny at the biomolecular era. Given a set of close plants sharing in the order of one hundred of core chloroplastic genes, this article focuses on how to extract the largest subset of sequences in order to obtain the most supported species tree. Due to computational complexity, a discrete and distributed Particle Swarm Optimization (DPSO) is proposed. It is finally applied to the core genes of Rosales order.
△ Less
Submitted 25 June, 2017;
originally announced June 2017.
-
One random jump and one permutation: sufficient conditions to chaotic, statistically faultless, and large throughput PRNG for FPGA
Authors:
Mohammed Bakiri,
Jean-François Couchot,
Christophe Guyeux
Abstract:
Sub-categories of mathematical topology, like the mathematical theory of chaos, offer interesting applications devoted to information security. In this research work, we have introduced a new chaos-based pseudorandom number generator implemented in FPGA, which is mainly based on the deletion of a Hamilton cycle within the $n$-cube (or on the vectorial negation), plus one single permutation. By doi…
▽ More
Sub-categories of mathematical topology, like the mathematical theory of chaos, offer interesting applications devoted to information security. In this research work, we have introduced a new chaos-based pseudorandom number generator implemented in FPGA, which is mainly based on the deletion of a Hamilton cycle within the $n$-cube (or on the vectorial negation), plus one single permutation. By doing so, we produce a kind of post-treatment on hardware pseudorandom generators, but the obtained generator has usually a better statistical profile than its input, while running at a similar speed. We tested 6 combinations of Boolean functions and strategies that all achieve to pass the most stringent TestU01 battery of tests. This generation can reach a throughput/latency ratio equal to 6.7 Gbps, being thus the second fastest FPGA generator that can pass TestU01.
△ Less
Submitted 25 June, 2017;
originally announced June 2017.
-
On the ability to reconstruct ancestral genomes from Mycobacterium genus
Authors:
Christophe Guyeux,
Bashar Al-Nuaimi,
Bassam AlKindy,
Jean-François Couchot,
Michel Salomon
Abstract:
Technical signs of progress during the last decades has led to a situation in which the accumulation of genome sequence data is increasingly fast and cheap. The huge amount of molecular data available nowadays can help addressing new and essential questions in Evolution. However, reconstructing evolution of DNA sequences requires models, algorithms, statistical and computational methods of ever in…
▽ More
Technical signs of progress during the last decades has led to a situation in which the accumulation of genome sequence data is increasingly fast and cheap. The huge amount of molecular data available nowadays can help addressing new and essential questions in Evolution. However, reconstructing evolution of DNA sequences requires models, algorithms, statistical and computational methods of ever increasing complexity. Since most dramatic genomic changes are caused by genome rearrangements (gene duplications, gain/loss events), it becomes crucial to understand their mechanisms and reconstruct ancestors of the given genomes. This problem is known to be NP-complete even in the "simplest" case of three genomes. Heuristic algorithms are usually executed to provide approximations of the exact solution.
We state that, even if the ancestral reconstruction problem is NP-hard in theory, its exact resolution is feasible in various situations, encompassing organelles and some bacteria. Such accurate reconstruction, which identifies too some highly homoplasic mutations whose ancestral status is undecidable, will be initiated in this work-in-progress, to reconstruct ancestral genomes of two Mycobacterium pathogenetic bacterias. By mixing automatic reconstruction of obvious situations with human interventions on signaled problematic cases, we will indicate that it should be possible to achieve a concrete, complete, and really accurate reconstruction of lineages of the Mycobacterium tuberculosis complex. Thus, it is possible to investigate how these genomes have evolved from their last common ancestors.
△ Less
Submitted 30 April, 2017;
originally announced May 2017.
-
Random Walk in a N-cube Without Hamiltonian Cycle to Chaotic Pseudorandom Number Generation: Theoretical and Practical Considerations
Authors:
Sylvain Contassot-Vivier,
Jean-François Couchot,
Christophe Guyeux,
Pierre-Cyrille Heam
Abstract:
Designing a pseudorandom number generator (PRNG) is a difficult and complex task. Many recent works have considered chaotic functions as the basis of built PRNGs: the quality of the output would indeed be an obvious consequence of some chaos properties. However, there is no direct reasoning that goes from chaotic functions to uniform distribution of the output. Moreover, embedding such kind of fun…
▽ More
Designing a pseudorandom number generator (PRNG) is a difficult and complex task. Many recent works have considered chaotic functions as the basis of built PRNGs: the quality of the output would indeed be an obvious consequence of some chaos properties. However, there is no direct reasoning that goes from chaotic functions to uniform distribution of the output. Moreover, embedding such kind of functions into a PRNG does not necessarily allow to get a chaotic output, which could be required for simulating some chaotic behaviors.
In a previous work, some of the authors have proposed the idea of walking into a $\mathsf{N}$-cube where a balanced Hamiltonian cycle has been removed as the basis of a chaotic PRNG. In this article, all the difficult issues observed in the previous work have been tackled. The chaotic behavior of the whole PRNG is proven. The construction of the balanced Hamiltonian cycle is theoretically and practically solved. An upper bound of the expected length of the walk to obtain a uniform distribution is calculated. Finally practical experiments show that the generators successfully pass the classical statistical tests.
△ Less
Submitted 8 February, 2017;
originally announced February 2017.
-
Improving Blind Steganalysis in Spatial Domain using a Criterion to Choose the Appropriate Steganalyzer between CNN and SRM+EC
Authors:
Jean-Francois Couchot,
Raphaël Couturier,
Michel Salomon
Abstract:
Conventional state-of-the-art image steganalysis approaches usually consist of a classifier trained with features provided by rich image models. As both features extraction and classification steps are perfectly embodied in the deep learning architecture called Convolutional Neural Network (CNN), different studies have tried to design a CNN-based steganalyzer. The network designed by Xu et al. is…
▽ More
Conventional state-of-the-art image steganalysis approaches usually consist of a classifier trained with features provided by rich image models. As both features extraction and classification steps are perfectly embodied in the deep learning architecture called Convolutional Neural Network (CNN), different studies have tried to design a CNN-based steganalyzer. The network designed by Xu et al. is the first competitive CNN with the combination Spatial Rich Models (SRM) and Ensemble Classifier (EC) providing detection performances of the same order. In this work we propose a criterion to choose either the CNN or the SRM+EC method for a given input image. Our approach is studied with three different steganographic spatial domain algorithms: S-UNIWARD, MiPOD, and HILL, using the Tensorflow computing platform, and exhibits detection capabilities better than each method alone. Furthermore, as SRM+EC and the CNN are both only trained with a single embedding algorithm, namely MiPOD, the proposed method can be seen as an approach for blind steganalysis. In blind detection, error rates are respectively of 16% for S-UNIWARD, 16% for MiPOD, and 17% for HILL on the BOSSBase with a payload of 0.4 bpp. For 0.1 bpp, the respective corresponding error rates are of 39%, 38%, and 41%, and are always better than the ones provided by SRM+EC.
△ Less
Submitted 9 January, 2017; v1 submitted 28 December, 2016;
originally announced December 2016.
-
FPGA Implementation of $\mathbb{F}_2$-Linear Pseudorandom Number Generators Based on Zynq MPSoC: a Chaotic Iterations Post Processing Case Study
Authors:
Mohammed Bakiri,
Jean-François Couchot,
Christophe Guyeux
Abstract:
Pseudorandom number generation (PRNG) is a key element in hardware security platforms like field-programmable gate array FPGA circuits. In this article, 18 PRNGs belonging in 4 families (xorshift, LFSR, TGFSR, and LCG) are physically implemented in a FPGA and compared in terms of area, throughput, and statistical tests. Two flows of conception are used for Register Transfer Level (RTL) and High-le…
▽ More
Pseudorandom number generation (PRNG) is a key element in hardware security platforms like field-programmable gate array FPGA circuits. In this article, 18 PRNGs belonging in 4 families (xorshift, LFSR, TGFSR, and LCG) are physically implemented in a FPGA and compared in terms of area, throughput, and statistical tests. Two flows of conception are used for Register Transfer Level (RTL) and High-level Synthesis (HLS). Additionally, the relations between linear complexity, seeds, and arithmetic operations on the one hand, and the resources deployed in FPGA on the other hand, are deeply investigated. In order to do that, a SoC based on Zynq EPP with ARM Cortex-$A9$ MPSoC is developed to accelerate the implementation and the tests of various PRNGs on FPGA hardware. A case study is finally proposed using chaotic iterations as a post processing for FPGA. The latter has improved the statistical profile of a combination of PRNGs that, without it, failed in the so-called TestU01 statistical battery of tests.
△ Less
Submitted 25 November, 2016;
originally announced November 2016.
-
A Second Order Derivatives based Approach for Steganography
Authors:
Jean-François Couchot,
Raphaël Couturier,
Yousra Ahmed Fadil,
Christophe Guyeux
Abstract:
Steganography schemes are designed with the objective of minimizing a defined distortion function. In most existing state of the art approaches, this distortion function is based on image feature preservation. Since smooth regions or clean edges define image core, even a small modification in these areas largely modifies image features and is thus easily detectable. On the contrary, textures, nois…
▽ More
Steganography schemes are designed with the objective of minimizing a defined distortion function. In most existing state of the art approaches, this distortion function is based on image feature preservation. Since smooth regions or clean edges define image core, even a small modification in these areas largely modifies image features and is thus easily detectable. On the contrary, textures, noisy or chaotic regions are so difficult to model that the features having been modified inside these areas are similar to the initial ones. These regions are characterized by disturbed level curves. This work presents a new distortion function for steganography that is based on second order derivatives, which are mathematical tools that usually evaluate level curves. Two methods are explained to compute these partial derivatives and have been completely implemented. The first experiments show that these approaches are promising.
△ Less
Submitted 25 November, 2016;
originally announced November 2016.
-
Relation between Gene Content and Taxonomy in Chloroplasts
Authors:
Bashar Al-Nuaimi,
Christophe Guyeux,
Bassam AlKindy,
Jean-François Couchot,
Michel Salomon
Abstract:
The aim of this study is to investigate the relation that can be found between the phylogeny of a large set of complete chloroplast genomes, and the evolution of gene content inside these sequences. Core and pan genomes have been computed on \textit{de novo} annotation of these 845 genomes, the former being used for producing well-supported phylogenetic tree while the latter provides information r…
▽ More
The aim of this study is to investigate the relation that can be found between the phylogeny of a large set of complete chloroplast genomes, and the evolution of gene content inside these sequences. Core and pan genomes have been computed on \textit{de novo} annotation of these 845 genomes, the former being used for producing well-supported phylogenetic tree while the latter provides information regarding the evolution of gene contents over time. It details too the specificity of some branches of the tree, when specificity is obtained on accessory genes. After having detailed the material and methods, we emphasize some remarkable relation between well-known events of the chloroplast history, like endosymbiosis, and the evolution of gene contents over the phylogenetic tree.
△ Less
Submitted 20 September, 2016;
originally announced September 2016.
-
Binary Particle Swarm Optimization versus Hybrid Genetic Algorithm for Inferring Well Supported Phylogenetic Trees
Authors:
Bassam AlKindy,
Bashar Al-Nuaimi,
Christophe Guyeux,
Jean-François Couchot,
Michel Salomon,
Reem Alsrraj,
Laurent Philippe
Abstract:
The amount of completely sequenced chloroplast genomes increases rapidly every day, leading to the possibility to build large-scale phylogenetic trees of plant species. Considering a subset of close plant species defined according to their chloroplasts, the phylogenetic tree that can be inferred by their core genes is not necessarily well supported, due to the possible occurrence of problematic ge…
▽ More
The amount of completely sequenced chloroplast genomes increases rapidly every day, leading to the possibility to build large-scale phylogenetic trees of plant species. Considering a subset of close plant species defined according to their chloroplasts, the phylogenetic tree that can be inferred by their core genes is not necessarily well supported, due to the possible occurrence of problematic genes (i.e., homoplasy, incomplete lineage sorting, horizontal gene transfers, etc.) which may blur the phylogenetic signal. However, a trustworthy phylogenetic tree can still be obtained provided such a number of blurring genes is reduced. The problem is thus to determine the largest subset of core genes that produces the best-supported tree. To discard problematic genes and due to the overwhelming number of possible combinations, this article focuses on how to extract the largest subset of sequences in order to obtain the most supported species tree. Due to computational complexity, a distributed Binary Particle Swarm Optimization (BPSO) is proposed in sequential and distributed fashions. Obtained results from both versions of the BPSO are compared with those computed using an hybrid approach embedding both genetic algorithms and statistical tests. The proposal has been applied to different cases of plant families, leading to encouraging results for these families.
△ Less
Submitted 31 August, 2016;
originally announced August 2016.
-
Quality Analysis of a Chaotic Proven Keyed Hash Function
Authors:
Jacques M. Bahi,
Jean-François Couchot,
Christophe Guyeux
Abstract:
Hash functions are cryptographic tools, which are notably involved in integrity checking and password storage. They are of primary importance to improve the security of exchanges through the Internet. However, as security flaws have been recently identified in the current standard in this domain, new ways to hash digital data must be investigated. In this document an original keyed hash function i…
▽ More
Hash functions are cryptographic tools, which are notably involved in integrity checking and password storage. They are of primary importance to improve the security of exchanges through the Internet. However, as security flaws have been recently identified in the current standard in this domain, new ways to hash digital data must be investigated. In this document an original keyed hash function is evaluated. It is based on asynchronous iterations leading to functions that have been proven to be chaotic. It thus possesses various topological properties as uniformity and sensibility to its initial condition. These properties make our hash function satisfies established security requirements in this field. This claim is qualitatively proven and experimentally verified in this research work, among other things by realizing a large number of simulations.
△ Less
Submitted 21 August, 2016;
originally announced August 2016.
-
Neural Networks and Chaos: Construction, Evaluation of Chaotic Networks, and Prediction of Chaos with Multilayer Feedforward Networks
Authors:
Jacques M. Bahi,
Jean-François Couchot,
Christophe Guyeux,
Michel Salomon
Abstract:
Many research works deal with chaotic neural networks for various fields of application. Unfortunately, up to now these networks are usually claimed to be chaotic without any mathematical proof. The purpose of this paper is to establish, based on a rigorous theoretical framework, an equivalence between chaotic iterations according to Devaney and a particular class of neural networks. On the one ha…
▽ More
Many research works deal with chaotic neural networks for various fields of application. Unfortunately, up to now these networks are usually claimed to be chaotic without any mathematical proof. The purpose of this paper is to establish, based on a rigorous theoretical framework, an equivalence between chaotic iterations according to Devaney and a particular class of neural networks. On the one hand we show how to build such a network, on the other hand we provide a method to check if a neural network is a chaotic one. Finally, the ability of classical feedforward multilayer perceptrons to learn sets of data obtained from a dynamical system is regarded. Various Boolean functions are iterated on finite states. Iterations of some of them are proven to be chaotic as it is defined by Devaney. In that context, important differences occur in the training process, establishing with various neural networks that chaotic behaviors are far more difficult to learn.
△ Less
Submitted 21 August, 2016;
originally announced August 2016.
-
Steganalyzer performances in operational contexts
Authors:
Yousra A. Fadil,
Jean-François Couchot,
Raphaël Couturier,
Christophe Guyeux
Abstract:
Steganography and steganalysis are two important branches of the information hiding field of research. Steganography methods consist in hiding information in such a way that the secret message is undetectable for the uninitiated. Steganalyzis encompasses all the techniques that attempt to detect the presence of such hidden information. This latter is usually designed by making classifiers able to…
▽ More
Steganography and steganalysis are two important branches of the information hiding field of research. Steganography methods consist in hiding information in such a way that the secret message is undetectable for the uninitiated. Steganalyzis encompasses all the techniques that attempt to detect the presence of such hidden information. This latter is usually designed by making classifiers able to separate innocent images from steganographied ones according to their differences on well-selected features. We wonder, in this article whether it is possible to construct a kind of universal steganalyzer without any knowledge regarding the steganographier side. The effects on the classification score of a modification of either parameters or methods between the learning and testing stages are then evaluated, while the possibility to improve the separation score by merging many methods during learning stage is deeper investigated.
△ Less
Submitted 20 August, 2016;
originally announced August 2016.
-
Steganalysis via a Convolutional Neural Network using Large Convolution Filters for Embedding Process with Same Stego Key
Authors:
Jean-François Couchot,
Raphaël Couturier,
Christophe Guyeux,
Michel Salomon
Abstract:
For the past few years, in the race between image steganography and steganalysis, deep learning has emerged as a very promising alternative to steganalyzer approaches based on rich image models combined with ensemble classifiers. A key knowledge of image steganalyzer, which combines relevant image features and innovative classification procedures, can be deduced by a deep learning approach called…
▽ More
For the past few years, in the race between image steganography and steganalysis, deep learning has emerged as a very promising alternative to steganalyzer approaches based on rich image models combined with ensemble classifiers. A key knowledge of image steganalyzer, which combines relevant image features and innovative classification procedures, can be deduced by a deep learning approach called Convolutional Neural Networks (CNN). These kind of deep learning networks is so well-suited for classification tasks based on the detection of variations in 2D shapes that it is the state-of-the-art in many image recognition problems. In this article, we design a CNN-based steganalyzer for images obtained by applying steganography with a unique embedding key. This one is quite different from the previous study of {\em Qian et al.} and its successor, namely {\em Pibre et al.} The proposed architecture embeds less convolutions, with much larger filters in the final convolutional layer, and is more general: it is able to deal with larger images and lower payloads. For the "same embedding key" scenario, our proposal outperforms all other steganalyzers, in particular the existing CNN-based ones, and defeats many state-of-the-art image steganography schemes.
△ Less
Submitted 30 July, 2016; v1 submitted 25 May, 2016;
originally announced May 2016.
-
Improved Core Genes Prediction for Constructing well-supported Phylogenetic Trees in large sets of Plant Species
Authors:
Bassam AlKindy,
Huda Al-Nayyef,
Christophe Guyeux,
Jean-François Couchot,
Michel Salomon,
Jacques M. Bahi
Abstract:
The way to infer well-supported phylogenetic trees that precisely reflect the evolutionary process is a challenging task that completely depends on the way the related core genes have been found. In previous computational biology studies, many similarity based algorithms, mainly dependent on calculating sequence alignment matrices, have been proposed to find them. In these kinds of approaches, a s…
▽ More
The way to infer well-supported phylogenetic trees that precisely reflect the evolutionary process is a challenging task that completely depends on the way the related core genes have been found. In previous computational biology studies, many similarity based algorithms, mainly dependent on calculating sequence alignment matrices, have been proposed to find them. In these kinds of approaches, a significantly high similarity score between two coding sequences extracted from a given annotation tool means that one has the same genes. In a previous work article, we presented a quality test approach (QTA) that improves the core genes quality by combining two annotation tools (namely NCBI, a partially human-curated database, and DOGMA, an efficient annotation algorithm for chloroplasts). This method takes the advantages from both sequence similarity and gene features to guarantee that the core genome contains correct and well-clustered coding sequences (\emph{i.e.}, genes). We then show in this article how useful are such well-defined core genes for biomolecular phylogenetic reconstructions, by investigating various subsets of core genes at various family or genus levels, leading to subtrees with strong bootstraps that are finally merged in a well-supported supertree.
△ Less
Submitted 23 April, 2015;
originally announced April 2015.
-
Hybrid Genetic Algorithm and Lasso Test Approach for Inferring Well Supported Phylogenetic Trees based on Subsets of Chloroplastic Core Genes
Authors:
Bassam AlKindy,
Christophe Guyeux,
Jean-François Couchot,
Michel Salomon,
Christian Parisod,
Jacques M. Bahi
Abstract:
The amount of completely sequenced chloroplast genomes increases rapidly every day, leading to the possibility to build large scale phylogenetic trees of plant species. Considering a subset of close plant species defined according to their chloroplasts, the phylogenetic tree that can be inferred by their core genes is not necessarily well supported, due to the possible occurrence of "problematic"…
▽ More
The amount of completely sequenced chloroplast genomes increases rapidly every day, leading to the possibility to build large scale phylogenetic trees of plant species. Considering a subset of close plant species defined according to their chloroplasts, the phylogenetic tree that can be inferred by their core genes is not necessarily well supported, due to the possible occurrence of "problematic" genes (i.e., homoplasy, incomplete lineage sorting, horizontal gene transfers, etc.) which may blur phylogenetic signal. However, a trustworthy phylogenetic tree can still be obtained if the number of problematic genes is low, the problem being to determine the largest subset of core genes that produces the best supported tree. To discard problematic genes and due to the overwhelming number of possible combinations, we propose an hybrid approach that embeds both genetic algorithms and statistical tests. Given a set of organisms, the result is a pipeline of many stages for the production of well supported phylogenetic trees. The proposal has been applied to different cases of plant families, leading to encouraging results for these families.
△ Less
Submitted 20 April, 2015;
originally announced April 2015.
-
Gene Similarity-based Approaches for Determining Core-Genes of Chloroplasts
Authors:
Bassam AlKindy,
Christophe Guyeux,
Jean-François Couchot,
Michel Salomon,
Jacques M. Bahi
Abstract:
In computational biology and bioinformatics, the manner to understand evolution processes within various related organisms paid a lot of attention these last decades. However, accurate methodologies are still needed to discover genes content evolution. In a previous work, two novel approaches based on sequence similarities and genes features have been proposed. More precisely, we proposed to use g…
▽ More
In computational biology and bioinformatics, the manner to understand evolution processes within various related organisms paid a lot of attention these last decades. However, accurate methodologies are still needed to discover genes content evolution. In a previous work, two novel approaches based on sequence similarities and genes features have been proposed. More precisely, we proposed to use genes names, sequence similarities, or both, insured either from NCBI or from DOGMA annotation tools. Dogma has the advantage to be an up-to-date accurate automatic tool specifically designed for chloroplasts, whereas NCBI possesses high quality human curated genes (together with wrongly annotated ones). The key idea of the former proposal was to take the best from these two tools. However, the first proposal was limited by name variations and spelling errors on the NCBI side, leading to core trees of low quality. In this paper, these flaws are fixed by improving the comparison of NCBI and DOGMA results, and by relaxing constraints on gene names while adding a stage of post-validation on gene sequences. The two stages of similarity measures, on names and sequences, are thus proposed for sequence clustering. This improves results that can be obtained using either NCBI or DOGMA alone. Results obtained with this quality control test are further investigated and compared with previously released ones, on both computational and biological aspects, considering a set of 99 chloroplastic genomes.
△ Less
Submitted 17 December, 2014;
originally announced December 2014.
-
Finding the Core-Genes of Chloroplasts
Authors:
Bassam AlKindy,
Jean-François Couchot,
Christophe Guyeux,
Arnaud Mouly,
Michel Salomon,
Jacques M. Bahi
Abstract:
Due to the recent evolution of sequencing techniques, the number of available genomes is rising steadily, leading to the possibility to make large scale genomic comparison between sets of close species. An interesting question to answer is: what is the common functionality genes of a collection of species, or conversely, to determine what is specific to a given species when compared to other ones…
▽ More
Due to the recent evolution of sequencing techniques, the number of available genomes is rising steadily, leading to the possibility to make large scale genomic comparison between sets of close species. An interesting question to answer is: what is the common functionality genes of a collection of species, or conversely, to determine what is specific to a given species when compared to other ones belonging in the same genus, family, etc. Investigating such problem means to find both core and pan genomes of a collection of species, \textit{i.e.}, genes in common to all the species vs. the set of all genes in all species under consideration. However, obtaining trustworthy core and pan genomes is not an easy task, leading to a large amount of computation, and requiring a rigorous methodology. Surprisingly, as far as we know, this methodology in finding core and pan genomes has not really been deeply investigated. This research work tries to fill this gap by focusing only on chloroplastic genomes, whose reasonable sizes allow a deep study. To achieve this goal, a collection of 99 chloroplasts are considered in this article. Two methodologies have been investigated, respectively based on sequence similarities and genes names taken from annotation tools. The obtained results will finally be evaluated in terms of biological relevance.
△ Less
Submitted 22 September, 2014;
originally announced September 2014.
-
Application of Steganography for Anonymity through the Internet
Authors:
Jacques M. Bahi,
Jean-François Couchot,
Nicolas Friot,
Christophe Guyeux
Abstract:
In this paper, a novel steganographic scheme based on chaotic iterations is proposed. This research work takes place into the information hiding security framework. The applications for anonymity and privacy through the Internet are regarded too. To guarantee such an anonymity, it should be possible to set up a secret communication channel into a web page, being both secure and robust. To achieve…
▽ More
In this paper, a novel steganographic scheme based on chaotic iterations is proposed. This research work takes place into the information hiding security framework. The applications for anonymity and privacy through the Internet are regarded too. To guarantee such an anonymity, it should be possible to set up a secret communication channel into a web page, being both secure and robust. To achieve this goal, we propose an information hiding scheme being stego-secure, which is the highest level of security in a well defined and studied category of attacks called "watermark-only attack". This category of attacks is the best context to study steganography-based anonymity through the Internet. The steganalysis of our steganographic process is also studied in order to show it security in a real test framework.
△ Less
Submitted 23 February, 2012;
originally announced February 2012.
-
On the Link Between Strongly Connected Iteration Graphs and Chaotic Boolean Discrete-Time Dynamical Systems
Authors:
J. M. Bahi,
J. -F. Couchot,
C. Guyeux,
A. Richard
Abstract:
Chaotic functions are characterized by sensitivity to initial conditions, transitivity, and regularity. Providing new functions with such properties is a real challenge. This work shows that one can associate with any Boolean network a continuous function, whose discrete-time iterations are chaotic if and only if the iteration graph of the Boolean network is strongly connected. Then, sufficient co…
▽ More
Chaotic functions are characterized by sensitivity to initial conditions, transitivity, and regularity. Providing new functions with such properties is a real challenge. This work shows that one can associate with any Boolean network a continuous function, whose discrete-time iterations are chaotic if and only if the iteration graph of the Boolean network is strongly connected. Then, sufficient conditions for this strong connectivity are expressed on the interaction graph of this network, leading to a constructive method of chaotic function computation. The whole approach is evaluated in the chaos-based pseudo-random number generation context.
△ Less
Submitted 7 December, 2011;
originally announced December 2011.
-
Steganography: a Class of Algorithms having Secure Properties
Authors:
Jacques M. Bahi,
Jean-François Couchot,
Christophe Guyeux
Abstract:
Chaos-based approaches are frequently proposed in information hiding, but without obvious justification. Indeed, the reason why chaos is useful to tackle with discretion, robustness, or security, is rarely elucidated. This research work presents a new class of non-blind information hidingalgorithms based on some finite domains iterations that are Devaney's topologically chaotic. The approach is en…
▽ More
Chaos-based approaches are frequently proposed in information hiding, but without obvious justification. Indeed, the reason why chaos is useful to tackle with discretion, robustness, or security, is rarely elucidated. This research work presents a new class of non-blind information hidingalgorithms based on some finite domains iterations that are Devaney's topologically chaotic. The approach is entirely formalized and reasons to take place into the mathematical theory of chaos are explained. Finally, stego-security and chaos security are consequently proven for a large class of algorithms.
△ Less
Submitted 7 December, 2011;
originally announced December 2011.
-
Performance Analysis of a Keyed Hash Function based on Discrete and Chaotic Proven Iterations
Authors:
Jacques M. Bahi,
Jean-François Couchot,
Christophe Guyeux
Abstract:
Security of information transmitted through the Internet is an international concern. This security is guaranteed by tools like hash functions. However, as security flaws have been recently identified in the current standard in this domain, new ways to hash digital media must be investigated. In this document an original keyed hash function is evaluated. It is based on chaotic iterations and thus…
▽ More
Security of information transmitted through the Internet is an international concern. This security is guaranteed by tools like hash functions. However, as security flaws have been recently identified in the current standard in this domain, new ways to hash digital media must be investigated. In this document an original keyed hash function is evaluated. It is based on chaotic iterations and thus possesses various topological properties as uniform repartition and sensibility to its initial condition. These properties make our hash function satisfy the requirements in this field. This claim is verified qualitatively and experimentally in this research work, among other things by realizing simulations of diffusion and confusion.
△ Less
Submitted 6 December, 2011;
originally announced December 2011.
-
Steganography: a class of secure and robust algorithms
Authors:
Jacques M. Bahi,
Jean-François Couchot,
Christophe Guyeux
Abstract:
This research work presents a new class of non-blind information hiding algorithms that are stego-secure and robust. They are based on some finite domains iterations having the Devaney's topological chaos property. Thanks to a complete formalization of the approach we prove security against watermark-only attacks of a large class of steganographic algorithms. Finally a complete study of robustness…
▽ More
This research work presents a new class of non-blind information hiding algorithms that are stego-secure and robust. They are based on some finite domains iterations having the Devaney's topological chaos property. Thanks to a complete formalization of the approach we prove security against watermark-only attacks of a large class of steganographic algorithms. Finally a complete study of robustness is given in frequency DWT and DCT domains.
△ Less
Submitted 6 December, 2011;
originally announced December 2011.
-
Class of Trustworthy Pseudo-Random Number Generators
Authors:
Jacques M. Bahi,
Jean-François Couchot,
Christophe Guyeux,
Qianxue Wang
Abstract:
With the widespread use of communication technologies, cryptosystems are therefore critical to guarantee security over open networks as the Internet. Pseudo-random number generators (PRNGs) are fundamental in cryptosystems and information hiding schemes. One of the existing chaos-based PRNGs is using chaotic iterations schemes. In prior literature, the iterate function is just the vectorial boolea…
▽ More
With the widespread use of communication technologies, cryptosystems are therefore critical to guarantee security over open networks as the Internet. Pseudo-random number generators (PRNGs) are fundamental in cryptosystems and information hiding schemes. One of the existing chaos-based PRNGs is using chaotic iterations schemes. In prior literature, the iterate function is just the vectorial boolean negation. In this paper, we propose a method using Graph with strongly connected components as a selection criterion for chaotic iterate function. In order to face the challenge of using the proposed chaotic iterate functions in PRNG, these PRNGs are subjected to a statistical battery of tests, which is the well-known NIST in the area of cryptography.
△ Less
Submitted 5 December, 2011;
originally announced December 2011.
-
Graph Based Reduction of Program Verification Conditions
Authors:
Jean-François Couchot,
Alain Giorgetti,
Nicolas Stouls
Abstract:
Increasing the automaticity of proofs in deductive verification of C programs is a challenging task. When applied to industrial C programs known heuristics to generate simpler verification conditions are not efficient enough. This is mainly due to their size and a high number of irrelevant hypotheses. This work presents a strategy to reduce program verification conditions by selecting their rele…
▽ More
Increasing the automaticity of proofs in deductive verification of C programs is a challenging task. When applied to industrial C programs known heuristics to generate simpler verification conditions are not efficient enough. This is mainly due to their size and a high number of irrelevant hypotheses. This work presents a strategy to reduce program verification conditions by selecting their relevant hypotheses. The relevance of a hypothesis is determined by the combination of a syntactic analysis and two graph traversals. The first graph is labeled by constants and the second one by the predicates in the axioms. The approach is applied on a benchmark arising in industrial program verification.
△ Less
Submitted 8 July, 2009;
originally announced July 2009.