-
Self-supervised Deep Reconstruction of Mixed Strip-shredded Text Documents
Authors:
Thiago M. Paixão,
Rodrigo F. Berriel,
Maria C. S. Boeres,
Alessandro L. Koerich,
Claudine Badue,
Alberto F. de Souza,
Thiago Oliveira-Santos
Abstract:
The reconstruction of shredded documents consists of coherently arranging fragments of paper (shreds) to recover the original document(s). A great challenge in computational reconstruction is to properly evaluate the compatibility between the shreds. While traditional pixel-based approaches are not robust to real shredding, more sophisticated solutions significantly compromise time performance. The solution presented in this work extends our previous deep learning method for single-page reconstruction to a more realistic and complex scenario: the reconstruction of several mixed shredded documents at once. In our approach, the compatibility evaluation is modeled as a two-class (valid or invalid) pattern recognition problem. The model is trained in a self-supervised manner on samples extracted from simulated-shredded documents, which obviates manual annotation. Experimental results on three datasets -- including a new collection of 100 strip-shredded documents produced for this work -- show that the proposed method outperforms the competing ones in complex scenarios, achieving accuracy above 90%.
Submitted 1 July, 2020;
originally announced July 2020.
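The self-supervised setup described in the abstract above can be pictured with a small sketch: cutting an intact page into strips gives "valid" and "invalid" pair labels for free, because the true strip order is known. The function names, the equal-width cut, and the rule "valid only if the right strip immediately follows the left one" are assumptions made for illustration; the abstract does not specify how the authors extract their samples.

```python
def simulate_shredding(page, n_strips):
    """Cut a page (a list of pixel rows) into n_strips vertical strips.

    Assumption: equal-width strips; leftover columns are dropped.
    """
    width = len(page[0]) // n_strips
    return [[row[k * width:(k + 1) * width] for row in page]
            for k in range(n_strips)]


def labeled_pairs(strips):
    """Label every ordered pair (left, right) of strip indices.

    Label 1 (valid) if `right` is the true right neighbour of `left`,
    label 0 (invalid) otherwise. No manual annotation is needed because
    the original order is known from the simulated cut.
    """
    samples = []
    for i in range(len(strips)):
        for j in range(len(strips)):
            if i != j:
                samples.append(((i, j), 1 if j == i + 1 else 0))
    return samples


# Toy "page": 2 rows by 8 columns of integer pixel values.
page = [list(range(8)) for _ in range(2)]
strips = simulate_shredding(page, 4)
pairs = labeled_pairs(strips)
```

In this toy setting, four strips yield twelve ordered pairs, three of which are valid. A real pipeline would feed the pixel content around each candidate junction to a classifier rather than the bare indices used here.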
-
Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning
Authors:
Thiago M. Paixão,
Rodrigo F. Berriel,
Maria C. S. Boeres,
Alessandro L. Koerich,
Claudine Badue,
Alberto F. De Souza,
Thiago Oliveira-Santos
Abstract:
The reconstruction of shredded documents consists of arranging the pieces of paper (shreds) in order to reassemble the original appearance of such documents. This task is particularly relevant for supporting forensic investigation, as documents may contain criminal evidence. As an alternative to the laborious and time-consuming manual process, several researchers have been investigating ways to perform automatic digital reconstruction. A central problem in automatic reconstruction of shredded documents is the pairwise compatibility evaluation of the shreds, notably for binary text documents. In this context, deep learning has enabled great progress toward accurate reconstructions in the domain of mechanically shredded documents. A sensitive issue, however, is that current deep model solutions require an inference whenever a pair of shreds has to be evaluated. This work proposes a scalable deep learning approach for measuring pairwise compatibility in which the number of inferences scales linearly (rather than quadratically) with the number of shreds. Instead of predicting compatibility directly, deep models are leveraged to asymmetrically project the raw shred content onto a common metric space in which distance is proportional to the compatibility. Experimental results show that our method has accuracy comparable to the state of the art, with a speed-up of about 22 times for a test instance with 505 shreds (20 mixed shredded pages from different documents).
Submitted 28 April, 2020; v1 submitted 22 March, 2020;
originally announced March 2020.
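The linear-versus-quadratic argument in the abstract above can be illustrated with a minimal sketch. It assumes one model call per ordered pair of shreds in the direct approach, and one "left" plus one "right" projection per shred in the metric approach. `project_left` and `project_right` are hypothetical hand-written stand-ins for the two learned networks, and the toy shreds hold just two border values each; none of this comes from the paper itself.

```python
import math


def count_inferences(n_shreds):
    """Return (pairwise, embedding) model-call counts for n_shreds shreds."""
    pairwise = n_shreds * (n_shreds - 1)  # one call per ordered pair (i, j), i != j
    embedding = 2 * n_shreds              # one left and one right projection per shred
    return pairwise, embedding


def project_right(shred):
    # Hypothetical stand-in for a network embedding the right border.
    return [shred[-1], shred[-1] ** 2]


def project_left(shred):
    # Hypothetical stand-in for a second, different network embedding
    # the left border (hence "asymmetric").
    return [shred[0], shred[0] ** 2]


def distance_matrix(shreds):
    """Entry [i][j] is the distance between shred i's right embedding and
    shred j's left embedding; a smaller value means a better fit of j
    placed to the right of i. Each shred is projected only twice."""
    rights = [project_right(s) for s in shreds]
    lefts = [project_left(s) for s in shreds]
    n = len(shreds)
    return [[math.dist(rights[i], lefts[j]) if i != j else math.inf
             for j in range(n)]
            for i in range(n)]


# Toy shreds described only by (left border value, right border value).
shreds = [[0.1, 0.5], [0.5, 0.9], [0.9, 0.1]]
D = distance_matrix(shreds)
calls = count_inferences(505)
```

Under these counting assumptions, 505 shreds need 254,520 pairwise calls against 1,010 projections. Filling the distance matrix is still quadratic in the number of shreds, but it involves only cheap vector distances rather than model inferences.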
-
Comparing two deep learning sequence-based models for protein-protein interaction prediction
Authors:
Florian Richoux,
Charlène Servantie,
Cynthia Borès,
Stéphane Téletchéa
Abstract:
Biological data are extremely diverse and complex, but also quite sparse. Recent developments in deep learning methods offer new possibilities for the analysis of complex data. However, it is easy to obtain a deep learning model that seems to have good results but is in fact overfitting either the training data or the validation data. In particular, overfitting the validation data, called "information leak", is almost never addressed in papers proposing deep learning models to predict protein-protein interactions (PPI). In this work, we compare two carefully designed deep learning models and show pitfalls to avoid while predicting PPIs through machine learning methods. Our best model accurately predicts more than 78% of human PPI, under very strict conditions for both training and testing. The methodology we propose here allows us to have strong confidence in the ability of a model to scale up to larger datasets. This would allow sharper models when larger datasets become available, rather than current models prone to information leaks. Our solid methodological foundations should be applicable to more organisms and to whole-proteome network predictions.
Submitted 14 January, 2019;
originally announced January 2019.
-
Pattern formation in binary fluid mixtures induced by short-range competing interactions
Authors:
C. Bores,
E. Lomba,
A. Perera,
N. G. Almarza
Abstract:
Molecular dynamics simulations and integral equation calculations of a simple equimolar mixture of diatomic molecules and monomers interacting via attractive and repulsive short-range potentials show the existence of pattern formation (microheterogeneity), mostly due to depletion forces away from the demixing region. Effective site-site potentials extracted from the pair correlation functions using an inverse Monte Carlo approach and an integral equation inversion procedure exhibit the features characteristic of a short-range attractive and long-range repulsive potential. When charges are incorporated into the model, it becomes a coarse-grained representation of a room-temperature ionic liquid and, as expected, intermediate-range order becomes more pronounced and stable.
Submitted 29 June, 2015;
originally announced June 2015.
-
Demixing and confinement in slit pores
Authors:
N. G. Almarza,
C. Martín,
E. Lomba,
C. Bores
Abstract:
Using Monte Carlo simulation, we study the influence of geometric confinement on demixing for a series of symmetric non-additive hard-sphere mixtures confined in slit pores. We consider both a wide range of positive non-additivities and a series of pore widths, ranging from the pure two-dimensional limit to a large pore width where results are close to the bulk three-dimensional case. Critical parameters are extracted by means of finite-size analysis. We find that for this particular case, in which demixing is induced by volume effects, phase separation is in most cases somewhat impeded by spatial confinement. However, a non-monotonic dependence of the critical pressure and density on pore size is found for small non-additivities. In this latter case, it turns out that an otherwise stable bulk mixture can be forced to demix by simple geometric confinement when the pore width decreases to approximately one and a half molecular diameters.
Submitted 17 October, 2014;
originally announced October 2014.
-
Explicit spatial description of fluid inclusions in porous matrices in terms of an inhomogeneous integral equation
Authors:
Enrique Lomba,
Cecilia Bores,
Gerhard Kahl
Abstract:
We study the fluid inclusion of both Lennard-Jones particles and particles with competing interaction ranges --short range attractive and long range repulsive (SALR)-- in a disordered porous medium constructed as a controlled pore glass in two dimensions. With the aid of a full two-dimensional Ornstein-Zernike approach, complemented by a Replica Ornstein-Zernike integral equation, we explicitly obtain the spatial density distribution of the fluid adsorbed in the porous matrix and a good approximation for the average fluid-matrix correlations. The results illustrate the remarkable differences between the adsorbed Lennard-Jones (LJ) and SALR systems. In the latter instance, particles tend to aggregate in clusters which occupy pockets and bays in the porous structure, whereas the LJ fluid uniformly wets the porous walls. A comparison with Molecular Dynamics simulations shows that the two-dimensional Ornstein-Zernike approach with a Hypernetted Chain closure together with a sensible approximation for the fluid-fluid correlations can provide an accurate picture of the spatial distribution of adsorbed fluids for a given configuration of porous material.
Submitted 10 September, 2014;
originally announced September 2014.
-
Improving Memory Hierarchy Utilisation for Stencil Computations on Multicore Machines
Authors:
Alexandre Sena,
Aline Nascimento,
Cristina Boeres,
Vinod E. F. Rebello,
André Bulcão
Abstract:
Although modern supercomputers are composed of multicore machines, some scientists still execute legacy applications that were developed for monocore clusters, where the memory hierarchy is dedicated to a single core. The main objective of this paper is to propose and evaluate an algorithm that identifies an efficient blocksize to be applied to MPI stencil computations on multicore machines. Based on an extensive experimental analysis, this work shows the benefits of identifying blocksizes that divide data across the various cores and suggests a methodology that exploits the memory hierarchy available in modern machines.
Submitted 30 October, 2013;
originally announced October 2013.
-
Memory Aware Load Balance Strategy on a Parallel Branch-and-Bound Application
Authors:
Juliana M. N. Silva,
Cristina Boeres,
Lúcia M. A. Drummond,
Artur A. Pessoa
Abstract:
The latest trends in high-performance computing systems show an increasing demand for the efficient use of large-scale multicore systems, so that highly compute-intensive applications can be executed reasonably well. However, the exploitation of the degree of parallelism available at each multicore component can be limited by poor utilization of the available memory hierarchy. In fact, the multicore architecture introduces some distinct features that are already observed in shared memory and distributed environments. One example is that subsets of cores can share different subsets of memory. In order to achieve high performance, it is imperative that an application is carefully allocated to the available cores, based on a scheduling model that considers the main performance bottlenecks, such as memory contention. In this paper, the Multicore Cluster Model (MCM) is proposed, which captures the most relevant performance characteristics of multicore systems, such as the influence of memory hierarchy and contention. Better performance was achieved when a load balance strategy for a Branch-and-Bound application applied to the Partitioning Sets Problem was based on MCM, showing its efficiency and applicability to modern systems.
Submitted 22 February, 2013;
originally announced February 2013.