doi 10.1007/s10032-022-00406-7

Sequence-aware multimodal page classification of Brazilian legal documents

Authors: Pedro H. Luz de Araujo, Ana Paula G. S. de Almeida, Fabricio A. Braz, Nilton C. da Silva, Flavio de Barros Vidal, Teofilo E. de Campos

Abstract: The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours to execute the initial analysis and classification of those cases -- which takes effort away from posterior, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6,510 lawsuits (339,478 pages) with manual annotation assigning each page to one of six classes. Each lawsuit is an ordered sequence of pages, which are stored both as an image and as a corresponding text extracted through optical character recognition. We first train two unimodal classifiers: a ResNet pre-trained on ImageNet is fine-tuned on the images, and a convolutional network with filters of multiple kernel sizes is trained from scratch on document texts. We use them as extractors of visual and textual features, which are then combined through our proposed Fusion Module. Our Fusion Module can handle missing textual or visual input by using learned embeddings for missing data. Moreover, we experiment with bi-directional Long Short-Term Memory (biLSTM) networks and linear-chain conditional random fields to model the sequential nature of the pages. The multimodal approaches outperform both textual and visual classifiers, especially when leveraging the sequential nature of the pages.

Submitted 15 July, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

Comments: 11 pages, 6 figures. This preprint, which was originally written on 8 April 2021, has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this article is published in the International Journal on Document Analysis and Recognition, and is available online at https://doi.org/10.1007/s10032-022-00406-7 and https://rdcu.be/cRvvV

Journal ref: International Journal on Document Analysis and Recognition, 2022

arXiv:2109.13885

Turning old models fashion again: Recycling classical CNN networks using the Lattice Transformation

Authors: Ana Paula G. S. de Almeida, Flavio de Barros Vidal

Abstract: In the early 1990s, the first signs of life of the CNN era were given: LeCun et al. proposed a CNN model trained by the backpropagation algorithm to classify low-resolution images of handwritten digits. Undoubtedly, it was a breakthrough in the field of computer vision. But with the rise of other classification methods, it fell out of fashion. That was until 2012, when Krizhevsky et al. revived the interest in CNNs by exhibiting considerably higher image classification accuracy on the ImageNet challenge. Since then, the complexity of the architectures has been increasing exponentially and many structures are rapidly becoming obsolete. Using multistream networks as a base and the feature infusion precept, we explore the proposed LCNN cross-fusion strategy to use the backbones of former state-of-the-art networks on image classification in order to discover if the technique is able to put these designs back in the game. In this paper, we show that we can obtain an increase in accuracy of up to 63.21% on the NORB dataset when compared with the original structure. However, no technique is definitive. While our goal is to try to reuse previous state-of-the-art architectures with few modifications, we also expose the disadvantages of our explored strategy.

Submitted 28 September, 2021; originally announced September 2021.

Comments: 21 pages, 13 figures

MSC Class: 65D19; 68T07

ACM Class: I.2; I.4

arXiv:2008.00157

doi 10.1049/el.2019.2631

L-CNN: A Lattice cross-fusion strategy for multistream convolutional neural networks

Authors: Ana Paula G. S. de Almeida, Flavio de Barros Vidal

Abstract: This paper proposes a fusion strategy for multistream convolutional networks, the Lattice Cross Fusion. This approach crosses signals from convolution layers performing mathematical operation-based fusions right before pooling layers. Results on a purposely worsened CIFAR-10, a popular image classification data set, with a modified AlexNet-LCNN version show that this novel method outperforms the baseline single-stream network by 46%, with faster convergence, stability, and robustness.

Submitted 31 July, 2020; originally announced August 2020.

Comments: 5 pages, 3 figures

MSC Class: 68T07

ACM Class: I.2.10

Journal ref: Electronics Letters, vol. 55, no. 22, pp. 1180-1182, 2019

arXiv:1811.11569

Document classification using a Bi-LSTM to unclog Brazil's supreme court

Authors: Fabricio Ataides Braz, Nilton Correia da Silva, Teofilo Emidio de Campos, Felipe Borges S. Chaves, Marcelo H. S. Ferreira, Pedro Henrique Inazawa, Victor H. D. Coelho, Bernardo Pablo Sukiennik, Ana Paula Goncalves Soares de Almeida, Flavio Barros Vidal, Davi Alves Bezerra, Davi B. Gusmao, Gabriel G. Ziegler, Ricardo V. C. Fernandes, Roberta Zumblick, Fabiano Hartmann Peixoto

Abstract: The Brazilian court system is currently the most clogged up judiciary system in the world. Thousands of lawsuit cases reach the supreme court every day. These cases need to be analyzed in order to be associated with relevant tags and allocated to the right team. Most of the cases reach the court as raster scanned documents with widely variable levels of quality. One of the first steps for the analysis is to classify these documents. In this paper we present a Bidirectional Long Short-Term Memory network (Bi-LSTM) to classify these pieces of legal document.

Submitted 27 November, 2018; originally announced November 2018.

Comments: This work was presented at NIPS 2018 Workshop on Machine Learning for the Developing World (ML4D)

MSC Class: 68T50

ACM Class: I.2.7

Showing 1–4 of 4 results for author: de Almeida, A P G S