-
Sequence-aware multimodal page classification of Brazilian legal documents
Authors:
Pedro H. Luz de Araujo,
Ana Paula G. S. de Almeida,
Fabricio A. Braz,
Nilton C. da Silva,
Flavio de Barros Vidal,
Teofilo E. de Campos
Abstract:
The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours on the initial analysis and classification of those cases -- which takes effort away from later, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6,510 lawsuits (339,478 pages) with manual annotation assigning each page to one of six classes. Each lawsuit is an ordered sequence of pages, each stored both as an image and as the corresponding text extracted through optical character recognition. We first train two unimodal classifiers: a ResNet pre-trained on ImageNet is fine-tuned on the images, and a convolutional network with filters of multiple kernel sizes is trained from scratch on document texts. We use them as extractors of visual and textual features, which are then combined through our proposed Fusion Module. Our Fusion Module can handle missing textual or visual input by using learned embeddings for missing data. Moreover, we experiment with bi-directional Long Short-Term Memory (biLSTM) networks and linear-chain conditional random fields to model the sequential nature of the pages. The multimodal approaches outperform both textual and visual classifiers, especially when leveraging the sequential nature of the pages.
Submitted 15 July, 2022; v1 submitted 2 July, 2022;
originally announced July 2022.
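The abstract above says the Fusion Module copes with a missing modality by substituting a learned embedding. The abstract does not describe the module's internals, so the following is only a minimal sketch under stated assumptions: the features are fused by plain concatenation, and the function `fuse` and its `missing_visual` / `missing_textual` placeholder vectors are hypothetical names. In a real model the placeholders would be trainable parameters; here they are fixed lists.

```python
def fuse(visual, textual, missing_visual, missing_textual):
    """Concatenate the visual and textual features of one page.

    `visual` and `textual` are lists of floats, or None when that modality
    is unavailable. The `missing_*` vectors stand in for embeddings that a
    real model would learn during training (assumption, not the paper's code).
    """
    v = visual if visual is not None else missing_visual
    t = textual if textual is not None else missing_textual
    if len(v) != len(missing_visual) or len(t) != len(missing_textual):
        raise ValueError("feature size does not match its placeholder")
    # Concatenation is an assumed fusion operator for this sketch.
    return list(v) + list(t)


# Hypothetical page whose OCR text is missing: the text slot is filled
# with the placeholder embedding instead of being dropped.
fused = fuse([0.2, 0.7], None, missing_visual=[0.0, 0.0], missing_textual=[0.5])
print(fused)
```

Keeping the fused vector at a fixed length regardless of which inputs are present is what would let a downstream sequence model consume every page in the same way.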
-
Towards robustness under occlusion for face recognition
Authors:
Tomas M. Borges,
Teofilo E. de Campos,
Ricardo de Queiroz
Abstract:
In this paper, we evaluate the effects of occlusions on the performance of a face recognition pipeline that uses a ResNet backbone. The classifier was trained on a subset of the CelebA-HQ dataset containing 5,478 images from 307 classes, achieving a top-1 error rate of 17.91%. We designed 8 different occlusion masks, which were applied to the input images. This caused a significant drop in classifier performance: the error rate for each mask became at least twice as high as before. To increase robustness under occlusion, we followed two approaches. The first is image inpainting using the pre-trained pluralistic image completion network. The second is Cutmix, a regularization strategy that mixes training images and their labels using rectangular patches, making the classifier more robust against input corruptions. Both strategies proved effective, and interesting results were observed. In particular, the Cutmix approach makes the network more robust without requiring additional steps at application time, though its training time is considerably longer. Our datasets containing the different occlusion masks, as well as their inpainted counterparts, are made publicly available to promote research in the field.
Submitted 19 September, 2021;
originally announced September 2021.
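The Cutmix strategy mentioned in the abstract above (mixing training images and their labels using rectangular patches) can be illustrated with a small sketch. This is not the authors' training code: images are plain nested lists of grey values, labels are one-hot lists, and the function names `random_box` and `cutmix` are hypothetical. The label weight follows the usual Cutmix convention of using the fraction of the image area that is kept from the first image.

```python
import random


def random_box(height, width, lam, rng):
    """Sample a rectangle covering at most about (1 - lam) of the image."""
    cut_ratio = (1.0 - lam) ** 0.5
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.randrange(height), rng.randrange(width)
    # Clip the rectangle to the image borders.
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    return top, bottom, left, right


def cutmix(img_a, img_b, onehot_a, onehot_b, box):
    """Paste the `box` region of img_b into img_a and mix the labels."""
    top, bottom, left, right = box
    height, width = len(img_a), len(img_a[0])
    mixed = [row[:] for row in img_a]
    for y in range(top, bottom):
        mixed[y][left:right] = img_b[y][left:right]
    # Recompute the mixing weight from the area actually pasted,
    # because the box may have been clipped at the border.
    lam = 1.0 - ((bottom - top) * (right - left)) / (height * width)
    label = [lam * a + (1.0 - lam) * b for a, b in zip(onehot_a, onehot_b)]
    return mixed, label


rng = random.Random(0)
img_a = [[0] * 8 for _ in range(8)]
img_b = [[1] * 8 for _ in range(8)]
lam = rng.betavariate(1.0, 1.0)  # Beta(1, 1) is an assumed hyperparameter choice
mixed, label = cutmix(img_a, img_b, [1.0, 0.0], [0.0, 1.0],
                      random_box(8, 8, lam, rng))
print(label)
```

Because the mixing happens only while building training batches, nothing extra runs at inference time, which is consistent with the abstract's remark that Cutmix needs no additional steps at application time.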
-
Domain adaptation for person re-identification on new unlabeled data using AlignedReID++
Authors:
Tiago de C. G. Pereira,
Teofilo E. de Campos
Abstract:
In a world where big data reigns and there is plenty of hardware prepared to gather huge amounts of unstructured data, data acquisition is no longer a problem. Surveillance cameras are ubiquitous and capture huge numbers of people walking across different scenes. However, extracting value from this data is challenging, especially for tasks that involve human images, such as face recognition and person re-identification. Annotating this kind of data is a challenging and expensive task. In this work, we propose a domain adaptation workflow that allows CNNs trained in one domain to be applied to another domain without the need for new annotation of the target data. Our method uses AlignedReID++ as the baseline, trained using a triplet loss with batch hard. Domain adaptation is done using pseudo-labels generated by an unsupervised learning strategy. Our results show that domain adaptation techniques improve the performance of the CNN when applied in the target domain.
Submitted 29 June, 2021;
originally announced June 2021.
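The abstract above mentions training with a triplet loss with batch hard. As a rough illustration of that loss (not the AlignedReID++ implementation), the sketch below picks, for each anchor in a batch, its farthest same-identity sample and its nearest different-identity sample, and applies a hinge with a margin. The function names, the Euclidean distance, and the margin value are assumptions made for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Mean hinge loss over anchors, using the hardest positive/negative."""
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e)
               for j, (e, l) in enumerate(zip(embeddings, labels))
               if l == label and j != i]
        neg = [euclidean(anchor, e)
               for e, l in zip(embeddings, labels) if l != label]
        if not pos or not neg:
            continue  # this anchor cannot form a valid triplet in the batch
        # Hardest positive is the farthest; hardest negative is the nearest.
        losses.append(max(0.0, margin + max(pos) - min(neg)))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two identities, embeddings on a line (hypothetical values).
batch = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
print(batch_hard_triplet_loss(batch, [0, 0, 1, 1], margin=0.5))
```

In a domain adaptation setting such as the one described, the `labels` argument would come from pseudo-labels on the target data rather than from manual annotation.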
-
Learn by Guessing: Multi-Step Pseudo-Label Refinement for Person Re-Identification
Authors:
Tiago de C. G. Pereira,
Teofilo E. de Campos
Abstract:
Unsupervised Domain Adaptation (UDA) methods for person Re-Identification (Re-ID) rely on target domain samples to model the marginal distribution of the data. To deal with the lack of target domain labels, UDA methods leverage information from labeled source samples and unlabeled target samples. A promising approach relies on the use of unsupervised learning as part of the pipeline, such as clustering methods. The quality of the clusters clearly plays a major role in the performance of these methods, but this point has been overlooked. In this work, we propose a multi-step pseudo-label refinement method to select the best possible clusters and keep improving them so that these clusters become closer to the class divisions without knowledge of the class labels. Our refinement method includes a cluster selection strategy and a camera-based normalization method, which reduces the within-domain variations caused by the use of multiple cameras in person Re-ID. This allows our method to reach state-of-the-art UDA results on DukeMTMC-Market1501 (source-target). We surpass the state of the art for UDA Re-ID by 3.4% on the Market1501-DukeMTMC datasets, which is a more challenging adaptation setup because the target domain (DukeMTMC) has eight distinct cameras. Furthermore, the camera-based normalization method causes a significant reduction in the number of iterations required for training convergence.
Submitted 4 January, 2021;
originally announced January 2021.
-
EdgeNet: Semantic Scene Completion from a Single RGB-D Image
Authors:
Aloisio Dourado,
Teofilo Emidio de Campos,
Hansung Kim,
Adrian Hilton
Abstract:
Semantic scene completion is the task of predicting a complete 3D representation of volumetric occupancy with corresponding semantic labels for a scene from a single point of view. Previous works on Semantic Scene Completion from RGB-D data used either only depth or depth with colour by projecting the 2D image into the 3D volume resulting in a sparse data representation. In this work, we present a new strategy to encode colour information in 3D space using edge detection and flipped truncated signed distance. We also present EdgeNet, a new end-to-end neural network architecture capable of handling features generated from the fusion of depth and edge information. Experimental results show improvement of 6.9% over the state-of-the-art result on real data, for end-to-end approaches.
Submitted 6 September, 2020; v1 submitted 7 August, 2019;
originally announced August 2019.
-
Domain adaptation for holistic skin detection
Authors:
Aloisio Dourado,
Frederico Guth,
Teofilo Emidio de Campos,
Li Weigang
Abstract:
Human skin detection in images is a widely studied topic of Computer Vision for which it is commonly accepted that analysis of pixel color or local patches may suffice. This is because skin regions appear to be relatively uniform and many argue that there is only a small chromatic variation among different samples. However, we found that there are strong biases in the datasets commonly used to train or tune skin detection methods. Furthermore, the lack of contextual information may hinder the performance of local approaches. In this paper, we present a comprehensive evaluation of holistic and local Convolutional Neural Network (CNN) approaches on in-domain and cross-domain experiments and compare them with state-of-the-art pixel-based approaches. We also propose a combination of inductive transfer learning and unsupervised domain adaptation methods, which are evaluated on different domains under varying amounts of available labelled data. We show a clear superiority of CNNs over pixel-based approaches even without labelled training samples on the target domain. Furthermore, we provide experimental support for the counter-intuitive superiority of holistic over local approaches for human skin detection.
Submitted 28 March, 2020; v1 submitted 16 March, 2019;
originally announced March 2019.
-
Hand range of motion evaluation for Rheumatoid Arthritis patients
Authors:
Luciano Walenty Xavier Cejnog,
Roberto Marcondes Cesar Jr.,
Teofilo Emidio de Campos,
Valeria Meirelles Carril Elui
Abstract:
We introduce a framework for dynamic evaluation of finger movements: flexion, extension, abduction and adduction. This framework estimates angle measurements from joints computed by a hand pose estimation algorithm using a depth sensor (Realsense SR300). Given depth maps as input, our framework uses Pose-REN, a state-of-the-art hand pose estimation method that estimates 3D hand joint positions using a deep convolutional neural network. The pose estimation algorithm runs in real time, allowing users to visualise 3D skeleton tracking results while the depth images are acquired. Once 3D joint poses are obtained, our framework estimates a plane containing the wrist and MCP joints and measures flexion/extension and abduction/adduction angles by applying computational geometry operations with respect to this plane. We analysed flexion and abduction movement patterns using real data, extracting the movement trajectories. Our preliminary results show that this method allows automatic discrimination between the hands of patients with Rheumatoid Arthritis (RA) and those of healthy subjects. The angle between joints can be used as an indicator of current movement capabilities and function. Although the measurements can be noisy and less accurate than those obtained statically through goniometry, the acquisition is much easier, non-invasive and patient-friendly, which shows the potential of our approach. The system can be used with and without orthosis. Our framework allows the acquisition of measurements with minimal intervention and significantly reduces the evaluation time.
Submitted 16 March, 2019;
originally announced March 2019.
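The abstract above describes estimating a plane through the wrist and MCP joints and measuring angles with respect to it. The sketch below shows one way such a measurement could be computed, under simplifying assumptions that are not taken from the paper: the plane is defined by exactly three points (the wrist and two MCP joints) instead of being fitted to all of them, and flexion is taken as the signed angle between a finger bone vector and that plane. All function names and coordinates are hypothetical.

```python
import math


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def norm(u):
    return math.sqrt(dot(u, u))


def palm_normal(wrist, mcp_a, mcp_b):
    """Unit normal of the plane through the wrist and two MCP joints."""
    n = cross(sub(mcp_a, wrist), sub(mcp_b, wrist))
    length = norm(n)
    return [c / length for c in n]


def flexion_angle_deg(bone, normal):
    """Signed angle (degrees) between a bone vector and the palm plane."""
    s = dot(bone, normal) / norm(bone)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))


# Hypothetical joint positions, in arbitrary units.
wrist, mcp_index, mcp_little = [0, 0, 0], [1, 0, 0], [0, 1, 0]
normal = palm_normal(wrist, mcp_index, mcp_little)
bone = sub([2, 0, 1], mcp_index)  # proximal phalanx of the index finger
print(flexion_angle_deg(bone, normal))
```

An abduction/adduction angle could be obtained in a similar way by projecting the bone vector onto the plane and comparing it with a reference in-plane direction.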
-
Document classification using a Bi-LSTM to unclog Brazil's supreme court
Authors:
Fabricio Ataides Braz,
Nilton Correia da Silva,
Teofilo Emidio de Campos,
Felipe Borges S. Chaves,
Marcelo H. S. Ferreira,
Pedro Henrique Inazawa,
Victor H. D. Coelho,
Bernardo Pablo Sukiennik,
Ana Paula Goncalves Soares de Almeida,
Flavio Barros Vidal,
Davi Alves Bezerra,
Davi B. Gusmao,
Gabriel G. Ziegler,
Ricardo V. C. Fernandes,
Roberta Zumblick,
Fabiano Hartmann Peixoto
Abstract:
The Brazilian court system is currently the most clogged-up judiciary system in the world. Thousands of lawsuit cases reach the supreme court every day. These cases need to be analyzed in order to be associated with relevant tags and allocated to the right team. Most of the cases reach the court as raster-scanned documents with widely variable levels of quality. One of the first steps of the analysis is to classify these documents. In this paper, we present a Bidirectional Long Short-Term Memory network (Bi-LSTM) to classify these pieces of legal documents.
Submitted 27 November, 2018;
originally announced November 2018.
-
Semantic Scene Completion Combining Colour and Depth: preliminary experiments
Authors:
Andre Bernardes Soares Guedes,
Teofilo Emidio de Campos,
Adrian Hilton
Abstract:
Semantic scene completion is the task of producing a complete 3D voxel representation of volumetric occupancy with semantic labels for a scene from a single-view observation. We built upon the recent work of Song et al. (CVPR 2017), who proposed SSCnet, a method that performs scene completion and semantic labelling in a single end-to-end 3D convolutional network. SSCnet uses only depth maps as input, even though depth maps are usually obtained from devices that also capture colour information, such as RGBD sensors and stereo cameras. In this work, we investigate the potential of the RGB colour channels to improve SSCnet.
Submitted 13 February, 2018;
originally announced February 2018.
-
Assessment of algorithms for mitosis detection in breast cancer histopathology images
Authors:
Mitko Veta,
Paul J. van Diest,
Stefan M. Willems,
Haibo Wang,
Anant Madabhushi,
Angel Cruz-Roa,
Fabio Gonzalez,
Anders B. L. Larsen,
Jacob S. Vestergaard,
Anders B. Dahl,
Dan C. Cireşan,
Jürgen Schmidhuber,
Alessandro Giusti,
Luca M. Gambardella,
F. Boray Tek,
Thomas Walter,
Ching-Wei Wang,
Satoshi Kondo,
Bogdan J. Matuszewski,
Frederic Precioso,
Violet Snell,
Josef Kittler,
Teofilo E. de Campos,
Adnan M. Khan,
Nasir M. Rajpoot
, et al. (4 additional authors not shown)
Abstract:
The proliferative activity of breast tumors, which is routinely estimated by counting of mitotic figures in hematoxylin and eosin stained histology sections, is considered to be one of the most important prognostic markers. However, mitosis counting is laborious, subjective and may suffer from low inter-observer agreement. With the wider acceptance of whole slide images in pathology labs, automatic image analysis has been proposed as a potential solution for these issues. In this paper, the results from the Assessment of Mitosis Detection Algorithms 2013 (AMIDA13) challenge are described. The challenge was based on a data set consisting of 12 training and 11 testing subjects, with more than one thousand annotated mitotic figures by multiple observers. Short descriptions and results from the evaluation of eleven methods are presented. The top performing method has an error rate that is comparable to the inter-observer agreement among pathologists.
Submitted 21 November, 2014;
originally announced November 2014.