Search | arXiv e-print repository

doi 10.1007/978-3-031-49249-5_5

Leveraging Model Fusion for Improved License Plate Recognition

Authors: Rayson Laroca, Luiz A. Zanlorensi, Valter Estevam, Rodrigo Minetto, David Menotti

Abstract: License Plate Recognition (LPR) plays a critical role in various applications, such as toll collection, parking management, and traffic law enforcement. Although LPR has witnessed significant advancements through the development of deep learning, there has been a noticeable lack of studies exploring the potential improvements in results by fusing the outputs from multiple recognition models. This… ▽ More License Plate Recognition (LPR) plays a critical role in various applications, such as toll collection, parking management, and traffic law enforcement. Although LPR has witnessed significant advancements through the development of deep learning, there has been a noticeable lack of studies exploring the potential improvements in results by fusing the outputs from multiple recognition models. This research aims to fill this gap by investigating the combination of up to 12 different models using straightforward approaches, such as selecting the most confident prediction or employing majority vote-based strategies. Our experiments encompass a wide range of datasets, revealing substantial benefits of fusion approaches in both intra- and cross-dataset setups. Essentially, fusing multiple models reduces considerably the likelihood of obtaining subpar performance on a particular dataset/scenario. We also found that combining models based on their speed is an appealing approach. Specifically, for applications where the recognition task can tolerate some additional time, though not excessively, an effective strategy is to combine 4-6 models. These models may not be the most accurate individually, but their fusion strikes an optimal balance between speed and accuracy. △ Less

Submitted 5 December, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: Accepted for presentation at the Iberoamerican Congress on Pattern Recognition (CIARP) 2023

arXiv:2305.17313 [pdf, other]

doi 10.1016/j.cag.2023.05.005

Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers

Authors: Valfride Nascimento, Rayson Laroca, Jorge de A. Lambert, William Robson Schwartz, David Menotti

Abstract: Recent years have seen significant developments in the field of License Plate Recognition (LPR) through the integration of deep learning techniques and the increasing availability of training data. Nevertheless, reconstructing license plates (LPs) from low-resolution (LR) surveillance footage remains challenging. To address this issue, we introduce a Single-Image Super-Resolution (SISR) approach t… ▽ More Recent years have seen significant developments in the field of License Plate Recognition (LPR) through the integration of deep learning techniques and the increasing availability of training data. Nevertheless, reconstructing license plates (LPs) from low-resolution (LR) surveillance footage remains challenging. To address this issue, we introduce a Single-Image Super-Resolution (SISR) approach that integrates attention and transformer modules to enhance the detection of structural and textural features in LR images. Our approach incorporates sub-pixel convolution layers (also known as PixelShuffle) and a loss function that uses an Optical Character Recognition (OCR) model for feature extraction. We trained the proposed architecture on synthetic images created by applying heavy Gaussian noise to high-resolution LP images from two public datasets, followed by bicubic downsampling. As a result, the generated images have a Structural Similarity Index Measure (SSIM) of less than 0.10. Our results show that our approach for reconstructing these low-resolution synthesized images outperforms existing ones in both quantitative and qualitative measures. Our code is publicly available at https://github.com/valfride/lpr-rsr-ext/ △ Less

Submitted 26 May, 2023; originally announced May 2023.

Journal ref: Computers & Graphics, vol. 113, pp. 69-76, 2023

arXiv:2304.04653 [pdf, other]

doi 10.1109/IJCNN54540.2023.10191584

Do We Train on Test Data? The Impact of Near-Duplicates on License Plate Recognition

Authors: Rayson Laroca, Valter Estevam, Alceu S. Britto Jr., Rodrigo Minetto, David Menotti

Abstract: This work draws attention to the large fraction of near-duplicates in the training and test sets of datasets widely adopted in License Plate Recognition (LPR) research. These duplicates refer to images that, although different, show the same license plate. Our experiments, conducted on the two most popular datasets in the field, show a substantial decrease in recognition rate when six well-known m… ▽ More This work draws attention to the large fraction of near-duplicates in the training and test sets of datasets widely adopted in License Plate Recognition (LPR) research. These duplicates refer to images that, although different, show the same license plate. Our experiments, conducted on the two most popular datasets in the field, show a substantial decrease in recognition rate when six well-known models are trained and tested under fair splits, that is, in the absence of duplicates in the training and test sets. Moreover, in one of the datasets, the ranking of models changed considerably when they were trained and tested under duplicate-free splits. These findings suggest that such duplicates have significantly biased the evaluation and development of deep learning-based models for LPR. The list of near-duplicates we have found and proposals for fair splits are publicly available for further research at https://raysonlaroca.github.io/supp/lpr-train-on-test/ △ Less

Submitted 4 August, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2023

arXiv:2303.05912 [pdf, other]

doi 10.1080/21681163.2023.2183807

DACov: A Deeper Analysis of Data Augmentation on the Computed Tomography Segmentation Problem

Authors: Bruno A. Krinski, Daniel V. Ruiz, Rayson Laroca, Eduardo Todt

Abstract: Due to the COVID-19 global pandemic, computer-assisted diagnoses of medical images have gained much attention, and robust methods of semantic segmentation of Computed Tomography (CT) images have become highly desirable. In this work, we present a deeper analysis of how data augmentation techniques improve segmentation performance on this problem. We evaluate 20 traditional augmentation techniques… ▽ More Due to the COVID-19 global pandemic, computer-assisted diagnoses of medical images have gained much attention, and robust methods of semantic segmentation of Computed Tomography (CT) images have become highly desirable. In this work, we present a deeper analysis of how data augmentation techniques improve segmentation performance on this problem. We evaluate 20 traditional augmentation techniques on five public datasets. Six different probabilities of applying each augmentation technique on an image were evaluated. We also assess a different training methodology where the training subsets are combined into a single larger set. All networks were evaluated through a 5-fold cross-validation strategy, resulting in over 4,600 experiments. We also propose a novel data augmentation technique based on Generative Adversarial Networks (GANs) to create new healthy and unhealthy lung CT images, evaluating four variations of our approach with the same six probabilities of the traditional methods. Our findings show that GAN-based techniques and spatial-level transformations are the most promising for improving the learning of deep models on this problem, with the StarGANv2 + F with a probability of 0.3 achieving the highest F-score value on the Ricord1a dataset in the unified training strategy. Our code is publicly available at https://github.com/VRI-UFPR/DACov2022 △ Less

Submitted 10 March, 2023; originally announced March 2023.

Journal ref: Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 2023

arXiv:2210.16836 [pdf, other]

doi 10.1109/SIBGRAPI55357.2022.9991753

Combining Attention Module and Pixel Shuffle for License Plate Super-Resolution

Authors: Valfride Nascimento, Rayson Laroca, Jorge de A. Lambert, William Robson Schwartz, David Menotti

Abstract: The License Plate Recognition (LPR) field has made impressive advances in the last decade due to novel deep learning approaches combined with the increased availability of training data. However, it still has some open issues, especially when the data come from low-resolution (LR) and low-quality images/videos, as in surveillance systems. This work focuses on license plate (LP) reconstruction in L… ▽ More The License Plate Recognition (LPR) field has made impressive advances in the last decade due to novel deep learning approaches combined with the increased availability of training data. However, it still has some open issues, especially when the data come from low-resolution (LR) and low-quality images/videos, as in surveillance systems. This work focuses on license plate (LP) reconstruction in LR and low-quality images. We present a Single-Image Super-Resolution (SISR) approach that extends the attention/transformer module concept by exploiting the capabilities of PixelShuffle layers and that has an improved loss function based on LPR predictions. For training the proposed architecture, we use synthetic images generated by applying heavy Gaussian noise in terms of Structural Similarity Index Measure (SSIM) to the original high-resolution (HR) images. In our experiments, the proposed method outperformed the baselines both quantitatively and qualitatively. The datasets we created for this work are publicly available to the research community at https://github.com/valfride/lpr-rsr/ △ Less

Submitted 30 October, 2022; originally announced October 2022.

Comments: Accepted for presentation at the Conference on Graphics, Patterns and Images (SIBGRAPI) 2022

arXiv:2209.12064 [pdf, other]

doi 10.1109/SIBGRAPI55357.2022.9991799

Face Super-Resolution Using Stochastic Differential Equations

Authors: Marcelo dos Santos, Rayson Laroca, Rafael O. Ribeiro, João Neves, Hugo Proença, David Menotti

Abstract: Diffusion models have proven effective for various applications such as images, audio and graph generation. Other important applications are image super-resolution and the solution of inverse problems. More recently, some works have used stochastic differential equations (SDEs) to generalize diffusion models to continuous time. In this work, we introduce SDEs to generate super-resolution face imag… ▽ More Diffusion models have proven effective for various applications such as images, audio and graph generation. Other important applications are image super-resolution and the solution of inverse problems. More recently, some works have used stochastic differential equations (SDEs) to generalize diffusion models to continuous time. In this work, we introduce SDEs to generate super-resolution face images. To the best of our knowledge, this is the first time SDEs have been used for such an application. The proposed method provides an improved peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and consistency than the existing super-resolution methods based on diffusion models. In particular, we also assess the potential application of this method for the face recognition task. A generic facial feature extractor is used to compare the super-resolution images with the ground truth and superior results were obtained compared with other methods. Our code is publicly available at https://github.com/marcelowds/sr-sde △ Less

Submitted 24 September, 2022; originally announced September 2022.

Comments: Accepted for presentation at the Conference on Graphics, Patterns and Images (SIBGRAPI) 2022

arXiv:2209.12061 [pdf, other]

doi 10.1109/LSP.2022.3200605

Global Semantic Descriptors for Zero-Shot Action Recognition

Authors: Valter Estevam, Rayson Laroca, Helio Pedrini, David Menotti

Abstract: The success of Zero-shot Action Recognition (ZSAR) methods is intrinsically related to the nature of semantic side information used to transfer knowledge, although this aspect has not been primarily investigated in the literature. This work introduces a new ZSAR method based on the relationships of actions-objects and actions-descriptive sentences. We demonstrate that representing all object class… ▽ More The success of Zero-shot Action Recognition (ZSAR) methods is intrinsically related to the nature of semantic side information used to transfer knowledge, although this aspect has not been primarily investigated in the literature. This work introduces a new ZSAR method based on the relationships of actions-objects and actions-descriptive sentences. We demonstrate that representing all object classes using descriptive sentences generates an accurate object-action affinity estimation when a paraphrase estimation method is used as an embedder. We also show how to estimate probabilities over the set of action classes based only on a set of sentences without hard human labeling. In our method, the probabilities from these two global classifiers (i.e., which use features computed over the entire video) are combined, producing an efficient transfer knowledge model for action classification. Our results are state-of-the-art in the Kinetics-400 dataset and are competitive on UCF-101 under the ZSAR evaluation. Our code is available at https://github.com/valterlej/objsentzsar △ Less

Submitted 24 September, 2022; originally announced September 2022.

Journal ref: IEEE Signal Processing Letters, vol. 29, pp. 1843-1847, 2022

arXiv:2208.10657 [pdf, other]

doi 10.1109/SIBGRAPI55357.2022.9991768

A First Look at Dataset Bias in License Plate Recognition

Authors: Rayson Laroca, Marcelo Santos, Valter Estevam, Eduardo Luz, David Menotti

Abstract: Public datasets have played a key role in advancing the state of the art in License Plate Recognition (LPR). Although dataset bias has been recognized as a severe problem in the computer vision community, it has been largely overlooked in the LPR literature. LPR models are usually trained and evaluated separately on each dataset. In this scenario, they have often proven robust in the dataset they… ▽ More Public datasets have played a key role in advancing the state of the art in License Plate Recognition (LPR). Although dataset bias has been recognized as a severe problem in the computer vision community, it has been largely overlooked in the LPR literature. LPR models are usually trained and evaluated separately on each dataset. In this scenario, they have often proven robust in the dataset they were trained in but showed limited performance in unseen ones. Therefore, this work investigates the dataset bias problem in the LPR context. We performed experiments on eight datasets, four collected in Brazil and four in mainland China, and observed that each dataset has a unique, identifiable "signature" since a lightweight classification model predicts the source dataset of a license plate (LP) image with more than 95% accuracy. In our discussion, we draw attention to the fact that most LPR models are probably exploiting such signatures to improve the results achieved in each dataset at the cost of losing generalization capability. These results emphasize the importance of evaluating LPR models in cross-dataset setups, as they provide a better indication of generalization (hence real-world performance) than within-dataset ones. △ Less

Submitted 30 December, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: Accepted for presentation at the Conference on Graphics, Patterns and Images (SIBGRAPI) 2022

arXiv:2201.02850 [pdf, other]

doi 10.1016/j.measurement.2022.112025

Image-based Automatic Dial Meter Reading in Unconstrained Scenarios

Authors: Gabriel Salomon, Rayson Laroca, David Menotti

Abstract: The replacement of analog meters with smart meters is costly, laborious, and far from complete in develo** countries. The Energy Company of Parana (Copel) (Brazil) performs more than 4 million meter readings (almost entirely of non-smart devices) per month, and we estimate that 850 thousand of them are from dial meters. Therefore, an image-based automatic reading system can reduce human errors,… ▽ More The replacement of analog meters with smart meters is costly, laborious, and far from complete in develo** countries. The Energy Company of Parana (Copel) (Brazil) performs more than 4 million meter readings (almost entirely of non-smart devices) per month, and we estimate that 850 thousand of them are from dial meters. Therefore, an image-based automatic reading system can reduce human errors, create a proof of reading, and enable the customers to perform the reading themselves through a mobile application. We propose novel approaches for Automatic Dial Meter Reading (ADMR) and introduce a new dataset for ADMR in unconstrained scenarios, called UFPR-ADMR-v2. Our best-performing method combines YOLOv4 with a novel regression approach (AngReg), and explores several postprocessing techniques. Compared to previous works, it decreased the Mean Absolute Error (MAE) from 1,343 to 129 and achieved a meter recognition rate (MRR) of 98.90% -- with an error tolerance of 1 Kilowatt-hour (kWh). △ Less

Submitted 23 October, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

Journal ref: Measurement, vol. 204, p. 112025, 2022

arXiv:2201.00267 [pdf, other]

doi 10.5220/0010846800003124

On the Cross-dataset Generalization in License Plate Recognition

Authors: Rayson Laroca, Everton V. Cardoso, Diego R. Lucio, Valter Estevam, David Menotti

Abstract: Automatic License Plate Recognition (ALPR) systems have shown remarkable performance on license plates (LPs) from multiple regions due to advances in deep learning and the increasing availability of datasets. The evaluation of deep ALPR systems is usually done within each dataset; therefore, it is questionable if such results are a reliable indicator of generalization ability. In this paper, we pr… ▽ More Automatic License Plate Recognition (ALPR) systems have shown remarkable performance on license plates (LPs) from multiple regions due to advances in deep learning and the increasing availability of datasets. The evaluation of deep ALPR systems is usually done within each dataset; therefore, it is questionable if such results are a reliable indicator of generalization ability. In this paper, we propose a traditional-split versus leave-one-dataset-out experimental setup to empirically assess the cross-dataset generalization of 12 Optical Character Recognition (OCR) models applied to LP recognition on nine publicly available datasets with a great variety in several aspects (e.g., acquisition settings, image resolution, and LP layouts). We also introduce a public dataset for end-to-end ALPR that is the first to contain images of vehicles with Mercosur LPs and the one with the highest number of motorcycle images. The experimental results shed light on the limitations of the traditional-split protocol for evaluating approaches in the ALPR context, as there are significant drops in performance for most datasets when training and testing the models in a leave-one-dataset-out fashion. △ Less

Submitted 21 December, 2022; v1 submitted 1 January, 2022; originally announced January 2022.

Comments: Accepted for presentation at the International Conference on Computer Vision Theory and Applications (VISAPP) 2022

arXiv:2112.09976 [pdf, other]

doi 10.1007/s11042-023-16566-5

Tell me what you see: A zero-shot action recognition method based on natural language descriptions

Authors: Valter Estevam, Rayson Laroca, David Menotti, Helio Pedrini

Abstract: This paper presents a novel approach to Zero-Shot Action Recognition. Recent works have explored the detection and classification of objects to obtain semantic information from videos with remarkable performance. Inspired by them, we propose using video captioning methods to extract semantic information about objects, scenes, humans, and their relationships. To the best of our knowledge, this is t… ▽ More This paper presents a novel approach to Zero-Shot Action Recognition. Recent works have explored the detection and classification of objects to obtain semantic information from videos with remarkable performance. Inspired by them, we propose using video captioning methods to extract semantic information about objects, scenes, humans, and their relationships. To the best of our knowledge, this is the first work to represent both videos and labels with descriptive sentences. More specifically, we represent videos using sentences generated via video captioning methods and classes using sentences extracted from documents acquired through search engines on the Internet. Using these representations, we build a shared semantic space employing BERT-based embedders pre-trained in the paraphrasing task on multiple text datasets. The projection of both visual and semantic information onto this space is straightforward, as they are sentences, enabling classification using the nearest neighbor rule. We demonstrate that representing videos and labels with sentences alleviates the domain adaptation problem. Additionally, we show that word vectors are unsuitable for building the semantic embedding space of our descriptions. Our method outperforms the state-of-the-art performance on the UCF101 dataset by 3.3 p.p. in accuracy under the TruZe protocol and achieves competitive results on both the UCF101 and HMDB51 datasets under the conventional protocol (0/50\% - training/testing split). Our code is available at https://github.com/valterlej/zsarcap. △ Less

Submitted 11 September, 2023; v1 submitted 18 December, 2021; originally announced December 2021.

Comments: Published at Multimedia Tools and Applications

arXiv:2112.08455 [pdf, other]

Dense Video Captioning Using Unsupervised Semantic Information

Authors: Valter Estevam, Rayson Laroca, Helio Pedrini, David Menotti

Abstract: We introduce a method to learn unsupervised semantic visual information based on the premise that complex events (e.g., minutes) can be decomposed into simpler events (e.g., a few seconds), and that these simple events are shared across several complex events. We split a long video into short frame sequences to extract their latent representation with three-dimensional convolutional neural network… ▽ More We introduce a method to learn unsupervised semantic visual information based on the premise that complex events (e.g., minutes) can be decomposed into simpler events (e.g., a few seconds), and that these simple events are shared across several complex events. We split a long video into short frame sequences to extract their latent representation with three-dimensional convolutional neural networks. A clustering method is used to group representations producing a visual codebook (i.e., a long video is represented by a sequence of integers given by the cluster labels). A dense representation is learned by encoding the co-occurrence probability matrix for the codebook entries. We demonstrate how this representation can leverage the performance of the dense video captioning task in a scenario with only visual features. As a result of this approach, we are able to replace the audio signal in the Bi-Modal Transformer (BMT) method and produce temporal proposals with comparable performance. Furthermore, we concatenate the visual signal with our descriptor in a vanilla transformer method to achieve state-of-the-art performance in captioning compared to the methods that explore only visual features, as well as a competitive performance with multi-modal methods. Our code is available at https://github.com/valterlej/dvcusi. △ Less

Submitted 15 December, 2021; originally announced December 2021.

arXiv:2011.12427 [pdf, other]

doi 10.1038/s41598-022-22811-y

A New Periocular Dataset Collected by Mobile Devices in Unconstrained Scenarios

Authors: Luiz A. Zanlorensi, Rayson Laroca, Diego R. Lucio, Lucas R. Santos, Alceu S. Britto Jr., David Menotti

Abstract: Recently, ocular biometrics in unconstrained environments using images obtained at visible wavelength have gained the researchers' attention, especially with images captured by mobile devices. Periocular recognition has been demonstrated to be an alternative when the iris trait is not available due to occlusions or low image resolution. However, the periocular trait does not have the high uniquene… ▽ More Recently, ocular biometrics in unconstrained environments using images obtained at visible wavelength have gained the researchers' attention, especially with images captured by mobile devices. Periocular recognition has been demonstrated to be an alternative when the iris trait is not available due to occlusions or low image resolution. However, the periocular trait does not have the high uniqueness presented in the iris trait. Thus, the use of datasets containing many subjects is essential to assess biometric systems' capacity to extract discriminating information from the periocular region. Also, to address the within-class variability caused by lighting and attributes in the periocular region, it is of paramount importance to use datasets with images of the same subject captured in distinct sessions. As the datasets available in the literature do not present all these factors, in this work, we present a new periocular dataset containing samples from 1,122 subjects, acquired in 3 sessions by 196 different mobile devices. The images were captured under unconstrained environments with just a single instruction to the participants: to place their eyes on a region of interest. We also performed an extensive benchmark with several Convolutional Neural Network (CNN) architectures and models that have been employed in state-of-the-art approaches based on Multi-class Classification, Multitask Learning, Pairwise Filters Network, and Siamese Network. The results achieved in the closed- and open-world protocol, considering the identification and verification tasks, show that this area still needs research and development. △ Less

Submitted 14 November, 2022; v1 submitted 24 November, 2020; originally announced November 2020.

Journal ref: Scientific Reports, vol. 12, p. 17989, 2022

arXiv:2010.16307 [pdf, other]

Automatic Counting and Identification of Train Wagons Based on Computer Vision and Deep Learning

Authors: Rayson Laroca, Alessander Cidral Boslooper, David Menotti

Abstract: In this work, we present a robust and efficient solution for counting and identifying train wagons using computer vision and deep learning. The proposed solution is cost-effective and can easily replace solutions based on radiofrequency identification (RFID), which are known to have high installation and maintenance costs. According to our experiments, our two-stage methodology achieves impressive… ▽ More In this work, we present a robust and efficient solution for counting and identifying train wagons using computer vision and deep learning. The proposed solution is cost-effective and can easily replace solutions based on radiofrequency identification (RFID), which are known to have high installation and maintenance costs. According to our experiments, our two-stage methodology achieves impressive results on real-world scenarios, i.e., 100% accuracy in the counting stage and 99.7% recognition rate in the identification one. Moreover, the system is able to automatically reject some of the train wagons successfully counted, as they have damaged identification codes. The results achieved were surprising considering that the proposed system requires low processing power (i.e., it can run in low-end setups) and that we used a relatively small number of images to train our Convolutional Neural Network (CNN) for character recognition. The proposed method is registered, under number BR512020000808-9, with the National Institute of Industrial Property (Brazil). △ Less

Submitted 30 October, 2020; originally announced October 2020.

Comments: An article about the proposed system has been published in the October 2020 issue of Railway Gazette International, the leading business journal for the worldwide rail industry

arXiv:2009.10181 [pdf, other]

doi 10.1109/ACCESS.2021.3077415

Towards Image-based Automatic Meter Reading in Unconstrained Scenarios: A Robust and Efficient Approach

Authors: Rayson Laroca, Alessandra B. Araujo, Luiz A. Zanlorensi, Eduardo C. de Almeida, David Menotti

Abstract: Existing approaches for image-based Automatic Meter Reading (AMR) have been evaluated on images captured in well-controlled scenarios. However, real-world meter reading presents unconstrained scenarios that are way more challenging due to dirt, various lighting conditions, scale variations, in-plane and out-of-plane rotations, among other factors. In this work, we present an end-to-end approach fo… ▽ More Existing approaches for image-based Automatic Meter Reading (AMR) have been evaluated on images captured in well-controlled scenarios. However, real-world meter reading presents unconstrained scenarios that are way more challenging due to dirt, various lighting conditions, scale variations, in-plane and out-of-plane rotations, among other factors. In this work, we present an end-to-end approach for AMR focusing on unconstrained scenarios. Our main contribution is the insertion of a new stage in the AMR pipeline, called corner detection and counter classification, which enables the counter region to be rectified -- as well as the rejection of illegible/faulty meters -- prior to the recognition stage. We also introduce a publicly available dataset, called Copel-AMR, that contains 12,500 meter images acquired in the field by the service company's employees themselves, including 2,500 images of faulty meters or cases where the reading is illegible due to occlusions. Experimental evaluation demonstrates that the proposed system, which has three networks operating in a cascaded mode, outperforms all baselines in terms of recognition rate while still being quite efficient. Moreover, as very few reading errors are tolerated in real-world applications, we show that our AMR system achieves impressive recognition rates (i.e., > 99%) when rejecting readings made with lower confidence values. △ Less

Submitted 12 May, 2021; v1 submitted 21 September, 2020; originally announced September 2020.

Journal ref: IEEE Access, vol. 9, pp. 67569-67584, 2021

arXiv:2005.03106 [pdf, other]

doi 10.1109/IJCNN48605.2020.9207318

Deep Learning for Image-based Automatic Dial Meter Reading: Dataset and Baselines

Authors: Gabriel Salomon, Rayson Laroca, David Menotti

Abstract: Smart meters enable remote and automatic electricity, water and gas consumption reading and are being widely deployed in developed countries. Nonetheless, there is still a huge number of non-smart meters in operation. Image-based Automatic Meter Reading (AMR) focuses on dealing with this type of meter readings. We estimate that the Energy Company of Paraná (Copel), in Brazil, performs more than 85… ▽ More Smart meters enable remote and automatic electricity, water and gas consumption reading and are being widely deployed in developed countries. Nonetheless, there is still a huge number of non-smart meters in operation. Image-based Automatic Meter Reading (AMR) focuses on dealing with this type of meter readings. We estimate that the Energy Company of Paraná (Copel), in Brazil, performs more than 850,000 readings of dial meters per month. Those meters are the focus of this work. Our main contributions are: (i) a public real-world dial meter dataset (shared upon request) called UFPR-ADMR; (ii) a deep learning-based recognition baseline on the proposed dataset; and (iii) a detailed error analysis of the main issues present in AMR for dial meters. To the best of our knowledge, this is the first work to introduce deep learning approaches to multi-dial meter reading, and perform experiments on unconstrained images. We achieved a 100.0% F1-score on the dial detection stage with both Faster R-CNN and YOLO, while the recognition rates reached 93.6% for dials and 75.25% for meters using Faster R-CNN (ResNext-101). △ Less

Submitted 8 May, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: Accepted for presentation at the 2020 International Joint Conference on Neural Networks (IJCNN)

arXiv:1911.09646 [pdf, other]

doi 10.1007/s10462-021-10028-w

Ocular Recognition Databases and Competitions: A Survey

Authors: Luiz A. Zanlorensi, Rayson Laroca, Eduardo Luz, Alceu S. Britto Jr., Luiz S. Oliveira, David Menotti

Abstract: The use of the iris and periocular region as biometric traits has been extensively investigated, mainly due to the singularity of the iris features and the use of the periocular region when the image resolution is not sufficient to extract iris information. In addition to providing information about an individual's identity, features extracted from these traits can also be explored to obtain other… ▽ More The use of the iris and periocular region as biometric traits has been extensively investigated, mainly due to the singularity of the iris features and the use of the periocular region when the image resolution is not sufficient to extract iris information. In addition to providing information about an individual's identity, features extracted from these traits can also be explored to obtain other information such as the individual's gender, the influence of drug use, the use of contact lenses, spoofing, among others. This work presents a survey of the databases created for ocular recognition, detailing their protocols and how their images were acquired. We also describe and discuss the most popular ocular recognition competitions (contests), highlighting the submitted algorithms that achieved the best results using only iris trait and also fusing iris and periocular region information. Finally, we describe some relevant works applying deep learning techniques to ocular recognition and point out new challenges and future directions. Considering that there are a large number of ocular databases, and each one is usually designed for a specific problem, we believe this survey can provide a broad overview of the challenges in ocular biometrics. △ Less

Submitted 4 February, 2022; v1 submitted 21 November, 2019; originally announced November 2019.

Comments: Artificial Intelligence Review, vol. 55, pp. 129-180, 2022

arXiv:1911.05541 [pdf, other]

doi 10.1109/ACCESS.2021.3097964

Vehicle-Rear: A New Dataset to Explore Feature Fusion for Vehicle Identification Using Convolutional Neural Networks

Authors: Icaro O. de Oliveira, Rayson Laroca, David Menotti, Keiko V. O. Fonseca, Rodrigo Minetto

Abstract: This work addresses the problem of vehicle identification through non-overlap** cameras. As our main contribution, we introduce a novel dataset for vehicle identification, called Vehicle-Rear, that contains more than three hours of high-resolution videos, with accurate information about the make, model, color and year of nearly 3,000 vehicles, in addition to the position and identification of th… ▽ More This work addresses the problem of vehicle identification through non-overlap** cameras. As our main contribution, we introduce a novel dataset for vehicle identification, called Vehicle-Rear, that contains more than three hours of high-resolution videos, with accurate information about the make, model, color and year of nearly 3,000 vehicles, in addition to the position and identification of their license plates. To explore our dataset we design a two-stream CNN that simultaneously uses two of the most distinctive and persistent features available: the vehicle's appearance and its license plate. This is an attempt to tackle a major problem: false alarms caused by vehicles with similar designs or by very close license plate identifiers. In the first network stream, shape similarities are identified by a Siamese CNN that uses a pair of low-resolution vehicle patches recorded by two different cameras. In the second stream, we use a CNN for OCR to extract textual information, confidence scores, and string similarities from a pair of high-resolution license plate patches. Then, features from both streams are merged by a sequence of fully connected layers for decision. In our experiments, we compared the two-stream network against several well-known CNN architectures using single or multiple vehicle features. The architectures, trained models, and dataset are publicly available at https://github.com/icarofua/vehicle-rear. △ Less

Submitted 25 July, 2021; v1 submitted 13 November, 2019; originally announced November 2019.

Journal ref: IEEE Access, vol. 9, pp. 101065-101077, 2021

arXiv:1909.01754 [pdf, other]

doi 10.1049/itr2.12030

An Efficient and Layout-Independent Automatic License Plate Recognition System Based on the YOLO detector

Authors: Rayson Laroca, Luiz A. Zanlorensi, Gabriel R. Gonçalves, Eduardo Todt, William Robson Schwartz, David Menotti

Abstract: This paper presents an efficient and layout-independent Automatic License Plate Recognition (ALPR) system based on the state-of-the-art YOLO object detector that contains a unified approach for license plate (LP) detection and layout classification to improve the recognition results using post-processing rules. The system is conceived by evaluating and optimizing different models, aiming at achiev… ▽ More This paper presents an efficient and layout-independent Automatic License Plate Recognition (ALPR) system based on the state-of-the-art YOLO object detector that contains a unified approach for license plate (LP) detection and layout classification to improve the recognition results using post-processing rules. The system is conceived by evaluating and optimizing different models, aiming at achieving the best speed/accuracy trade-off at each stage. The networks are trained using images from several datasets, with the addition of various data augmentation techniques, so that they are robust under different conditions. The proposed system achieved an average end-to-end recognition rate of 96.9% across eight public datasets (from five different regions) used in the experiments, outperforming both previous works and commercial systems in the ChineseLP, OpenALPR-EU, SSIG-SegPlate and UFPR-ALPR datasets. In the other datasets, the proposed approach achieved competitive results to those attained by the baselines. Our system also achieved impressive frames per second (FPS) rates on a high-end GPU, being able to perform in real time even when there are four vehicles in the scene. An additional contribution is that we manually labeled 38,351 bounding boxes on 6,239 images from public datasets and made the annotations publicly available to the research community. △ Less

Submitted 9 March, 2021; v1 submitted 4 September, 2019; originally announced September 2019.

Journal ref: IET Intelligent Transport Systems, vol. 15, no. 4, pp. 483-503, 2021

arXiv:1908.00069 [pdf, other]

doi 10.1109/SIBGRAPI.2019.00032

Simultaneous Iris and Periocular Region Detection Using Coarse Annotations

Authors: Diego R. Lucio, Rayson Laroca, Luiz A. Zanlorensi, Gladston Moreira, David Menotti

Abstract: In this work, we propose to detect the iris and periocular regions simultaneously using coarse annotations and two well-known object detectors: YOLOv2 and Faster R-CNN. We believe coarse annotations can be used in recognition systems based on the iris and periocular regions, given the much smaller engineering effort required to manually annotate the training images. We manually made coarse annotat… ▽ More In this work, we propose to detect the iris and periocular regions simultaneously using coarse annotations and two well-known object detectors: YOLOv2 and Faster R-CNN. We believe coarse annotations can be used in recognition systems based on the iris and periocular regions, given the much smaller engineering effort required to manually annotate the training images. We manually made coarse annotations of the iris and periocular regions (122K images from the visible (VIS) spectrum and 38K images from the near-infrared (NIR) spectrum). The iris annotations in the NIR databases were generated semi-automatically by first applying an iris segmentation CNN and then performing a manual inspection. These annotations were made for 11 well-known public databases (3 NIR and 8 VIS) designed for the iris-based recognition problem and are publicly available to the research community. Experimenting our proposal on these databases, we highlight two results. First, the Faster R-CNN + Feature Pyramid Network (FPN) model reported an Intersection over Union (IoU) higher than YOLOv2 (91.86% vs 85.30%). Second, the detection of the iris and periocular regions being performed simultaneously is as accurate as performed separately, but with a lower computational cost, i.e., two tasks were carried out at the cost of one. △ Less

Submitted 31 July, 2019; originally announced August 2019.

Comments: Accepted for presentation at the Conference on Graphics, Patterns and Images (SIBGRAPI) 2019

arXiv:1902.09600 [pdf, other]

doi 10.1117/1.JEI.28.1.013023

Convolutional Neural Networks for Automatic Meter Reading

Authors: Rayson Laroca, Victor Barroso, Matheus A. Diniz, Gabriel R. Gonçalves, William Robson Schwartz, David Menotti

Abstract: In this paper, we tackle Automatic Meter Reading (AMR) by leveraging the high capability of Convolutional Neural Networks (CNNs). We design a two-stage approach that employs the Fast-YOLO object detector for counter detection and evaluates three different CNN-based approaches for counter recognition. In the AMR literature, most datasets are not available to the research community since the images… ▽ More In this paper, we tackle Automatic Meter Reading (AMR) by leveraging the high capability of Convolutional Neural Networks (CNNs). We design a two-stage approach that employs the Fast-YOLO object detector for counter detection and evaluates three different CNN-based approaches for counter recognition. In the AMR literature, most datasets are not available to the research community since the images belong to a service company. In this sense, we introduce a new public dataset, called UFPR-AMR dataset, with 2,000 fully and manually annotated images. This dataset is, to the best of our knowledge, three times larger than the largest public dataset found in the literature and contains a well-defined evaluation protocol to assist the development and evaluation of AMR methods. Furthermore, we propose the use of a data augmentation technique to generate a balanced training set with many more examples to train the CNN models for counter recognition. In the proposed dataset, impressive results were obtained and a detailed speed/accuracy trade-off evaluation of each model was performed. In a public dataset, state-of-the-art results were achieved using less than 200 images for training. △ Less

Submitted 25 February, 2019; originally announced February 2019.

Journal ref: Journal of Electronic Imaging 28(1), 013023 (5 February 2019)

arXiv:1809.00769 [pdf, other]

doi 10.1109/SIBGRAPI.2018.00043

Robust Iris Segmentation Based on Fully Convolutional Networks and Generative Adversarial Networks

Authors: Cides S. Bezerra, Rayson Laroca, Diego R. Lucio, Evair Severo, Lucas F. Oliveira, Alceu S. Britto Jr., David Menotti

Abstract: The iris can be considered as one of the most important biometric traits due to its high degree of uniqueness. Iris-based biometrics applications depend mainly on the iris segmentation whose suitability is not robust for different environments such as near-infrared (NIR) and visible (VIS) ones. In this paper, two approaches for robust iris segmentation based on Fully Convolutional Networks (FCNs)… ▽ More The iris can be considered as one of the most important biometric traits due to its high degree of uniqueness. Iris-based biometrics applications depend mainly on the iris segmentation whose suitability is not robust for different environments such as near-infrared (NIR) and visible (VIS) ones. In this paper, two approaches for robust iris segmentation based on Fully Convolutional Networks (FCNs) and Generative Adversarial Networks (GANs) are described. Similar to a common convolutional network, but without the fully connected layers (i.e., the classification layers), an FCN employs at its end a combination of pooling layers from different convolutional layers. Based on the game theory, a GAN is designed as two networks competing with each other to generate the best segmentation. The proposed segmentation networks achieved promising results in all evaluated datasets (i.e., BioSec, CasiaI3, CasiaT4, IITD-1) of NIR images and (NICE.I, CrEye-Iris and MICHE-I) of VIS images in both non-cooperative and cooperative domains, outperforming the baselines techniques which are the best ones found so far in the literature, i.e., a new state of the art for these datasets. Furthermore, we manually labeled 2,431 images from CasiaT4, CrEye-Iris and MICHE-I datasets, making the masks available for research purposes. △ Less

Submitted 3 September, 2018; originally announced September 2018.

Comments: Accepted for presentation at the Conference on Graphics, Patterns and Images (SIBGRAPI) 2018

arXiv:1808.10032 [pdf, other]

doi 10.1109/SIBGRAPI.2018.00044

The Impact of Preprocessing on Deep Representations for Iris Recognition on Unconstrained Environments

Authors: Luiz A. Zanlorensi, Eduardo Luz, Rayson Laroca, Alceu S. Britto Jr., Luiz S. Oliveira, David Menotti

Abstract: The use of iris as a biometric trait is widely used because of its high level of distinction and uniqueness. Nowadays, one of the major research challenges relies on the recognition of iris images obtained in visible spectrum under unconstrained environments. In this scenario, the acquired iris are affected by capture distance, rotation, blur, motion blur, low contrast and specular reflection, cre… ▽ More The use of iris as a biometric trait is widely used because of its high level of distinction and uniqueness. Nowadays, one of the major research challenges relies on the recognition of iris images obtained in visible spectrum under unconstrained environments. In this scenario, the acquired iris are affected by capture distance, rotation, blur, motion blur, low contrast and specular reflection, creating noises that disturb the iris recognition systems. Besides delineating the iris region, usually preprocessing techniques such as normalization and segmentation of noisy iris images are employed to minimize these problems. But these techniques inevitably run into some errors. In this context, we propose the use of deep representations, more specifically, architectures based on VGG and ResNet-50 networks, for dealing with the images using (and not) iris segmentation and normalization. We use transfer learning from the face domain and also propose a specific data augmentation technique for iris images. Our results show that the approach using non-normalized and only circle-delimited iris images reaches a new state of the art in the official protocol of the NICE.II competition, a subset of the UBIRIS database, one of the most challenging databases on unconstrained environments, reporting an average Equal Error Rate (EER) of 13.98% which represents an absolute reduction of about 5%. △ Less

Submitted 29 August, 2018; originally announced August 2018.

Comments: Accepted for presentation at the Conference on Graphics, Patterns and Images (SIBGRAPI) 2018

arXiv:1806.08722 [pdf, other]

doi 10.1109/BTAS.2018.8698597

Fully Convolutional Networks and Generative Adversarial Networks Applied to Sclera Segmentation

Authors: Diego R. Lucio, Rayson Laroca, Evair Severo, Alceu S. Britto Jr., David Menotti

Abstract: Due to the world's demand for security systems, biometrics can be seen as an important topic of research in computer vision. One of the biometric forms that has been gaining attention is the recognition based on sclera. The initial and paramount step for performing this type of recognition is the segmentation of the region of interest, i.e. the sclera. In this context, two approaches for such task… ▽ More Due to the world's demand for security systems, biometrics can be seen as an important topic of research in computer vision. One of the biometric forms that has been gaining attention is the recognition based on sclera. The initial and paramount step for performing this type of recognition is the segmentation of the region of interest, i.e. the sclera. In this context, two approaches for such task based on the Fully Convolutional Network (FCN) and on Generative Adversarial Network (GAN) are introduced in this work. FCN is similar to a common convolution neural network, however the fully connected layers (i.e., the classification layers) are removed from the end of the network and the output is generated by combining the output of pooling layers from different convolutional ones. The GAN is based on the game theory, where we have two networks competing with each other to generate the best segmentation. In order to perform fair comparison with baselines and quantitative and objective evaluations of the proposed approaches, we provide to the scientific community new 1,300 manually segmented images from two databases. The experiments are performed on the UBIRIS.v2 and MICHE databases and the best performing configurations of our propositions achieved F-score's measures of 87.48% and 88.32%, respectively. △ Less

Submitted 9 July, 2018; v1 submitted 22 June, 2018; originally announced June 2018.

Comments: Accepted for presentation at the IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS) 2018

arXiv:1803.01250 [pdf, other]

doi 10.1109/IJCNN.2018.8489638

A Benchmark for Iris Location and a Deep Learning Detector Evaluation

Authors: Evair Severo, Rayson Laroca, Cides S. Bezerra, Luiz A. Zanlorensi, Daniel Weingaertner, Gladston Moreira, David Menotti

Abstract: The iris is considered as the biometric trait with the highest unique probability. The iris location is an important task for biometrics systems, affecting directly the results obtained in specific applications such as iris recognition, spoofing and contact lenses detection, among others. This work defines the iris location problem as the delimitation of the smallest squared window that encompasse… ▽ More The iris is considered as the biometric trait with the highest unique probability. The iris location is an important task for biometrics systems, affecting directly the results obtained in specific applications such as iris recognition, spoofing and contact lenses detection, among others. This work defines the iris location problem as the delimitation of the smallest squared window that encompasses the iris region. In order to build a benchmark for iris location we annotate (iris squared bounding boxes) four databases from different biometric applications and make them publicly available to the community. Besides these 4 annotated databases, we include 2 others from the literature. We perform experiments on these six databases, five obtained with near infra-red sensors and one with visible light sensor. We compare the classical and outstanding Daugman iris location approach with two window based detectors: 1) a sliding window detector based on features from Histogram of Oriented Gradients (HOG) and a linear Support Vector Machines (SVM) classifier; 2) a deep learning based detector fine-tuned from YOLO object detector. Experimental results showed that the deep learning based detector outperforms the other ones in terms of accuracy and runtime (GPUs version) and should be chosen whenever possible. △ Less

Submitted 30 April, 2018; v1 submitted 3 March, 2018; originally announced March 2018.

Comments: Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2018

arXiv:1802.09567 [pdf, other]

doi 10.1109/IJCNN.2018.8489629

A Robust Real-Time Automatic License Plate Recognition Based on the YOLO Detector

Authors: Rayson Laroca, Evair Severo, Luiz A. Zanlorensi, Luiz S. Oliveira, Gabriel Resende Gonçalves, William Robson Schwartz, David Menotti

Abstract: Automatic License Plate Recognition (ALPR) has been a frequent topic of research due to many practical applications. However, many of the current solutions are still not robust in real-world situations, commonly depending on many constraints. This paper presents a robust and efficient ALPR system based on the state-of-the-art YOLO object detector. The Convolutional Neural Networks (CNNs) are train… ▽ More Automatic License Plate Recognition (ALPR) has been a frequent topic of research due to many practical applications. However, many of the current solutions are still not robust in real-world situations, commonly depending on many constraints. This paper presents a robust and efficient ALPR system based on the state-of-the-art YOLO object detector. The Convolutional Neural Networks (CNNs) are trained and fine-tuned for each ALPR stage so that they are robust under different conditions (e.g., variations in camera, lighting, and background). Specially for character segmentation and recognition, we design a two-stage approach employing simple data augmentation tricks such as inverted License Plates (LPs) and flipped characters. The resulting ALPR approach achieved impressive results in two datasets. First, in the SSIG dataset, composed of 2,000 frames from 101 vehicle videos, our system achieved a recognition rate of 93.53% and 47 Frames Per Second (FPS), performing better than both Sighthound and OpenALPR commercial systems (89.80% and 93.03%, respectively) and considerably outperforming previous results (81.80%). Second, targeting a more realistic scenario, we introduce a larger public dataset, called UFPR-ALPR dataset, designed to ALPR. This dataset contains 150 videos and 4,500 frames captured when both camera and vehicles are moving and also contains different types of vehicles (cars, motorcycles, buses and trucks). In our proposed dataset, the trial versions of commercial systems achieved recognition rates below 70%. On the other hand, our system performed better, with recognition rate of 78.33% and 35 FPS. △ Less

Submitted 28 April, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

Comments: Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2018

Showing 1–26 of 26 results for author: Laroca, R