-
Production of $ B \bar{B}$ bound state via $Υ(4S)$ radiative decays
Authors:
André L. M. Britto,
Luciano M. Abreu
Abstract:
Motivated by recent theoretical predictions about the existence of a $ B \bar{B}$ bound state (also denoted as $ X(10550) $), in this work we estimate the production of the $S$-wave $ B^+ B^-$ molecule via $Υ(4S)$ radiative decays. In particular, we make use of an effective Lagrangian approach and the compositeness condition to calculate the $ X(10550) $ production rate via $Υ(4S)\rightarrow γX(10550)$ decays employing triangle diagrams. Our results show that the partial decay width of this reaction is of the order of $0.5 - 192$ keV for a respective binding energy of $1 - 100 $ MeV, corresponding to a branching fraction of $ 10^{-5} - 10^{-3}$. These findings suggest that the existence of the $ X(10550) $ might be checked via the analysis of the mentioned decay in present and future experiments.
Submitted 5 June, 2024;
originally announced June 2024.
-
Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models
Authors:
Israel A. Laurensi,
Alceu de Souza Britto Jr.,
Jean Paul Barddal,
Alessandro Lameiras Koerich
Abstract:
Facial expression recognition is a pivotal component in machine learning, facilitating various applications. However, convolutional neural networks (CNNs) are often plagued by catastrophic forgetting, impeding their adaptability. The proposed method, emotion-centered generative replay (ECgr), tackles this challenge by integrating synthetic images from generative adversarial networks. Moreover, ECgr incorporates a quality assurance algorithm to ensure the fidelity of generated images. This dual approach enables CNNs to retain past knowledge while learning new tasks, enhancing their performance in emotion recognition. The experimental results on four diverse facial expression datasets demonstrate that incorporating images generated by our pseudo-rehearsal method enhances training on the targeted dataset and the source dataset while making the CNN retain previously learned knowledge.
Submitted 18 April, 2024;
originally announced April 2024.
-
Dynamic Modality and View Selection for Multimodal Emotion Recognition with Missing Modalities
Authors:
Luciana Trinkaus Menon,
Luiz Carlos Ribeiro Neduziak,
Jean Paul Barddal,
Alessandro Lameiras Koerich,
Alceu de Souza Britto Jr
Abstract:
The study of human emotions, traditionally a cornerstone in fields like psychology and neuroscience, has been profoundly impacted by the advent of artificial intelligence (AI). Multiple channels, such as speech (voice) and facial expressions (image), are crucial in understanding human emotions. However, AI's journey in multimodal emotion recognition (MER) is marked by substantial technical challenges. One significant hurdle is how AI models manage the absence of a particular modality - a frequent occurrence in real-world situations. This study's central focus is assessing the performance and resilience of two strategies when confronted with the lack of one modality: a novel multimodal dynamic modality and view selection and a cross-attention mechanism. Results on the RECOLA dataset show that dynamic selection-based methods are a promising approach for MER. In the missing modalities scenarios, all dynamic selection-based methods outperformed the baseline. The study concludes by emphasizing the intricate interplay between audio and video modalities in emotion prediction, showcasing the adaptability of dynamic selection methods in handling missing modalities.
Submitted 18 April, 2024;
originally announced April 2024.
-
Improving Sampling Methods for Fine-tuning SentenceBERT in Text Streams
Authors:
Cristiano Mesquita Garcia,
Alessandro Lameiras Koerich,
Alceu de Souza Britto Jr,
Jean Paul Barddal
Abstract:
The proliferation of textual data on the Internet presents a unique opportunity for institutions and companies to monitor public opinion about their services and products. Given the rapid generation of such data, the text stream mining setting, which handles sequentially arriving, potentially infinite text streams, is often more suitable than traditional batch learning. While pre-trained language models are commonly employed for their high-quality text vectorization capabilities in streaming contexts, they face challenges adapting to concept drift - the phenomenon where the data distribution changes over time, adversely affecting model performance. Addressing the issue of concept drift, this study explores the efficacy of seven text sampling methods designed to selectively fine-tune language models, thereby mitigating performance degradation. We precisely assess the impact of these methods on fine-tuning the SBERT model using four different loss functions. Our evaluation, focused on Macro F1-score and elapsed time, employs two text stream datasets and an incremental SVM classifier to benchmark performance. Our findings indicate that Softmax loss and Batch All Triplets loss are particularly effective for text stream classification, demonstrating that larger sample sizes generally correlate with improved macro F1-scores. Notably, our proposed WordPieceToken ratio sampling method significantly enhances performance with the identified loss functions, surpassing baseline results.
Submitted 18 March, 2024;
originally announced March 2024.
-
Methods for Generating Drift in Text Streams
Authors:
Cristiano Mesquita Garcia,
Alessandro Lameiras Koerich,
Alceu de Souza Britto Jr,
Jean Paul Barddal
Abstract:
Systems and individuals produce data continuously. On the Internet, people share their knowledge, sentiments, and opinions, provide reviews about services and products, and so on. Automatically learning from these textual data can provide insights to organizations and institutions, thus preventing financial impacts, for example. To learn from textual data over time, the machine learning system must account for concept drift. Concept drift is a frequent phenomenon in real-world datasets and corresponds to changes in data distribution over time. For instance, a concept drift occurs when sentiments change or a word's meaning is adjusted over time. Although concept drift is frequent in real-world applications, benchmark datasets with labeled drifts are rare in the literature. To bridge this gap, this paper provides four textual drift generation methods to ease the production of datasets with labeled drifts. These methods were applied to Yelp and Airbnb datasets and tested using incremental classifiers respecting the stream mining paradigm to evaluate their ability to recover from the drifts. Results show that all methods have their performance degraded right after the drifts, and the incremental SVM is the fastest to run and recover the previous performance levels regarding accuracy and Macro F1-Score.
Submitted 18 March, 2024;
originally announced March 2024.
-
Temporal Analysis of Drifting Hashtags in Textual Data Streams: A Graph-Based Application
Authors:
Cristiano M. Garcia,
Alceu de Souza Britto Jr,
Jean Paul Barddal
Abstract:
Social media has played an important role since its emergence. People use the internet to express opinions about anything, making social media platforms a social sensor. Initially supported by Twitter, the hashtags are now in use on several social media platforms. Hashtags are helpful to tag, track, and group posts on similar topics. In this paper, we analyze hashtag drifts over time using concepts from graph analysis and textual data streams using the Girvan-Newman method to uncover hashtag communities in annual snapshots. More specifically, we analyzed the #mybodymychoice hashtag between 2018 and 2022. In addition, we offer insights about some hashtags found in the study. Furthermore, our approach can be useful for monitoring changes over time in opinions and sentiment patterns about an entity on social media. Even though the hashtag #mybodymychoice was initially coupled with women's rights, abortion, and bodily autonomy, we observe that it suffered drifts during the studied period across topics such as drug legalization, vaccination, political protests, war, and civil rights. The year 2021 was the most significant drifting year, in which the communities detected suggest that #mybodymychoice significantly drifted to vaccination and Covid-19-related topics.
Submitted 8 February, 2024;
originally announced February 2024.
-
Concept Drift Adaptation in Text Stream Mining Settings: A Comprehensive Review
Authors:
Cristiano Mesquita Garcia,
Ramon Simoes Abilio,
Alessandro Lameiras Koerich,
Alceu de Souza Britto Jr.,
Jean Paul Barddal
Abstract:
Due to the advent and increase in the popularity of the Internet, people have been producing and disseminating textual data in several ways, such as reviews, social media posts, and news articles. As a result, numerous researchers have been working on discovering patterns in textual data, especially because social media posts function as social sensors, indicating people's opinions, interests, etc. However, most tasks regarding natural language processing are addressed using traditional machine learning methods and static datasets. This setting can lead to several problems, such as an outdated dataset, which may not correspond to reality, and an outdated model, which has its performance degrading over time. Concept drift, which corresponds to changes in data distribution and patterns, is another aspect that emphasizes these issues. In a text stream scenario, it is even more challenging due to its characteristics, such as the high speed and data arriving sequentially. In addition, models for this type of scenario must adhere to the constraints mentioned above while learning from the stream by storing texts for a limited time and consuming low memory. In this study, we performed a systematic literature review regarding concept drift adaptation in text stream scenarios. Considering well-defined criteria, we selected 40 papers to unravel aspects such as text drift categories, types of text drift detection, model update mechanism, the addressed stream mining tasks, types of text representations, and text representation update mechanism. In addition, we discussed drift visualization and simulation and listed real-world datasets used in the selected papers. Therefore, this paper comprehensively reviews the concept drift adaptation in text stream mining scenarios.
Submitted 5 December, 2023;
originally announced December 2023.
-
The $χ_{c1}(4274)$ multiplicity in heavy-ion collisions
Authors:
L. M. Abreu,
A. L. M. Britto,
F. S. Navarra,
H. P. L. Vieira
Abstract:
In a previous work we computed the thermally-averaged cross sections for the production and absorption of the $χ_{c1}(4274)$ state in the hot hadron gas formed in heavy ion collisions. In the present work we estimate the final yield of this exotic state in these collisions. We use the coalescence model to fix the initial multiplicities. The state is treated as a $P$-wave bound state of $D_s\bar D_{s0}$ and also as a compact tetraquark. The Bjorken picture is used to model the hydrodynamic expansion and cooling. Then, the kinetic equation is solved to evaluate the time evolution of the $χ_{c1}(4274)$ yield during the hot hadron gas phase. Since the $χ_{c1}(4274)$ decay width is large, it might decay inside the hadron gas. Therefore we also include the $χ_{c1}(4274)$ decay and regeneration terms by means of an effective coupling, estimated from the available data. The combined effects of hadronic interactions and the $χ_{c1}(4274)$ decay have a strong impact on the final yield. Also, predictions of the $χ_{c1}(4274)$ multiplicity as a function of centrality and of the charged hadron multiplicity (measured at midrapidity) are presented. Finally, we calculate the yield of a proposed $P$-wave molecular state of $D_s \bar{D_{s0}}$, $Y^{\prime}(4274)$, characterized by a smaller width and smaller coupling constant obtained from the Weinberg compositeness condition.
Submitted 5 October, 2023;
originally announced October 2023.
-
Distance Functions and Normalization Under Stream Scenarios
Authors:
Eduardo V. L. Barboza,
Paulo R. Lisboa de Almeida,
Alceu de Souza Britto Jr,
Rafael M. O. Cruz
Abstract:
Data normalization is an essential task when modeling a classification system. When dealing with data streams, data normalization becomes especially challenging since we may not know in advance the properties of the features, such as their minimum/maximum values, and these properties may change over time. We compare the accuracies generated by eight well-known distance functions in data streams without normalization, normalized considering the statistics of the first batch of data received, and considering the previous batch received. We argue that experimental protocols for streams that consider the full stream as normalized are unrealistic and can lead to biased and poor results. Our results indicate that using the original data stream without applying normalization, and the Canberra distance, can be a good combination when no information about the data stream is known beforehand.
Submitted 4 July, 2023; v1 submitted 30 June, 2023;
originally announced July 2023.
-
Interactions of the $χ_{c1}(4274)$ state with light mesons
Authors:
A. L. M. Britto,
L. M. Abreu,
F. S. Navarra
Abstract:
We investigate the interactions of the $χ_{c1}(4274)$ state with light mesons in the hot hadron gas formed in heavy ion collisions. The vacuum and thermally-averaged cross sections of production of $χ_{c1}(4274)$ accompanied by light pseudoscalar and light vector mesons as well as the corresponding inverse processes are estimated within the context of an effective Lagrangian approach. The results suggest non-negligible thermal cross-sections, with larger magnitudes for most of the suppression reactions than those for production. This might be a relevant feature to be considered in the analysis of future data collected in heavy ion collisions.
Submitted 12 June, 2023;
originally announced June 2023.
-
Do We Train on Test Data? The Impact of Near-Duplicates on License Plate Recognition
Authors:
Rayson Laroca,
Valter Estevam,
Alceu S. Britto Jr.,
Rodrigo Minetto,
David Menotti
Abstract:
This work draws attention to the large fraction of near-duplicates in the training and test sets of datasets widely adopted in License Plate Recognition (LPR) research. These duplicates refer to images that, although different, show the same license plate. Our experiments, conducted on the two most popular datasets in the field, show a substantial decrease in recognition rate when six well-known models are trained and tested under fair splits, that is, in the absence of duplicates in the training and test sets. Moreover, in one of the datasets, the ranking of models changed considerably when they were trained and tested under duplicate-free splits. These findings suggest that such duplicates have significantly biased the evaluation and development of deep learning-based models for LPR. The list of near-duplicates we have found and proposals for fair splits are publicly available for further research at https://raysonlaroca.github.io/supp/lpr-train-on-test/
Submitted 4 August, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing
Authors:
Caio da S. Dias,
Alceu de S. Britto Jr.,
Jean P. Barddal,
Laurent Heutte,
Alessandro L. Koerich
Abstract:
This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents. First, a region proposal algorithm detects object candidates in the document page images. Next, deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations. Finally, candidate images are ranked by computing the feature similarity with a given input query. A robust experimental protocol evaluates the proposed approach considering each representation scheme (real-valued and binary code) on the DocExplore image database. The experimental results show that the proposed deep models compare favorably to the state-of-the-art image retrieval approaches for images of historical documents, outperforming other deep models by 2.56 percentage points using the same techniques for pattern spotting. Besides, the proposed approach also reduces the search time by up to 200x and the storage cost up to 6,000x when compared to related works based on real-valued representations.
Submitted 3 August, 2022;
originally announced August 2022.
-
Evaluation of Different Annotation Strategies for Deployment of Parking Spaces Classification Systems
Authors:
Andre G. Hochuli,
Alceu S. Britto Jr.,
Paulo R. L. de Almeida,
Williams B. S. Alves,
Fabio M. C. Cagni
Abstract:
When using vision-based approaches to classify individual parking spaces between occupied and empty, human experts often need to annotate the locations and label a training set containing images collected in the target parking lot to fine-tune the system. We propose investigating three annotation types (polygons, bounding boxes, and fixed-size squares), providing different data representations of the parking spaces. The rationale is to elucidate the best trade-off between handcraft annotation precision and model performance. We also investigate the number of annotated parking spaces necessary to fine-tune a pre-trained model in the target parking lot. Experiments using the PKLot dataset show that it is possible to fine-tune a model to the target parking lot with less than 1,000 labeled samples, using low precision annotations such as fixed-size squares.
Submitted 22 July, 2022;
originally announced July 2022.
-
Evaluation of Self-taught Learning-based Representations for Facial Emotion Recognition
Authors:
Bruna Delazeri,
Leonardo L. Veras,
Alceu de S. Britto Jr.,
Jean Paul Barddal,
Alessandro L. Koerich
Abstract:
This work describes different strategies to generate unsupervised representations obtained through the concept of self-taught learning for facial emotion recognition (FER). The idea is to create complementary representations promoting diversity by varying the autoencoders' initialization, architecture, and training data. SVM, Bagging, Random Forest, and a dynamic ensemble selection method are eval…
▽ More
This work describes different strategies to generate unsupervised representations obtained through the concept of self-taught learning for facial emotion recognition (FER). The idea is to create complementary representations promoting diversity by varying the autoencoders' initialization, architecture, and training data. SVM, Bagging, Random Forest, and a dynamic ensemble selection method are evaluated as final classification methods. Experimental results on Jaffe and Cohn-Kanade datasets using a leave-one-subject-out protocol show that FER methods based on the proposed diverse representations compare favorably against state-of-the-art approaches that also explore unsupervised feature learning.
Submitted 26 April, 2022;
originally announced April 2022.
-
Multiscale Analysis for Improving Texture Classification
Authors:
Steve T. M. Ataky,
Diego Saqui,
Jonathan de Matos,
Alceu S. Britto Jr.,
Alessandro L. Koerich
Abstract:
Information from an image occurs over multiple and distinct spatial scales. Image pyramid multiresolution representations are a useful data structure for image analysis and manipulation over a spectrum of spatial scales. This paper employs the Gaussian-Laplacian pyramid to treat different spatial frequency bands of a texture separately. First, we generate three images corresponding to three levels of the Gaussian-Laplacian pyramid for an input image to capture intrinsic details. Then we aggregate features extracted from gray and color texture images using bio-inspired texture descriptors, information-theoretic measures, gray-level co-occurrence matrix features, and Haralick statistical features into a single feature vector. Such an aggregation aims at producing features that characterize textures to their maximum extent, unlike employing each descriptor separately, which may lose some relevant textural information and reduce the classification performance. The experimental results on texture and histopathologic image datasets have shown the advantages of the proposed method compared to state-of-the-art approaches. Such findings emphasize the importance of multiscale image analysis and corroborate that the descriptors mentioned above are complementary.
Submitted 20 April, 2022;
originally announced April 2022.
-
Open-set Face Recognition for Small Galleries Using Siamese Networks
Authors:
Gabriel Salomon,
Alceu Britto,
Rafael H. Vareto,
William R. Schwartz,
David Menotti
Abstract:
Face recognition has been one of the most relevant and explored fields of Biometrics. In real-world applications, face recognition methods usually must deal with scenarios where not all probe individuals were seen during the training phase (open-set scenarios). Therefore, open-set face recognition is a subject of increasing interest as it deals with identifying individuals in a space where not all faces are known in advance. This is useful in several applications, such as access authentication, in which only a few individuals who have been previously enrolled in a gallery are allowed. The present work introduces a novel approach towards open-set face recognition focusing on small galleries and on enrollment detection, not identity retrieval. A Siamese Network architecture is proposed to learn a model to detect if a face probe is enrolled in the gallery based on a verification-like approach. Promising results were achieved for small galleries in experiments carried out on Pubfig83, FRGCv1 and LFW datasets. State-of-the-art methods like HFCN and HPLS were outperformed on FRGCv1. Besides, a new evaluation protocol is introduced for experiments in small galleries on LFW.
Submitted 14 May, 2021;
originally announced May 2021.
-
A Mathematical Construction of an E6 Grand Unified Theory
Authors:
Anthony Britto
Abstract:
Of the five exceptional groups, $\mathrm{E}_6$ is considered the most attractive for unification due to the following reasons: (i) it contains both $\mathrm{Spin} (10) \times \mathrm{U}(1)$ and $\mathrm{SU} (3) \times \mathrm{SU}(3) \times \mathrm{SU}(3)$ as maximal subgroups, each of which admits embeddings of the Standard Model; (ii) uniquely among the exceptional groups, it admits complex representations; in particular, its 27-dimensional fundamental representation accommodates one generation of left-handed fermions under the usual charge assignments; (iii) all of its representations are anomaly-free. In this master's thesis, written in the spirit of Baez and Huerta's "The Algebra of Grand Unified Theories", we rigorously show how an $\mathrm{E}_6$ grand unified theory is mathematically constructed. Our modest contribution to the literature includes an explicit check that the $\mathbb{Z}_4$ kernel of the homomorphism $\mathrm{Spin} (10) \times \mathrm{U}(1) \to \mathrm{E}_6$ acts trivially on every fermion; we also formulate symmetry breaking, in particular the symmetry breaking of the exotic $\mathrm{E}_6$ fermions under $\mathrm{Spin} (10) \to \mathrm{SU}(5)$, using a different approach than the usual Dynkin diagrams: we explicitly embed $\mathfrak{su}(5) \hookrightarrow \mathfrak{so}(10) \cong \mathfrak{spin} (10)$ and solve the related eigenvalue problem. Phenomenological aspects of grand unified theories are also discussed.
Submitted 24 February, 2021;
originally announced February 2021.
-
Machine Learning Methods for Histopathological Image Analysis: A Review
Authors:
Jonathan de Matos,
Steve Tsham Mpinda Ataky,
Alceu de Souza Britto Jr.,
Luiz Eduardo Soares de Oliveira,
Alessandro Lameiras Koerich
Abstract:
Histopathological images (HIs) are the gold standard for evaluating some types of tumors for cancer diagnosis. The analysis of such images is not only time and resource consuming, but also very challenging even for experienced pathologists, resulting in inter- and intra-observer disagreements. One of the ways of accelerating such an analysis is to use computer-aided diagnosis (CAD) systems. In this paper, we present a review on machine learning methods for histopathological image analysis, including shallow and deep learning methods. We also cover the most common tasks in HI analysis, such as segmentation and feature extraction. In addition, we present a list of publicly available and private datasets that have been used in HI research.
Submitted 7 February, 2021;
originally announced February 2021.
-
A New Periocular Dataset Collected by Mobile Devices in Unconstrained Scenarios
Authors:
Luiz A. Zanlorensi,
Rayson Laroca,
Diego R. Lucio,
Lucas R. Santos,
Alceu S. Britto Jr.,
David Menotti
Abstract:
Recently, ocular biometrics in unconstrained environments using images obtained at visible wavelength have gained researchers' attention, especially with images captured by mobile devices. Periocular recognition has been demonstrated to be an alternative when the iris trait is not available due to occlusions or low image resolution. However, the periocular trait does not have the high uniqueness presented in the iris trait. Thus, the use of datasets containing many subjects is essential to assess biometric systems' capacity to extract discriminating information from the periocular region. Also, to address the within-class variability caused by lighting and attributes in the periocular region, it is of paramount importance to use datasets with images of the same subject captured in distinct sessions. As the datasets available in the literature do not present all these factors, in this work, we present a new periocular dataset containing samples from 1,122 subjects, acquired in 3 sessions by 196 different mobile devices. The images were captured in unconstrained environments with just a single instruction to the participants: to place their eyes on a region of interest. We also performed an extensive benchmark with several Convolutional Neural Network (CNN) architectures and models that have been employed in state-of-the-art approaches based on Multi-class Classification, Multitask Learning, Pairwise Filters Network, and Siamese Network. The results achieved in the closed- and open-world protocols, considering the identification and verification tasks, show that this area still needs research and development.
Submitted 14 November, 2022; v1 submitted 24 November, 2020;
originally announced November 2020.
-
Classifier Pool Generation based on a Two-level Diversity Approach
Authors:
Marcos Monteiro,
Alceu S. Britto Jr,
Jean P. Barddal,
Luiz S. Oliveira,
Robert Sabourin
Abstract:
This paper describes a classifier pool generation method guided by the diversity estimated on the data complexity and classifier decisions. First, the behavior of complexity measures is assessed by considering several subsamples of the dataset. The complexity measures with high variability across the subsamples are selected for posterior pool adaptation, where an evolutionary algorithm optimizes diversity in both complexity and decision spaces. A robust experimental protocol with 28 datasets and 20 replications is used to evaluate the proposed method. Results show significant accuracy improvements in 69.4% of the experiments when Dynamic Classifier Selection and Dynamic Ensemble Selection methods are applied.
Submitted 3 November, 2020;
originally announced November 2020.
-
A Comprehensive Comparison of End-to-End Approaches for Handwritten Digit String Recognition
Authors:
Andre G. Hochuli,
Alceu S. Britto Jr,
David A. Saji,
Jose M. Saavedra,
Robert Sabourin,
Luiz S. Oliveira
Abstract:
Over the last decades, most approaches proposed for handwritten digit string recognition (HDSR) have resorted to digit segmentation, which is dominated by heuristics, thereby imposing substantial constraints on the final performance. Few of them have been based on segmentation-free strategies where each pixel column has a potential cut location. Recently, segmentation-free strategies have added another perspective to the problem, leading to promising results. However, these strategies still show some limitations when dealing with a large number of touching digits. To bridge the resulting gap, in this paper, we hypothesize that a string of digits can be approached as a sequence of objects. We thus evaluate different end-to-end approaches to solve the HDSR problem, particularly in two verticals: those based on object detection (e.g., Yolo and RetinaNet) and those based on sequence-to-sequence representation (CRNN). The main contribution of this work lies in its provision of a comprehensive comparison with a critical analysis of the above-mentioned strategies on five benchmarks commonly used to assess HDSR, including the challenging Touching Pair dataset, NIST SD19, and two real-world datasets (CAR and CVL) proposed for the ICFHR 2014 competition on HDSR. Our results show that the Yolo model compares favorably against segmentation-free models with the advantage of having a shorter pipeline that minimizes the presence of heuristics-based models. It achieved recognition rates of 97%, 96%, and 84% on the NIST-SD19, CAR, and CVL datasets, respectively.
Submitted 29 October, 2020;
originally announced October 2020.
-
Intrapersonal Parameter Optimization for Offline Handwritten Signature Augmentation
Authors:
Teruo M. Maruyama,
Luiz S. Oliveira,
Alceu S. Britto Jr,
Robert Sabourin
Abstract:
Usually, in a real-world scenario, few signature samples are available to train an automatic signature verification system (ASVS). However, such systems do indeed need a lot of signatures to achieve an acceptable performance. Neuromotor signature duplication methods and feature space augmentation methods may be used to meet the need for an increase in the number of samples. Such techniques manually or empirically define a set of parameters to introduce a degree of writer variability. Therefore, in the present study, a method to automatically model the most common writer variability traits is proposed. The method is used to generate offline signatures in the image and the feature space and train an ASVS. We also introduce an alternative approach to evaluate the quality of samples considering their feature vectors. We evaluated the performance of an ASVS with the generated samples using three well-known offline signature datasets: GPDS, MCYT-75, and CEDAR. In GPDS-300, when the SVM classifier was trained using one genuine signature per writer and the duplicates generated in the image space, the Equal Error Rate (EER) decreased from 5.71% to 1.08%. Under the same conditions, the EER decreased to 1.04% using the feature space augmentation technique. We also verified that the model that generates duplicates in the image space reproduces the most common writer variability traits in the three different datasets.
Submitted 13 October, 2020;
originally announced October 2020.
-
scikit-dyn2sel -- A Dynamic Selection Framework for Data Streams
Authors:
Lucca Portes Cavalheiro,
Jean Paul Barddal,
Alceu de Souza Britto Jr,
Laurent Heutte
Abstract:
Mining data streams is a challenge per se. A stream-mining method must be ready to deal with an enormous amount of data and with problems not present in batch machine learning, such as concept drift. Therefore, applying a batch-designed technique, such as dynamic selection of classifiers (DCS), also presents a challenge. The dynamic characteristic of ensembles that deal with streams presents barriers to the application of traditional DCS techniques in such classifiers. scikit-dyn2sel is an open-source Python library tailored for dynamic selection techniques in streaming data. scikit-dyn2sel's development follows code quality and testing standards, including PEP8 compliance and automated high test coverage using codecov.io and circleci.com. Source code, documentation, and examples are made available on GitHub at https://github.com/luccaportes/Scikit-DYN2SEL.
Submitted 17 August, 2020;
originally announced August 2020.
-
A multimodal approach for multi-label movie genre classification
Authors:
Rafael B. Mangolin,
Rodolfo M. Pereira,
Alceu S. Britto Jr.,
Carlos N. Silla Jr.,
Valéria D. Feltrim,
Diego Bertolini,
Yandre M. G. Costa
Abstract:
Movie genre classification is a challenging task that has increasingly attracted the attention of researchers. In this paper, we addressed the multi-label classification of the movie genres in a multimodal way. For this purpose, we created a dataset composed of trailer video clips, subtitles, synopses, and movie posters taken from 152,622 movie titles from The Movie Database. The dataset was carefully curated and organized, and it was also made available as a contribution of this work. Each movie of the dataset was labeled according to a set of eighteen genre labels. We extracted features from these data using different kinds of descriptors, namely Mel Frequency Cepstral Coefficients, Statistical Spectrum Descriptor, Local Binary Pattern with spectrograms, Long Short-Term Memory, and Convolutional Neural Networks. The descriptors were evaluated using different classifiers, such as BinaryRelevance and ML-kNN. We have also investigated the performance of the combination of different classifiers/features using a late fusion strategy, which obtained encouraging results. Based on the F-Score metric, our best result, 0.628, was obtained by the fusion of a classifier created using LSTM on the synopses and a classifier created using CNN on movie trailer frames. When considering the AUC-PR metric, the best result, 0.673, was also achieved by combining those representations, but in addition, a classifier based on LSTM created from the subtitles was used. These results corroborate the existence of complementarity among classifiers based on different sources of information in this field of application. As far as we know, this is the most comprehensive study developed in terms of the diversity of multimedia sources of information to perform movie genre classification.
Submitted 31 May, 2020;
originally announced June 2020.
-
Two-View Fine-grained Classification of Plant Species
Authors:
Voncarlos M. Araujo,
Alceu S. Britto Jr.,
Luiz E. S. Oliveira,
Alessandro L. Koerich
Abstract:
Automatic plant classification is a challenging problem due to the wide biodiversity of the existing plant species in a fine-grained scenario. Powerful deep learning architectures have been used to improve the classification performance in such a fine-grained problem, but they usually yield models that are highly dependent on a large training dataset and are not scalable. In this paper, we propose a novel method based on a two-view leaf image representation and a hierarchical classification strategy for fine-grained recognition of plant species. It uses the botanical taxonomy as a basis for a coarse-to-fine strategy applied to identify the plant genus and species. The two-view representation provides complementary global and local features of leaf images. A deep metric based on Siamese convolutional neural networks is used to reduce the dependence on a large number of training samples and make the method scalable to new plant species. The experimental results on two challenging fine-grained datasets of leaf images (i.e., LifeCLEF 2015 and LeafSnap) have shown the effectiveness of the proposed method, which achieved recognition accuracies of 0.87 and 0.96, respectively.
Submitted 4 October, 2021; v1 submitted 18 May, 2020;
originally announced May 2020.
-
Single-sample writers -- "Document Filter" and their impacts on writer identification
Authors:
Fabio Pinhelli,
Alceu S. Britto Jr,
Luiz S. Oliveira,
Yandre M. G. Costa,
Diego Bertolini
Abstract:
Handwriting can be used as an important biometric modality that allows an individual to be unequivocally identified. This is because the writing of two different persons presents differences that can be explored either in terms of graphometric properties or by treating the manuscript as a digital image and applying image processing techniques that can properly capture different visual attributes of the image (e.g., texture). In this work, we perform a detailed study in which we examine whether the use of a database with only a single sample taken from some writers may skew the results obtained in the experimental protocol. To this end, we propose what we call the "document filter". The "document filter" protocol is meant to be used as a preprocessing technique, such that all the data taken from fragments of the same document are placed either in the training set or in the test set. The rationale is that the classifier must capture features of the writer, and not features of other particularities that could affect the writing in a specific document (e.g., the emotional state of the writer, the pen used, or the paper type). The literature contains several works dealing with the writer identification problem. However, the performance of writer identification systems must also be evaluated taking into account the writers who contributed only a single sample during the creation of the manuscript databases. To address the open issue investigated here, a comprehensive set of experiments was performed on the IAM, BFL, and CVL databases. They have shown that, in the most extreme case, the recognition rate obtained using the "document filter" protocol drops from 81.80% to 50.37%.
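The splitting rule described in this abstract (all fragments of one document go to the same side of the train/test split) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `document_filter_split`, the `(writer_id, document_id, fragment)` tuple layout, and the choice to hold out each writer's last document are all assumptions made for illustration. Sending single-document writers to training only is likewise an assumption of the sketch, not a rule stated in the abstract.

```python
from collections import defaultdict


def document_filter_split(fragments):
    """Split fragments so that no document contributes to both sets.

    `fragments` is an iterable of (writer_id, document_id, fragment)
    tuples. This layout is hypothetical, chosen only for the sketch.
    """
    # Group fragments by writer, then by document.
    docs_by_writer = defaultdict(lambda: defaultdict(list))
    for writer_id, document_id, fragment in fragments:
        docs_by_writer[writer_id][document_id].append(fragment)

    train, test = [], []
    for writer_id, docs in docs_by_writer.items():
        doc_ids = sorted(docs)
        # Assumption: hold out the last document of each writer, but only
        # when the writer has more than one document. A single-sample
        # writer cannot appear in both sets without breaking the rule.
        held_out = set(doc_ids[-1:]) if len(doc_ids) > 1 else set()
        for document_id in doc_ids:
            target = test if document_id in held_out else train
            target.extend((writer_id, document_id, f) for f in docs[document_id])
    return train, test
```

In this sketch, a writer with a single document never appears in the test set, which is one simple way to see why single-sample writers matter for the reported recognition rates.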
Submitted 17 May, 2020;
originally announced May 2020.
-
An End-to-End Approach for Recognition of Modern and Historical Handwritten Numeral Strings
Authors:
Andre G. Hochuli,
Alceu S. Britto Jr.,
Jean P. Barddal,
Luiz E. S. Oliveira,
Robert Sabourin
Abstract:
An end-to-end solution for handwritten numeral string recognition is proposed, in which the numeral string is considered as composed of objects automatically detected and recognized by a YoLo-based model. The main contribution of this paper is to avoid heuristic-based methods for string preprocessing and segmentation, the need for task-oriented classifiers, and also the use of specific constraints related to the string length. A robust experimental protocol based on several numeral string datasets, including one composed of historical documents, has shown that the proposed method is a feasible end-to-end solution for numeral string recognition. Besides, it reduces the complexity of the string recognition task considerably since it drops classical steps, in particular preprocessing, segmentation, and a set of classifiers devoted to strings with a specific length.
Submitted 28 March, 2020;
originally announced April 2020.
-
CNN Hyperparameter tuning applied to Iris Liveness Detection
Authors:
Gabriela Y. Kimura,
Diego R. Lucio,
Alceu S. Britto Jr.,
David Menotti
Abstract:
The iris pattern has significantly improved the biometric recognition field due to its high level of stability and uniqueness. This physical feature has played an important role in security and other related areas. However, presentation attacks, also known as spoofing techniques, can be used to bypass the biometric system with artifacts such as printed images, artificial eyes, and textured contact lenses. To improve the security of these systems, many liveness detection methods have been proposed, and the first International Iris Liveness Detection competition was launched in 2013 to evaluate their effectiveness. In this paper, we propose a hyperparameter tuning of the CASIA algorithm, submitted by the Chinese Academy of Sciences to the third competition of Iris Liveness Detection, in 2017. The modifications proposed promoted an overall improvement, with an 8.48% Attack Presentation Classification Error Rate (APCER) and 0.18% Bona fide Presentation Classification Error Rate (BPCER) for the evaluation of the combined datasets. Other threshold values were evaluated in an attempt to reduce the trade-off between the APCER and the BPCER on the evaluated datasets, and this attempt was successful.
Submitted 12 February, 2020;
originally announced March 2020.
-
Data Augmentation for Histopathological Images Based on Gaussian-Laplacian Pyramid Blending
Authors:
Steve Tsham Mpinda Ataky,
Jonathan de Matos,
Alceu de S. Britto Jr.,
Luiz E. S. Oliveira,
Alessandro L. Koerich
Abstract:
Data imbalance is a major problem that affects several machine learning (ML) algorithms. Such a problem is troublesome because most of the ML algorithms attempt to optimize a loss function that does not take into account the data imbalance. Accordingly, the ML algorithm simply generates a trivial model that is biased toward predicting the most frequent class in the training data. In the case of histopathologic images (HIs), both low-level and high-level data augmentation (DA) techniques still present performance issues when applied in the presence of inter-patient variability; as a result, the model tends to learn color representations, which are related to the staining process. In this paper, we propose a novel approach capable of not only augmenting an HI dataset but also distributing the inter-patient variability by means of image blending using the Gaussian-Laplacian pyramid. The proposed approach consists of finding the Gaussian pyramids of two images of different patients and finding the Laplacian pyramids thereof. Afterwards, the left-half side and the right-half side of different HIs are joined in each level of the Laplacian pyramid, and from the joint pyramids, the original image is reconstructed. This composition combines the stain variation of two patients, preventing color differences from misleading the learning process. Experimental results on the BreakHis dataset have shown promising gains vis-a-vis the majority of DA techniques presented in the literature.
Submitted 16 May, 2020; v1 submitted 31 January, 2020;
originally announced February 2020.
-
Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor
Authors:
Sevegni Odilon Clement Allognon,
Alessandro L. Koerich,
Alceu de S. Britto Jr
Abstract:
Automatic facial expression recognition is an important research area in emotion recognition and computer vision. Applications can be found in several domains such as medical treatment, driver fatigue surveillance, sociable robotics, and several other human-computer interaction systems. Therefore, it is crucial that the machine be able to recognize the emotional state of the user with high accuracy. In recent years, deep neural networks have been used with great success in recognizing emotions. In this paper, we present a new model for continuous emotion recognition based on facial expression recognition by using an unsupervised learning approach based on transfer learning and autoencoders. The proposed approach also includes preprocessing and post-processing techniques which contribute favorably to improving the performance of predicting the concordance correlation coefficient (CCC) for arousal and valence dimensions. Experimental results for predicting spontaneous and natural emotions on the RECOLA 2016 dataset have shown that the proposed approach based on visual information can achieve CCCs of 0.516 and 0.264 for valence and arousal, respectively.
Submitted 31 January, 2020;
originally announced January 2020.
-
Ocular Recognition Databases and Competitions: A Survey
Authors:
Luiz A. Zanlorensi,
Rayson Laroca,
Eduardo Luz,
Alceu S. Britto Jr.,
Luiz S. Oliveira,
David Menotti
Abstract:
The use of the iris and periocular region as biometric traits has been extensively investigated, mainly due to the singularity of the iris features and the use of the periocular region when the image resolution is not sufficient to extract iris information. In addition to providing information about an individual's identity, features extracted from these traits can also be explored to obtain other information such as the individual's gender, the influence of drug use, the use of contact lenses, and spoofing, among others. This work presents a survey of the databases created for ocular recognition, detailing their protocols and how their images were acquired. We also describe and discuss the most popular ocular recognition competitions (contests), highlighting the submitted algorithms that achieved the best results using only the iris trait and also fusing iris and periocular region information. Finally, we describe some relevant works applying deep learning techniques to ocular recognition and point out new challenges and future directions. Considering that there is a large number of ocular databases, and each one is usually designed for a specific problem, we believe this survey can provide a broad overview of the challenges in ocular biometrics.
Submitted 4 February, 2022; v1 submitted 21 November, 2019;
originally announced November 2019.
-
Deep Representations for Cross-spectral Ocular Biometrics
Authors:
Luiz A. Zanlorensi,
Diego R. Lucio,
Alceu S. Britto Jr.,
Hugo Proença,
David Menotti
Abstract:
One of the major challenges in ocular biometrics is the cross-spectral scenario, i.e., how to match images acquired at different wavelengths (typically visible (VIS) against near-infrared (NIR)). This article designs and extensively evaluates cross-spectral ocular verification methods, for both the closed and open-world settings, using well-known deep learning representations based on the iris and periocular regions. Using as inputs the bounding boxes of non-normalized iris/periocular regions, we fine-tune Convolutional Neural Network (CNN) models (based either on VGG16 or ResNet-50 architectures), originally trained for face recognition. Based on the experiments carried out in two publicly available cross-spectral ocular databases, we report results for intra-spectral and cross-spectral scenarios, with the best performance being observed when fusing ResNet-50 deep representations from both the periocular and iris regions. When compared to the state-of-the-art, we observed that the proposed solution consistently reduces the Equal Error Rate (EER) values by 90% / 93% / 96% and 61% / 77% / 83% on the cross-spectral scenario and in the PolyU Bi-spectral and Cross-eye-cross-spectral datasets. Lastly, we evaluate the effect that the "deepness" factor of feature representations has on recognition effectiveness, and - based on a subjective analysis of the most problematic pairwise comparisons - we point out further directions for this field of research.
Submitted 21 November, 2019;
originally announced November 2019.
-
Cross-Representation Transferability of Adversarial Attacks: From Spectrograms to Audio Waveforms
Authors:
Karl Michel Koerich,
Mohammad Esmaeilpour,
Sajjad Abdoli,
Alceu de Souza Britto Jr.,
Alessandro Lameiras Koerich
Abstract:
This paper shows the susceptibility of spectrogram-based audio classifiers to adversarial attacks and the transferability of such attacks to audio waveforms. Some commonly used adversarial attacks on images have been applied to Mel-frequency and short-time Fourier transform spectrograms, and such perturbed spectrograms are able to fool a 2D convolutional neural network (CNN). Such attacks produce perturbed spectrograms whose perturbations are visually imperceptible to humans. Furthermore, the audio waveforms reconstructed from the perturbed spectrograms are also able to fool a 1D CNN trained on the original audio. Experimental results on a dataset of western music have shown that the 2D CNN achieves a mean accuracy of up to 81.87% on legitimate examples and that this performance drops to 12.09% on adversarial examples. Likewise, the 1D CNN achieves a mean accuracy of up to 78.29% on original audio samples, and this performance drops to 27.91% on adversarial audio waveforms reconstructed from the perturbed spectrograms.
Submitted 29 July, 2020; v1 submitted 22 October, 2019;
originally announced October 2019.
-
Style Transfer Applied to Face Liveness Detection with User-Centered Models
Authors:
Israel A. Laurensi R.,
Luciana T. Menon,
Manoel Camillo O. Penna N.,
Alessandro L. Koerich,
Alceu S. Britto Jr
Abstract:
This paper proposes a face anti-spoofing user-centered model (FAS-UCM). The major difficulty, in this case, is obtaining fraudulent images from all users to train the models. To overcome this problem, the proposed method is divided into three main parts: generation of new spoof images, based on style transfer and spoof image representation models; training of a Convolutional Neural Network (CNN) for liveness detection; evaluation of the live and spoof testing images for each subject. The generalization of the CNN to perform style transfer has shown promising qualitative results. Preliminary results have shown that the proposed method is capable of distinguishing between live and spoof images on the SiW database, with an average classification error rate of 0.22.
Submitted 16 July, 2019;
originally announced July 2019.
-
Image Retrieval and Pattern Spotting using Siamese Neural Network
Authors:
Kelly L. Wiggers,
Alceu S. Britto Jr.,
Laurent Heutte,
Alessandro L. Koerich,
Luiz S. Oliveira
Abstract:
This paper presents a novel approach for image retrieval and pattern spotting in document image collections. Manual feature engineering is avoided by learning a similarity-based representation using a Siamese Neural Network trained on a previously prepared subset of image pairs from the ImageNet dataset. The learned representation is used to provide the similarity-based feature maps used to find relevant image candidates in the data collection given an image query. A robust experimental protocol based on the public Tobacco800 document image collection shows that the proposed method compares favorably against state-of-the-art document image retrieval methods, reaching 0.94 and 0.83 of mean average precision (mAP) for retrieval and pattern spotting (IoU=0.7), respectively. Besides, we have evaluated the proposed method considering feature maps of different sizes, showing the impact of reducing the number of features on retrieval performance and processing time.
Submitted 22 June, 2019;
originally announced June 2019.
-
Memory Integrity of CNNs for Cross-Dataset Facial Expression Recognition
Authors:
Dylan C. Tannugi,
Alceu S. Britto Jr.,
Alessandro L. Koerich
Abstract:
Facial expression recognition is a major problem in the domain of artificial intelligence. One of the best ways to solve this problem is the use of convolutional neural networks (CNNs). However, a large amount of data is required to properly train these networks, but most of the datasets available for facial expression recognition are relatively small. A common way to circumvent the lack of data is to use CNNs trained on large datasets of different domains and fine-tune the layers of such networks to the target domain. However, the fine-tuning process does not preserve the memory integrity, as CNNs have the tendency to forget patterns they have learned. In this paper, we evaluate different strategies of fine-tuning a CNN with the aim of assessing the memory integrity of such strategies in a cross-dataset scenario. A CNN pre-trained on a source dataset is used as the baseline and four adaptation strategies have been evaluated: fine-tuning its fully connected layers; fine-tuning its last convolutional layer and its fully connected layers; retraining the CNN on a target dataset; and the fusion of the source and target datasets and retraining the CNN. Experimental results on four datasets have shown that the fusion of the source and the target datasets provides the best trade-off between accuracy and memory integrity.
Submitted 28 May, 2019;
originally announced May 2019.
-
Texture CNN for Histopathological Image Classification
Authors:
Jonathan de Matos,
Alceu de S. Britto Jr.,
Luiz E. S. de Oliveira,
Alessandro L. Koerich
Abstract:
Biopsies are the gold standard for breast cancer diagnosis. This task can be improved by the use of Computer Aided Diagnosis (CAD) systems, reducing the time of diagnosis and reducing the inter- and intra-observer variability. The advances in computing have brought this type of system closer to reality. However, datasets of Histopathological Images (HI) from biopsies are quite small and unbalanced, which makes it difficult to use modern machine learning techniques such as deep learning. In this paper we propose a compact architecture based on texture filters that has fewer parameters than traditional deep models but is able to capture the difference between malignant and benign tissues with relative accuracy. The experimental results on the BreakHis dataset have shown that the proposed texture CNN achieves almost 90% accuracy for classifying benign and malignant tissues.
Submitted 28 May, 2019;
originally announced May 2019.
-
Texture CNN for Thermoelectric Metal Pipe Image Classification
Authors:
Daniel Vriesman,
Alessandro Zimmer,
Alceu S. Britto Jr.,
Alessandro L. Koerich
Abstract:
In this paper, the concept of representation learning based on deep neural networks is applied as an alternative to the use of handcrafted features in a method for automatic visual inspection of corroded thermoelectric metallic pipes. A texture convolutional neural network (TCNN) replaces handcrafted features based on Local Phase Quantization (LPQ) and Haralick descriptors (HD) with the advantage of learning an appropriate textural representation and the decision boundaries in a single optimization process. Experimental results have shown that it is possible to reach an accuracy of 99.20% in the task of identifying different levels of corrosion in the internal surface of thermoelectric pipe walls, while using a compact network that requires much less effort in tuning parameters when compared to the handcrafted approach, since the TCNN architecture is compact regarding the number of layers and connections. The observed results open up the possibility of using deep neural networks in real-time applications such as the automatic inspection of thermoelectric metal pipes.
Submitted 28 May, 2019;
originally announced May 2019.
-
A Novel Orthogonal Direction Mesh Adaptive Direct Search Approach for SVM Hyperparameter Tuning
Authors:
Alexandre Reeberg Mello,
Jonathan de Matos,
Marcelo R. Stemmer,
Alceu de Souza Britto Jr.,
Alessandro Lameiras Koerich
Abstract:
In this paper, we propose the use of a black-box optimization method called deterministic Mesh Adaptive Direct Search (MADS) algorithm with orthogonal directions (Ortho-MADS) for the selection of hyperparameters of Support Vector Machines with a Gaussian kernel. Different from most of the methods in the literature that exploit the properties of the data or attempt to minimize the accuracy of a validation dataset over the first quadrant of (C, gamma), the Ortho-MADS provides convergence proof. We present the MADS, followed by the Ortho-MADS, the dynamic stopping criterion defined by the MADS mesh size and two different search strategies (Nelder-Mead and Variable Neighborhood Search) that contribute to a competitive convergence rate as well as a mechanism to escape from undesired local minima. We have investigated the practical selection of hyperparameters for the Support Vector Machine with a Gaussian kernel, i.e., properly choosing the hyperparameters gamma (bandwidth) and C (trade-off) on several benchmark datasets. The experimental results have shown that the proposed approach for hyperparameter tuning consistently finds comparable or better solutions, when using a common configuration, than other methods. We have also evaluated the accuracy and the number of function evaluations of the Ortho-MADS with the Nelder-Mead search strategy and the Variable Neighborhood Search strategy using the mesh size as a stopping criterion, and we have achieved accuracy that no other method for hyperparameter optimization could reach.
Submitted 25 April, 2019;
originally announced April 2019.
-
Histopathologic Image Processing: A Review
Authors:
Jonathan de Matos,
Alceu de Souza Britto Jr.,
Luiz E. S. Oliveira,
Alessandro L. Koerich
Abstract:
Histopathologic Images (HI) are the gold standard for evaluation of some tumors. However, the analysis of such images is challenging even for experienced pathologists, resulting in problems of inter- and intra-observer variability. Besides that, the analysis is time- and resource-consuming. One of the ways to accelerate such an analysis is by using Computer Aided Diagnosis systems. In this work we present a literature review about the computing techniques to process HI, including shallow and deep methods. We cover the most common tasks for processing HI such as segmentation, feature extraction, unsupervised learning and supervised learning. A dataset section shows some datasets found during the literature review. We also present a case study of breast cancer classification using a mix of deep and shallow machine learning methods. The proposed method obtained an accuracy of 91% in the best case, outperforming the compared baseline of the dataset.
Submitted 16 April, 2019;
originally announced April 2019.
-
Double Transfer Learning for Breast Cancer Histopathologic Image Classification
Authors:
Jonathan de Matos,
Alceu de S. Britto Jr.,
Luiz E. S. Oliveira,
Alessandro L. Koerich
Abstract:
This work proposes a classification approach for breast cancer histopathologic images (HI) that uses transfer learning to extract features from HI using an Inception-v3 CNN pre-trained on the ImageNet dataset. We also use transfer learning in training a support vector machine (SVM) classifier on a tissue-labeled colorectal cancer dataset, aiming to filter the patches from a breast cancer HI and remove the irrelevant ones. We show that removing irrelevant patches before training a second SVM classifier improves the accuracy for classifying malignant and benign tumors on breast cancer images. We are able to improve the classification accuracy by 3.7% using the feature extraction transfer learning and by an additional 0.7% using the irrelevant patch elimination. The proposed approach outperforms the state-of-the-art in three out of the four magnification factors of the breast cancer dataset.
Submitted 16 April, 2019;
originally announced April 2019.
-
Multi-label Classification of User Reactions in Online News
Authors:
Zacarias Curi,
Alceu de Souza Britto Jr,
Emerson Cabrera Paraiso
Abstract:
The increase in the number of Internet users and the strong interaction brought by Web 2.0 have made opinion mining an important task in the area of natural language processing. Although several methods are capable of performing this task, few use multi-label classification, where there is a group of true labels for each example. This type of classification is useful for situations where the opinions are analyzed from the perspective of the reader, because each person can have different interpretations and opinions on the same subject. This paper discusses the efficiency of problem transformation methods combined with different classification algorithms for the task of multi-label classification of reactions in news texts. To do that, extensive tests were carried out on two news corpora written in Brazilian Portuguese annotated with reactions. A new corpus called BFRC-PT is presented. In the tests performed, the highest number of correct predictions was obtained with the Classifier Chains method combined with the Random Forest algorithm. When considering the class distribution, the best results were obtained with the Binary Relevance method combined with the LSTM and Random Forest algorithms.
Submitted 27 November, 2018; v1 submitted 8 September, 2018;
originally announced September 2018.
-
Robust Iris Segmentation Based on Fully Convolutional Networks and Generative Adversarial Networks
Authors:
Cides S. Bezerra,
Rayson Laroca,
Diego R. Lucio,
Evair Severo,
Lucas F. Oliveira,
Alceu S. Britto Jr.,
David Menotti
Abstract:
The iris can be considered as one of the most important biometric traits due to its high degree of uniqueness. Iris-based biometrics applications depend mainly on the iris segmentation, whose suitability is not robust for different environments such as near-infrared (NIR) and visible (VIS) ones. In this paper, two approaches for robust iris segmentation based on Fully Convolutional Networks (FCNs) and Generative Adversarial Networks (GANs) are described. Similar to a common convolutional network, but without the fully connected layers (i.e., the classification layers), an FCN employs at its end a combination of pooling layers from different convolutional layers. Based on game theory, a GAN is designed as two networks competing with each other to generate the best segmentation. The proposed segmentation networks achieved promising results in all evaluated datasets (i.e., BioSec, CasiaI3, CasiaT4, IITD-1) of NIR images and (NICE.I, CrEye-Iris and MICHE-I) of VIS images in both non-cooperative and cooperative domains, outperforming the baseline techniques which are the best ones found so far in the literature, i.e., a new state of the art for these datasets. Furthermore, we manually labeled 2,431 images from CasiaT4, CrEye-Iris and MICHE-I datasets, making the masks available for research purposes.
Submitted 3 September, 2018;
originally announced September 2018.
-
The Impact of Preprocessing on Deep Representations for Iris Recognition on Unconstrained Environments
Authors:
Luiz A. Zanlorensi,
Eduardo Luz,
Rayson Laroca,
Alceu S. Britto Jr.,
Luiz S. Oliveira,
David Menotti
Abstract:
The iris is widely used as a biometric trait because of its high level of distinction and uniqueness. Nowadays, one of the major research challenges lies in the recognition of iris images obtained in the visible spectrum under unconstrained environments. In this scenario, the acquired iris images are affected by capture distance, rotation, blur, motion blur, low contrast and specular reflection, creating noise that disturbs iris recognition systems. Besides delineating the iris region, preprocessing techniques such as normalization and segmentation of noisy iris images are usually employed to minimize these problems. But these techniques inevitably run into some errors. In this context, we propose the use of deep representations, more specifically, architectures based on VGG and ResNet-50 networks, for dealing with the images with and without iris segmentation and normalization. We use transfer learning from the face domain and also propose a specific data augmentation technique for iris images. Our results show that the approach using non-normalized and only circle-delimited iris images reaches a new state of the art in the official protocol of the NICE.II competition, a subset of the UBIRIS database, one of the most challenging databases on unconstrained environments, reporting an average Equal Error Rate (EER) of 13.98% which represents an absolute reduction of about 5%.
Submitted 29 August, 2018;
originally announced August 2018.
-
Fully Convolutional Networks and Generative Adversarial Networks Applied to Sclera Segmentation
Authors:
Diego R. Lucio,
Rayson Laroca,
Evair Severo,
Alceu S. Britto Jr.,
David Menotti
Abstract:
Due to the world's demand for security systems, biometrics can be seen as an important topic of research in computer vision. One of the biometric forms that has been gaining attention is the recognition based on sclera. The initial and paramount step for performing this type of recognition is the segmentation of the region of interest, i.e. the sclera. In this context, two approaches for such task based on the Fully Convolutional Network (FCN) and on Generative Adversarial Network (GAN) are introduced in this work. FCN is similar to a common convolutional neural network; however, the fully connected layers (i.e., the classification layers) are removed from the end of the network and the output is generated by combining the output of pooling layers from different convolutional ones. The GAN is based on game theory, where we have two networks competing with each other to generate the best segmentation. In order to perform fair comparison with baselines and quantitative and objective evaluations of the proposed approaches, we provide to the scientific community 1,300 new manually segmented images from two databases. The experiments are performed on the UBIRIS.v2 and MICHE databases and the best performing configurations of our propositions achieved F-score measures of 87.48% and 88.32%, respectively.
Submitted 9 July, 2018; v1 submitted 22 June, 2018;
originally announced June 2018.
-
Segmentation-Free Approaches for Handwritten Numeral String Recognition
Authors:
Andre G Hochuli,
Luiz E S Oliveira,
Alceu S Britto Jr,
Robert Sabourin
Abstract:
This paper presents segmentation-free strategies for the recognition of handwritten numeral strings of unknown length. A synthetic dataset of touching numeral strings of 2, 3 and 4 digits was created to train end-to-end solutions based on Convolutional Neural Networks. A robust experimental protocol is used to show that the proposed segmentation-free methods may reach the state-of-the-art performance without suffering the heavy burden of over-segmentation-based methods. In addition, the experiments confirmed the importance of introducing contextual information in the design of end-to-end solutions, such as the proposed length classifier when recognizing numeral strings.
Submitted 27 April, 2018; v1 submitted 24 April, 2018;
originally announced April 2018.
-
Fermion propagator in an external potential and generalized Airy functions
Authors:
A. L. M. Britto,
Ashok K. Das,
J. Frenkel
Abstract:
We study the behavior of the fermion propagator in an external time-dependent potential in 0+1 dimensions. We show that, when the potential has up to quadratic terms in time, the propagator can be expressed in terms of generalized Airy functions (or standard Airy functions depending on the exact time dependence). We study various properties of these new generalized functions which reduce to the standard Airy functions in a particular limit.
Submitted 24 May, 2017;
originally announced May 2017.
-
People Counting in Crowded and Outdoor Scenes using a Hybrid Multi-Camera Approach
Authors:
Fabio Dittrich,
Luiz E. S. de Oliveira,
Alceu S. Britto Jr.,
Alessandro L. Koerich
Abstract:
This paper presents two novel approaches for people counting in crowded and open environments that combine the information gathered by multiple views. Multiple cameras are used to expand the field of view as well as to mitigate the problem of occlusion that commonly affects the performance of counting methods using single cameras. The first approach is regarded as a direct approach and it attempts to segment and count each individual in the crowd. For such an aim, two head detectors trained with head images are employed: one based on support vector machines and another based on Adaboost perceptron. The second approach, regarded as an indirect approach, employs learning algorithms and statistical analysis on the whole crowd to achieve counting. For such an aim, corner points are extracted from groups of people in a foreground image and computed by a learning algorithm which estimates the number of people in the scene. Both approaches count the number of people in the scene and not only in a given image or video frame of the scene. The experimental results obtained on the benchmark PETS2009 video dataset show that the proposed indirect method surpasses other methods with improvements of up to 46.7% and provides accurate counting results for the crowded scenes. On the other hand, the direct method shows high error rates because it has much more complex problems to solve, such as segmentation of heads.
Submitted 8 May, 2017; v1 submitted 2 April, 2017;
originally announced April 2017.
-
Generalized Kadanoff-Baym relation in nonequilibrium quenched models
Authors:
A. L. M. Britto,
Ashok K. Das,
J. Frenkel
Abstract:
In the context of a broad class of quenched models, we derive a generalized differential form of the Kadanoff-Baym (KB) ansatz which relates the out of equilibrium correlated and spectral Green's functions. This relation holds at any time both before the quench (when it coincides with the fluctuation-dissipation theorem) as well as after it. We also examine, in the context of exactly soluble quenched models, the validity of some of the earlier alternative extensions of the KB ansatz.
Submitted 24 May, 2016;
originally announced May 2016.
-
Generalized fluctuation-dissipation theorem in a soluble out of equilibrium model
Authors:
A. L. M. Britto,
Ashok K. Das,
J. Frenkel
Abstract:
In the context of an exactly soluble out of equilibrium (quenched) model, we study an extension of the fluctuation-dissipation relation. This involves a modified differential form of this relation, with an effective temperature which may have an explicit dependence on time scales.
Submitted 23 July, 2015; v1 submitted 3 July, 2015;
originally announced July 2015.