-
Portuguese FAQ for Financial Services
Authors:
Paulo Finardi,
Wanderley M. Melo,
Edgard D. Medeiros Neto,
Alex F. Mansano,
Pablo B. Costa,
Vinicius F. Caridá
Abstract:
Scarcity of domain-specific data in the Portuguese financial domain has disfavored the development of Natural Language Processing (NLP) applications. To address this limitation, the present study advocates for the utilization of synthetic data generated through data augmentation techniques. The investigation focuses on the augmentation of a dataset sourced from the Central Bank of Brazil FAQ, employing techniques that vary in semantic similarity. Supervised and unsupervised tasks are conducted to evaluate the impact of augmented data on both low and high semantic similarity scenarios. Additionally, the resultant dataset will be publicly disseminated on the Hugging Face Datasets platform, thereby enhancing accessibility and fostering broader engagement within the NLP research community.
Submitted 19 November, 2023;
originally announced November 2023.
-
Improving the matching of deformable objects by learning to detect keypoints
Authors:
Felipe Cadar,
Welerson Melo,
Vaishnavi Kanagasabapathi,
Guilherme Potje,
Renato Martins,
Erickson R. Nascimento
Abstract:
We propose a novel learned keypoint detection method to increase the number of correct matches for the task of non-rigid image correspondence. By leveraging true correspondences acquired by matching annotated image pairs with a specified descriptor extractor, we train an end-to-end convolutional neural network (CNN) to find keypoint locations that are more appropriate to the considered descriptor. For that, we apply geometric and photometric warpings to images to generate a supervisory signal, allowing the optimization of the detector. Experiments demonstrate that our method enhances the Mean Matching Accuracy of numerous descriptors when used in conjunction with our detection method, while outperforming the state-of-the-art keypoint detectors on real images of non-rigid objects by 20 p.p. We also apply our method to the complex real-world task of object retrieval, where our detector performs on par with the finest keypoint detectors currently available for this task. The source code and trained models are publicly available at https://github.com/verlab/LearningToDetect_PRL_2023
Submitted 12 September, 2023; v1 submitted 1 September, 2023;
originally announced September 2023.
-
Feature point detection in HDR images based on coefficient of variation
Authors:
Artur Santos Nascimento,
Welerson Augusto Lino de Jesus Melo,
Daniel Oliveira Dantas,
Beatriz Trinchão Andrade
Abstract:
Feature point (FP) detection is a fundamental step of many computer vision tasks. However, FP detectors are usually designed for low dynamic range (LDR) images. In scenes with extreme light conditions, LDR images present saturated pixels, which degrade FP detection. On the other hand, high dynamic range (HDR) images usually present no saturated pixels, but FP detection algorithms do not take advantage of all the information present in such images. FP detection frequently relies on differential methods, which work well in LDR images. However, in HDR images, the differential operation response in bright areas overshadows the response in dark areas. As an alternative to standard FP detection methods, this study proposes an FP detector based on a coefficient of variation (CV) designed for HDR images. The CV operation adapts its response based on the standard deviation of pixels inside a window, working well in both dark and bright areas of HDR images. The proposed and standard detectors are evaluated by measuring their repeatability rate (RR) and uniformity. Our proposed detector shows better performance when compared to other standard state-of-the-art detectors. On the uniformity metric, our proposed detector surpasses all the other algorithms. On the other hand, on the repeatability rate metric, the proposed detector performs worse than the Harris for HDR and SURF detectors.
Submitted 20 April, 2023;
originally announced April 2023.
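The coefficient of variation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' detector: it only computes the ratio of standard deviation to mean inside a sliding window, and the function name `local_cv`, the `radius` and `eps` parameters, and the toy images are assumptions made for illustration.

```python
from statistics import mean, pstdev


def local_cv(image, radius=1, eps=1e-12):
    """Return a per-pixel map of the coefficient of variation (std / mean).

    `image` is a list of equal-length rows of non-negative floats.
    Windows are clipped at the image border. `eps` (an assumption of this
    sketch) avoids division by zero in completely black regions.
    """
    height, width = len(image), len(image[0])
    response = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the pixels of the (2 * radius + 1)^2 window around (y, x).
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            response[y][x] = pstdev(window) / (mean(window) + eps)
    return response


# The same step edge at two brightness levels: dark, and 1000 times brighter.
dark = [[1.0, 1.0, 2.0, 2.0] for _ in range(4)]
bright = [[1000.0 * value for value in row] for row in dark]

cv_dark = local_cv(dark)
cv_bright = local_cv(bright)
```

Because the standard deviation and the mean scale together, the response at the edge is essentially the same in `cv_dark` and `cv_bright`, whereas a plain intensity difference would be 1000 times larger in the bright image. This is the property the abstract attributes to the CV operation in dark and bright areas.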
-
Learning to Detect Good Keypoints to Match Non-Rigid Objects in RGB Images
Authors:
Welerson Melo,
Guilherme Potje,
Felipe Cadar,
Renato Martins,
Erickson R. Nascimento
Abstract:
We present a novel learned keypoint detection method designed to maximize the number of correct matches for the task of non-rigid image correspondence. Our training framework uses true correspondences, obtained by matching annotated image pairs with a predefined descriptor extractor, as ground truth to train a convolutional neural network (CNN). We optimize the model architecture by applying known geometric transformations to images as the supervisory signal. Experiments show that our method outperforms the state-of-the-art keypoint detector on real images of non-rigid objects by 20 p.p. on Mean Matching Accuracy and also improves the matching performance of several descriptors when coupled with our detection method. We also employ the proposed method in one challenging real-world application: object retrieval, where our detector exhibits performance on par with the best available keypoint detectors. The source code and trained model are publicly available at https://github.com/verlab/LearningToDetect_SIBGRAPI_2022
Submitted 13 December, 2022;
originally announced December 2022.
-
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
Authors:
Gnana Praveen Rajasekar,
Wheidima Carneiro de Melo,
Nasib Ullah,
Haseeb Aslam,
Osama Zeeshan,
Théo Denorme,
Marco Pedersoli,
Alessandro Koerich,
Simon Bacon,
Patrick Cardinal,
Eric Granger
Abstract:
Multimodal emotion recognition has recently gained much attention since it can leverage diverse and complementary relationships over multiple modalities (e.g., audio, visual, biosignals, etc.), and can provide some robustness to noisy modalities. Most state-of-the-art methods for audio-visual (A-V) fusion rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complementary nature of A-V modalities. In this paper, we focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos. Specifically, we propose a joint cross-attention model that relies on the complementary relationships to extract the salient features across A-V modalities, allowing for accurate prediction of continuous values of valence and arousal. The proposed fusion model efficiently leverages the inter-modal relationships, while reducing the heterogeneity between the features. In particular, it computes the cross-attention weights based on correlation between the combined feature representation and individual modalities. By deploying the combined A-V feature representation into the cross-attention module, the performance of our fusion module improves significantly over the vanilla cross-attention module. Experimental results on validation-set videos from the AffWild2 dataset indicate that our proposed A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches. The code is available on GitHub: https://github.com/praveena2j/JointCrossAttentional-AV-Fusion.
Submitted 20 April, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
Facial Expression Analysis Using Decomposed Multiscale Spatiotemporal Networks
Authors:
Wheidima Carneiro de Melo,
Eric Granger,
Miguel Bordallo Lopez
Abstract:
Video-based analysis of facial expressions has been increasingly applied to infer health states of individuals, such as depression and pain. Among the existing approaches, deep learning models composed of structures for multiscale spatiotemporal processing have shown strong potential for encoding facial dynamics. However, such models have high computational complexity, making for a difficult deployment of these solutions. To address this issue, we introduce a new technique to decompose the extraction of multiscale spatiotemporal features. Particularly, a building block structure called Decomposed Multiscale Spatiotemporal Network (DMSN) is presented along with three variants: DMSN-A, DMSN-B, and DMSN-C blocks. The DMSN-A block generates multiscale representations by analyzing spatiotemporal features at multiple temporal ranges, while the DMSN-B block analyzes spatiotemporal features at multiple ranges, and the DMSN-C block analyzes spatiotemporal features at multiple spatial sizes. Using these variants, we design our DMSN architecture which has the ability to explore a variety of multiscale spatiotemporal features, favoring the adaptation to different facial behaviors. Our extensive experiments on challenging datasets show that the DMSN-C block is effective for depression detection, whereas the DMSN-A block is efficient for pain estimation. Results also indicate that our DMSN architecture provides a cost-effective solution for expressions that range from fewer facial variations over time, as in depression detection, to greater variations, as in pain estimation.
Submitted 21 March, 2022;
originally announced March 2022.
-
THE ADELE-TEMPO experience : an environment to support process modeling and enaction
Authors:
Noureddine Belkhatir,
Jacky Estublier,
Walcelio Melo
Abstract:
Process-Centered Software Engineering Environments (PSEE) have recently attracted a large number of researchers. In such environments the software processes are explicitly described and interpreted by the PSEE, allowing software activities to be automated, assisted, and enforced. Lehman and Belady (1985) and Osterweil (1987) claim that this capability is a central element in a software development environment for the improvement of software product quality and software developer productivity. We have addressed these problems in the framework of the Adele project. The Adele kernel, initially a configuration management system, has been extended with respect to 1) modeling and support of complex product models: the Object Manager; 2) modeling and support of software processes: the Activity Manager; and 3) modeling and support of software product evolution: the Configuration Manager. For data and product modeling, an ER/OO model has been implemented, including SEE-specific features. On top of the Adele kernel, which is a commercial product, we developed a Process Manager research prototype, Tempo, an enactable formalism based on two major concepts: objects may have a different description (role) depending on the process in which they are used, and processes are synchronized and coordinated by explicit connections. ADL-Tempo is organized around the concepts of software product, Work Environment and software process.
Submitted 21 May, 2020;
originally announced May 2020.