Search | arXiv e-print repository

VWise: A novel benchmark for evaluating scene classification for vehicular applications

Authors: Pedro Azevedo, Emanuella Araújo, Gabriel Pierre, Willams de Lima Costa, João Marcelo Teixeira, Valter Ferreira, Roberto Jones, Veronica Teichrieb

Abstract: Current datasets for vehicular applications are mostly collected in North America or Europe. Models trained or evaluated on these datasets might suffer from geographical bias when deployed in other regions. Specifically, for scene classification, a highway in a Latin American country differs drastically from an Autobahn, for example, both in design and maintenance levels. We propose VWise, a novel… ▽ More Current datasets for vehicular applications are mostly collected in North America or Europe. Models trained or evaluated on these datasets might suffer from geographical bias when deployed in other regions. Specifically, for scene classification, a highway in a Latin American country differs drastically from an Autobahn, for example, both in design and maintenance levels. We propose VWise, a novel benchmark for road-type classification and scene classification tasks, in addition to tasks focused on external contexts related to vehicular applications in LatAm. We collected over 520 video clips covering diverse urban and rural environments across Latin American countries, annotated with six classes of road types. We also evaluated several state-of-the-art classification models in baseline experiments, obtaining over 84% accuracy. With this dataset, we aim to enhance research on vehicular tasks in Latin America. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.13903 [pdf, other]

ST-Gait++: Leveraging spatio-temporal convolutions for gait-based emotion recognition on videos

Authors: Maria Luísa Lima, Willams de Lima Costa, Estefania Talavera Martinez, Veronica Teichrieb

Abstract: Emotion recognition is relevant for human behaviour understanding, where facial expression and speech recognition have been widely explored by the computer vision community. Literature in the field of behavioural psychology indicates that gait, described as the way a person walks, is an additional indicator of emotions. In this work, we propose a deep framework for emotion recognition through the… ▽ More Emotion recognition is relevant for human behaviour understanding, where facial expression and speech recognition have been widely explored by the computer vision community. Literature in the field of behavioural psychology indicates that gait, described as the way a person walks, is an additional indicator of emotions. In this work, we propose a deep framework for emotion recognition through the analysis of gait. More specifically, our model is composed of a sequence of spatial-temporal Graph Convolutional Networks that produce a robust skeleton-based representation for the task of emotion classification. We evaluate our proposed framework on the E-Gait dataset, composed of a total of 2177 samples. The results obtained represent an improvement of approximately 5% in accuracy compared to the state of the art. In addition, during training we observed a faster convergence of our model compared to the state-of-the-art methodologies. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: Accepted for publication in the LXCV Workshop @ CVPR 2024

arXiv:2401.08686 [pdf, other]

Attention Modules Improve Modern Image-Level Anomaly Detection: A DifferNet Case Study

Authors: André Luiz B. Vieira e Silva, Francisco Simões, Danny Kowerko, Tobias Schlosser, Felipe Battisti, Veronica Teichrieb

Abstract: Within (semi-)automated visual inspection, learning-based approaches for assessing visual defects, including deep neural networks, enable the processing of otherwise small defect patterns in pixel size on high-resolution imagery. The emergence of these often rarely occurring defect patterns explains the general need for labeled data corpora. To not only alleviate this issue but to furthermore adva… ▽ More Within (semi-)automated visual inspection, learning-based approaches for assessing visual defects, including deep neural networks, enable the processing of otherwise small defect patterns in pixel size on high-resolution imagery. The emergence of these often rarely occurring defect patterns explains the general need for labeled data corpora. To not only alleviate this issue but to furthermore advance the current state of the art in unsupervised visual inspection, this contribution proposes a DifferNet-based solution enhanced with attention modules utilizing SENet and CBAM as backbone - AttentDifferNet - to improve the detection and classification capabilities on three different visual inspection and anomaly detection datasets: MVTec AD, InsPLAD-fault, and Semiconductor Wafer. In comparison to the current state of the art, it is shown that AttentDifferNet achieves improved results, which are, in turn, highlighted throughout our quantitative as well as qualitative evaluation, indicated by a general improvement in AUC of 94.34 vs. 92.46, 96.67 vs. 94.69, and 90.20 vs. 88.74%. As our variants to AttentDifferNet show great prospects in the context of currently investigated approaches, a baseline is formulated, emphasizing the importance of attention for anomaly detection. △ Less

Submitted 12 January, 2024; originally announced January 2024.

Comments: Accepted to CVPRW 2023: VISION'23 - 1st workshop on Vision-based InduStrial InspectiON (Extended Abstract). arXiv admin note: substantial text overlap with arXiv:2311.02747

arXiv:2311.11980 [pdf, other]

Leveraging Previous Facial Action Units Knowledge for Emotion Recognition on Faces

Authors: Pietro B. S. Masur, Willams Costa, Lucas S. Figueredo, Veronica Teichrieb

Abstract: People naturally understand emotions, thus permitting a machine to do the same could open new paths for human-computer interaction. Facial expressions can be very useful for emotion recognition techniques, as these are the biggest transmitters of non-verbal cues capable of being correlated with emotions. Several techniques are based on Convolutional Neural Networks (CNNs) to extract information in… ▽ More People naturally understand emotions, thus permitting a machine to do the same could open new paths for human-computer interaction. Facial expressions can be very useful for emotion recognition techniques, as these are the biggest transmitters of non-verbal cues capable of being correlated with emotions. Several techniques are based on Convolutional Neural Networks (CNNs) to extract information in a machine learning process. However, simple CNNs are not always sufficient to locate points of interest on the face that can be correlated with emotions. In this work, we intend to expand the capacity of emotion recognition techniques by proposing the usage of Facial Action Units (AUs) recognition techniques to recognize emotions. This recognition will be based on the Facial Action Coding System (FACS) and computed by a machine learning system. In particular, our method expands over EmotiRAM, an approach for multi-cue emotion recognition, in which we improve over their facial encoding module. △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.02747 [pdf, other]

Attention Modules Improve Image-Level Anomaly Detection for Industrial Inspection: A DifferNet Case Study

Authors: André Luiz Buarque Vieira e Silva, Francisco Simões, Danny Kowerko, Tobias Schlosser, Felipe Battisti, Veronica Teichrieb

Abstract: Within (semi-)automated visual industrial inspection, learning-based approaches for assessing visual defects, including deep neural networks, enable the processing of otherwise small defect patterns in pixel size on high-resolution imagery. The emergence of these often rarely occurring defect patterns explains the general need for labeled data corpora. To alleviate this issue and advance the curre… ▽ More Within (semi-)automated visual industrial inspection, learning-based approaches for assessing visual defects, including deep neural networks, enable the processing of otherwise small defect patterns in pixel size on high-resolution imagery. The emergence of these often rarely occurring defect patterns explains the general need for labeled data corpora. To alleviate this issue and advance the current state of the art in unsupervised visual inspection, this work proposes a DifferNet-based solution enhanced with attention modules: AttentDifferNet. It improves image-level detection and classification capabilities on three visual anomaly detection datasets for industrial inspection: InsPLAD-fault, MVTec AD, and Semiconductor Wafer. In comparison to the state of the art, AttentDifferNet achieves improved results, which are, in turn, highlighted throughout our quali-quantitative study. Our quantitative evaluation shows an average improvement - compared to DifferNet - of 1.77 +/- 0.25 percentage points in overall AUROC considering all three datasets, reaching SOTA results in InsPLAD-fault, an industrial inspection in-the-wild dataset. As our variants to AttentDifferNet show great prospects in the context of currently investigated approaches, a baseline is formulated, emphasizing the importance of attention for industrial anomaly detection both in the wild and in controlled environments. △ Less

Submitted 7 November, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

Comments: Accepted at WACV 2024

arXiv:2311.01619 [pdf, other]

doi 10.1080/01431161.2023.2283900

InsPLAD: A Dataset and Benchmark for Power Line Asset Inspection in UAV Images

Authors: André Luiz Buarque Vieira e Silva, Heitor de Castro Felix, Franscisco Paulo Magalhães Simões, Veronica Teichrieb, Michel Mozinho dos Santos, Hemir Santiago, Virginia Sgotti, Henrique Lott Neto

Abstract: Power line maintenance and inspection are essential to avoid power supply interruptions, reducing its high social and financial impacts yearly. Automating power line visual inspections remains a relevant open problem for the industry due to the lack of public real-world datasets of power line components and their various defects to foster new research. This paper introduces InsPLAD, a Power Line A… ▽ More Power line maintenance and inspection are essential to avoid power supply interruptions, reducing its high social and financial impacts yearly. Automating power line visual inspections remains a relevant open problem for the industry due to the lack of public real-world datasets of power line components and their various defects to foster new research. This paper introduces InsPLAD, a Power Line Asset Inspection Dataset and Benchmark containing 10,607 high-resolution Unmanned Aerial Vehicles colour images. The dataset contains seventeen unique power line assets captured from real-world operating power lines. Additionally, five of those assets present six defects: four of which are corrosion, one is a broken component, and one is a bird's nest presence. All assets were labelled according to their condition, whether normal or the defect name found on an image level. We thoroughly evaluate state-of-the-art and popular methods for three image-level computer vision tasks covered by InsPLAD: object detection, through the AP metric; defect classification, through Balanced Accuracy; and anomaly detection, through the AUROC metric. InsPLAD offers various vision challenges from uncontrolled environments, such as multi-scale objects, multi-size class instances, multiple objects per image, intra-class variation, cluttered background, distinct point-of-views, perspective distortion, occlusion, and varied lighting conditions. To the best of our knowledge, InsPLAD is the first large real-world dataset and benchmark for power line asset inspection with multiple components and defects for various computer vision tasks, with a potential impact to improve state-of-the-art methods in the field. It will be publicly available in its integrity on a repository with a thorough description. It can be found at https://github.com/andreluizbvs/InsPLAD. △ Less

Submitted 3 December, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: This is an original manuscript of an article published by Taylor & Francis in the International Journal of Remote Sensing on 29 Nov 2023, available online: https://doi.org/10.1080/01431161.2023.2283900

arXiv:2308.04515 [pdf, other]

Toward unlabeled multi-view 3D pedestrian detection by generalizable AI: techniques and performance analysis

Authors: João Paulo Lima, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb

Abstract: We unveil how generalizable AI can be used to improve multi-view 3D pedestrian detection in unlabeled target scenes. One way to increase generalization to new scenes is to automatically label target data, which can then be used for training a detector model. In this context, we investigate two approaches for automatically labeling target data: pseudo-labeling using a supervised detector and automa… ▽ More We unveil how generalizable AI can be used to improve multi-view 3D pedestrian detection in unlabeled target scenes. One way to increase generalization to new scenes is to automatically label target data, which can then be used for training a detector model. In this context, we investigate two approaches for automatically labeling target data: pseudo-labeling using a supervised detector and automatic labeling using an untrained detector (that can be applied out of the box without any training). We adopt a training framework for optimizing detector models using automatic labeling procedures. This framework encompasses different training sets/modes and multi-round automatic labeling strategies. We conduct our analyses on the publicly-available WILDTRACK and MultiviewX datasets. We show that, by using the automatic labeling approach based on an untrained detector, we can obtain superior results than directly using the untrained detector or a detector trained with an existing labeled source dataset. It achieved a MODA about 4% and 1% better than the best existing unlabeled method when using WILDTRACK and MultiviewX as target datasets, respectively. △ Less

Submitted 8 August, 2023; originally announced August 2023.

Comments: Accepted to SIBGRAPI 2023

arXiv:2305.03500 [pdf, other]

High-Level Context Representation for Emotion Recognition in Images

Authors: Willams de Lima Costa, Estefania Talavera Martinez, Lucas Silva Figueiredo, Veronica Teichrieb

Abstract: Emotion recognition is the task of classifying perceived emotions in people. Previous works have utilized various nonverbal cues to extract features from images and correlate them to emotions. Of these cues, situational context is particularly crucial in emotion perception since it can directly influence the emotion of a person. In this paper, we propose an approach for high-level context represen… ▽ More Emotion recognition is the task of classifying perceived emotions in people. Previous works have utilized various nonverbal cues to extract features from images and correlate them to emotions. Of these cues, situational context is particularly crucial in emotion perception since it can directly influence the emotion of a person. In this paper, we propose an approach for high-level context representation extraction from images. The model relies on a single cue and a single encoding stream to correlate this representation with emotions. Our model competes with the state-of-the-art, achieving an mAP of 0.3002 on the EMOTIC dataset while also being capable of execution on consumer-grade hardware at approximately 90 frames per second. Overall, our approach is more efficient than previous models and can be easily deployed to address real-world problems related to emotion recognition. △ Less

Submitted 5 May, 2023; originally announced May 2023.

Comments: Accepted for publication at LXAI @ CVPR 2023

arXiv:2111.02273 [pdf, other]

Multi-Cue Adaptive Emotion Recognition Network

Authors: Willams Costa, David Macêdo, Cleber Zanchettin, Lucas S. Figueiredo, Veronica Teichrieb

Abstract: Expressing and identifying emotions through facial and physical expressions is a significant part of social interaction. Emotion recognition is an essential task in computer vision due to its various applications and mainly for allowing a more natural interaction between humans and machines. The common approaches for emotion recognition focus on analyzing facial expressions and requires the automa… ▽ More Expressing and identifying emotions through facial and physical expressions is a significant part of social interaction. Emotion recognition is an essential task in computer vision due to its various applications and mainly for allowing a more natural interaction between humans and machines. The common approaches for emotion recognition focus on analyzing facial expressions and requires the automatic localization of the face in the image. Although these methods can correctly classify emotion in controlled scenarios, such techniques are limited when dealing with unconstrained daily interactions. We propose a new deep learning approach for emotion recognition based on adaptive multi-cues that extract information from context and body poses, which humans commonly use in social interaction and communication. We compare the proposed approach with the state-of-art approaches in the CAER-S dataset, evaluating different components in a pipeline that reached an accuracy of 89.30% △ Less

Submitted 14 December, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

arXiv:2108.07944 [pdf, other]

STN PLAD: A Dataset for Multi-Size Power Line Assets Detection in High-Resolution UAV Images

Authors: André Luiz Buarque Vieira-e-Silva, Heitor Felix, Thiago de Menezes Chaves, Francisco Paulo Magalhães Simões, Veronica Teichrieb, Michel Mozinho dos Santos, Hemir da Cunha Santiago, Virginia Adélia Cordeiro Sgotti, Henrique Baptista Duffles Teixeira Lott Neto

Abstract: Many power line companies are using UAVs to perform their inspection processes instead of putting their workers at risk by making them climb high voltage power line towers, for instance. A crucial task for the inspection is to detect and classify assets in the power transmission lines. However, public data related to power line assets are scarce, preventing a faster evolution of this area. This wo… ▽ More Many power line companies are using UAVs to perform their inspection processes instead of putting their workers at risk by making them climb high voltage power line towers, for instance. A crucial task for the inspection is to detect and classify assets in the power transmission lines. However, public data related to power line assets are scarce, preventing a faster evolution of this area. This work proposes the Power Line Assets Dataset, containing high-resolution and real-world images of multiple high-voltage power line components. It has 2,409 annotated objects divided into five classes: transmission tower, insulator, spacer, tower plate, and Stockbridge damper, which vary in size (resolution), orientation, illumination, angulation, and background. This work also presents an evaluation with popular deep object detection methods, showing considerable room for improvement. The STN PLAD dataset is publicly available at https://github.com/andreluizbvs/PLAD. △ Less

Submitted 2 September, 2021; v1 submitted 17 August, 2021; originally announced August 2021.

Comments: Accepted for presentation at SIBGRAPI 2021

arXiv:2105.01677 [pdf, other]

doi 10.1016/j.cpc.2020.107572

A fluid simulation system based on the MPS method

Authors: André Luiz Buarque Vieira-e-Silva, Caio José dos Santos Brito, Francisco Paulo Magalhães Simões, Veronica Teichrieb

Abstract: Fluid flow simulation is a highly active area with applications in a wide range of engineering problems and interactive systems. Meshless methods like the Moving Particle Semi-implicit (MPS) are a great alternative to deal efficiently with large deformations and free-surface flow. However, mesh-based approaches can achieve higher numerical precision than particle-based techniques with a performanc… ▽ More Fluid flow simulation is a highly active area with applications in a wide range of engineering problems and interactive systems. Meshless methods like the Moving Particle Semi-implicit (MPS) are a great alternative to deal efficiently with large deformations and free-surface flow. However, mesh-based approaches can achieve higher numerical precision than particle-based techniques with a performance cost. This paper presents a numerically stable and parallelized system that benefits from advances in the literature and parallel computing to obtain an adaptable MPS method. The proposed technique can simulate liquids using different approaches, such as two ways to calculate the particles' pressure, turbulent flow, and multiphase interaction. The method is evaluated under traditional test cases presenting comparable results to recent techniques. This work integrates the previously mentioned advances into a single solution, which can switch on improvements, such as better momentum conservation and less spurious pressure oscillations, through a graphical interface. The code is entirely open-source under the GPLv3 free software license. The GPU-accelerated code reached speedups ranging from 3 to 43 times, depending on the total number of particles. The simulation runs at one fps for a case with approximately 200,000 particles. Code: https://github.com/andreluizbvs/VoxarMPS △ Less

Submitted 19 August, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

Journal ref: Computer Physics Communications 258 (2021): 107572

arXiv:2104.05813 [pdf, other]

Generalizable Multi-Camera 3D Pedestrian Detection

Authors: João Paulo Lima, Rafael Roberto, Lucas Figueiredo, Francisco Simões, Veronica Teichrieb

Abstract: We present a multi-camera 3D pedestrian detection method that does not need to train using data from the target scene. We estimate pedestrian location on the ground plane using a novel heuristic based on human body poses and person's bounding boxes from an off-the-shelf monocular detector. We then project these locations onto the world ground plane and fuse them with a new formulation of a clique… ▽ More We present a multi-camera 3D pedestrian detection method that does not need to train using data from the target scene. We estimate pedestrian location on the ground plane using a novel heuristic based on human body poses and person's bounding boxes from an off-the-shelf monocular detector. We then project these locations onto the world ground plane and fuse them with a new formulation of a clique cover problem. We also propose an optional step for exploiting pedestrian appearance during fusion by using a domain-generalizable person re-identification model. We evaluated the proposed approach on the challenging WILDTRACK dataset. It obtained a MODA of 0.569 and an F-score of 0.78, superior to state-of-the-art generalizable detection techniques. △ Less

Submitted 12 April, 2021; originally announced April 2021.

Comments: Accepted to CVPRW 2021, LatinX in Computer Vision (LXCV) Workshop

arXiv:2003.13586 [pdf, other]

doi 10.1109/IJCNN48605.2020.9207459

Squeezed Deep 6DoF Object Detection Using Knowledge Distillation

Authors: Heitor Felix, Walber M. Rodrigues, David Macêdo, Francisco Simões, Adriano L. I. Oliveira, Veronica Teichrieb, Cleber Zanchettin

Abstract: The detection of objects considering a 6DoF pose is a common requirement to build virtual and augmented reality applications. It is usually a complex task which requires real-time processing and high precision results for adequate user experience. Recently, different deep learning techniques have been proposed to detect objects in 6DoF in RGB images. However, they rely on high complexity networks,… ▽ More The detection of objects considering a 6DoF pose is a common requirement to build virtual and augmented reality applications. It is usually a complex task which requires real-time processing and high precision results for adequate user experience. Recently, different deep learning techniques have been proposed to detect objects in 6DoF in RGB images. However, they rely on high complexity networks, requiring a computational power that prevents them from working on mobile devices. In this paper, we propose an approach to reduce the complexity of 6DoF detection networks while maintaining accuracy. We used Knowledge Distillation to teach portables Convolutional Neural Networks (CNN) to learn from a real-time 6DoF detection CNN. The proposed method allows real-time applications using only RGB images while decreasing the hardware requirements. We used the LINEMOD dataset to evaluate the proposed method, and the experimental results show that the proposed method reduces the memory requirement by almost 99\% in comparison to the original architecture with the cost of reducing half the accuracy in one of the metrics. Code is available at https://github.com/heitorcfelix/singleshot6Dpose. △ Less

Submitted 29 May, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

Comments: This paper was accepted by 2020 International Joint Conference on Neural Networks (IJCNN)

MSC Class: 68-04

Journal ref: 2020 International Joint Conference on Neural Networks (IJCNN)

arXiv:0902.2187 [pdf]

A Standalone Markerless 3D Tracker for Handheld Augmented Reality

Authors: Joao Paulo Lima, Veronica Teichrieb, Judith Kelner

Abstract: This paper presents an implementation of a markerless tracking technique targeted to the Windows Mobile Pocket PC platform. The primary aim of this work is to allow the development of standalone augmented reality applications for handheld devices based on natural feature tracking. In order to achieve this goal, a subset of two computer vision libraries was ported to the Pocket PC platform. They… ▽ More This paper presents an implementation of a markerless tracking technique targeted to the Windows Mobile Pocket PC platform. The primary aim of this work is to allow the development of standalone augmented reality applications for handheld devices based on natural feature tracking. In order to achieve this goal, a subset of two computer vision libraries was ported to the Pocket PC platform. They were also adapted to use fixed point math, with the purpose of improving the overall performance of the routines. The port of these libraries opens up the possibility of having other computer vision tasks being executed on mobile platforms. A model based tracking approach that relies on edge information was adopted. Since it does not require a high processing power, it is suitable for constrained devices such as handhelds. The OpenGL ES graphics library was used to perform computer vision tasks, taking advantage of existing graphics hardware acceleration. An augmented reality application was created using the implemented technique and evaluations were done regarding tracking performance and accuracy △ Less

Submitted 12 February, 2009; originally announced February 2009.

Showing 1–14 of 14 results for author: Teichrieb, V