-
VWise: A novel benchmark for evaluating scene classification for vehicular applications
Authors:
Pedro Azevedo,
Emanuella Araújo,
Gabriel Pierre,
Willams de Lima Costa,
João Marcelo Teixeira,
Valter Ferreira,
Roberto Jones,
Veronica Teichrieb
Abstract:
Current datasets for vehicular applications are mostly collected in North America or Europe. Models trained or evaluated on these datasets might suffer from geographical bias when deployed in other regions. Specifically, for scene classification, a highway in a Latin American country differs drastically from an Autobahn, for example, both in design and maintenance levels. We propose VWise, a novel benchmark for road-type classification and scene classification tasks, in addition to tasks focused on external contexts related to vehicular applications in LatAm. We collected over 520 video clips covering diverse urban and rural environments across Latin American countries, annotated with six classes of road types. We also evaluated several state-of-the-art classification models in baseline experiments, obtaining over 84% accuracy. With this dataset, we aim to enhance research on vehicular tasks in Latin America.
Submitted 5 June, 2024;
originally announced June 2024.
-
ST-Gait++: Leveraging spatio-temporal convolutions for gait-based emotion recognition on videos
Authors:
Maria Luísa Lima,
Willams de Lima Costa,
Estefania Talavera Martinez,
Veronica Teichrieb
Abstract:
Emotion recognition is relevant for human behaviour understanding, where facial expression and speech recognition have been widely explored by the computer vision community. Literature in the field of behavioural psychology indicates that gait, described as the way a person walks, is an additional indicator of emotions. In this work, we propose a deep framework for emotion recognition through the analysis of gait. More specifically, our model is composed of a sequence of spatial-temporal Graph Convolutional Networks that produce a robust skeleton-based representation for the task of emotion classification. We evaluate our proposed framework on the E-Gait dataset, composed of a total of 2177 samples. The results obtained represent an improvement of approximately 5% in accuracy compared to the state of the art. In addition, during training we observed a faster convergence of our model compared to the state-of-the-art methodologies.
Submitted 22 May, 2024;
originally announced May 2024.
-
Attention Modules Improve Modern Image-Level Anomaly Detection: A DifferNet Case Study
Authors:
André Luiz B. Vieira e Silva,
Francisco Simões,
Danny Kowerko,
Tobias Schlosser,
Felipe Battisti,
Veronica Teichrieb
Abstract:
Within (semi-)automated visual inspection, learning-based approaches for assessing visual defects, including deep neural networks, enable the processing of otherwise small defect patterns in pixel size on high-resolution imagery. The emergence of these often rarely occurring defect patterns explains the general need for labeled data corpora. To not only alleviate this issue but to furthermore advance the current state of the art in unsupervised visual inspection, this contribution proposes a DifferNet-based solution enhanced with attention modules utilizing SENet and CBAM as backbone - AttentDifferNet - to improve the detection and classification capabilities on three different visual inspection and anomaly detection datasets: MVTec AD, InsPLAD-fault, and Semiconductor Wafer. In comparison to the current state of the art, it is shown that AttentDifferNet achieves improved results, which are, in turn, highlighted throughout our quantitative as well as qualitative evaluation, indicated by a general improvement in AUC of 94.34 vs. 92.46, 96.67 vs. 94.69, and 90.20 vs. 88.74%. As our variants to AttentDifferNet show great prospects in the context of currently investigated approaches, a baseline is formulated, emphasizing the importance of attention for anomaly detection.
Submitted 12 January, 2024;
originally announced January 2024.
-
Leveraging Previous Facial Action Units Knowledge for Emotion Recognition on Faces
Authors:
Pietro B. S. Masur,
Willams Costa,
Lucas S. Figueiredo,
Veronica Teichrieb
Abstract:
People naturally understand emotions, so enabling a machine to do the same could open new paths for human-computer interaction. Facial expressions can be very useful for emotion recognition techniques, as these are the biggest transmitters of non-verbal cues capable of being correlated with emotions. Several techniques use Convolutional Neural Networks (CNNs) to extract information in a machine learning process. However, simple CNNs are not always sufficient to locate points of interest on the face that can be correlated with emotions. In this work, we intend to expand the capacity of emotion recognition techniques by proposing the use of Facial Action Unit (AU) recognition techniques to recognize emotions. This recognition will be based on the Facial Action Coding System (FACS) and computed by a machine learning system. In particular, our method builds on EmotiRAM, an approach for multi-cue emotion recognition, and improves its facial encoding module.
Submitted 20 November, 2023;
originally announced November 2023.
-
Attention Modules Improve Image-Level Anomaly Detection for Industrial Inspection: A DifferNet Case Study
Authors:
André Luiz Buarque Vieira e Silva,
Francisco Simões,
Danny Kowerko,
Tobias Schlosser,
Felipe Battisti,
Veronica Teichrieb
Abstract:
Within (semi-)automated visual industrial inspection, learning-based approaches for assessing visual defects, including deep neural networks, enable the processing of otherwise small defect patterns in pixel size on high-resolution imagery. The emergence of these often rarely occurring defect patterns explains the general need for labeled data corpora. To alleviate this issue and advance the current state of the art in unsupervised visual inspection, this work proposes a DifferNet-based solution enhanced with attention modules: AttentDifferNet. It improves image-level detection and classification capabilities on three visual anomaly detection datasets for industrial inspection: InsPLAD-fault, MVTec AD, and Semiconductor Wafer. In comparison to the state of the art, AttentDifferNet achieves improved results, which are, in turn, highlighted throughout our quali-quantitative study. Our quantitative evaluation shows an average improvement - compared to DifferNet - of 1.77 +/- 0.25 percentage points in overall AUROC considering all three datasets, reaching SOTA results in InsPLAD-fault, an industrial inspection in-the-wild dataset. As our variants to AttentDifferNet show great prospects in the context of currently investigated approaches, a baseline is formulated, emphasizing the importance of attention for industrial anomaly detection both in the wild and in controlled environments.
Submitted 7 November, 2023; v1 submitted 5 November, 2023;
originally announced November 2023.
-
InsPLAD: A Dataset and Benchmark for Power Line Asset Inspection in UAV Images
Authors:
André Luiz Buarque Vieira e Silva,
Heitor de Castro Felix,
Francisco Paulo Magalhães Simões,
Veronica Teichrieb,
Michel Mozinho dos Santos,
Hemir Santiago,
Virginia Sgotti,
Henrique Lott Neto
Abstract:
Power line maintenance and inspection are essential to avoid power supply interruptions and to reduce their high yearly social and financial impacts. Automating power line visual inspections remains a relevant open problem for the industry due to the lack of public real-world datasets of power line components and their various defects to foster new research. This paper introduces InsPLAD, a Power Line Asset Inspection Dataset and Benchmark containing 10,607 high-resolution colour images captured by Unmanned Aerial Vehicles. The dataset contains seventeen unique power line assets captured from real-world operating power lines. Additionally, five of those assets present six defects: four are corrosion, one is a broken component, and one is the presence of a bird's nest. All assets were labelled at the image level according to their condition, either normal or the name of the defect found. We thoroughly evaluate state-of-the-art and popular methods for three image-level computer vision tasks covered by InsPLAD: object detection, through the AP metric; defect classification, through Balanced Accuracy; and anomaly detection, through the AUROC metric. InsPLAD offers various vision challenges from uncontrolled environments, such as multi-scale objects, multi-size class instances, multiple objects per image, intra-class variation, cluttered background, distinct points of view, perspective distortion, occlusion, and varied lighting conditions. To the best of our knowledge, InsPLAD is the first large real-world dataset and benchmark for power line asset inspection with multiple components and defects for various computer vision tasks, with the potential to improve state-of-the-art methods in the field. It will be publicly available in its entirety in a repository with a thorough description. It can be found at https://github.com/andreluizbvs/InsPLAD.
Submitted 3 December, 2023; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Toward unlabeled multi-view 3D pedestrian detection by generalizable AI: techniques and performance analysis
Authors:
João Paulo Lima,
Diego Thomas,
Hideaki Uchiyama,
Veronica Teichrieb
Abstract:
We unveil how generalizable AI can be used to improve multi-view 3D pedestrian detection in unlabeled target scenes. One way to increase generalization to new scenes is to automatically label target data, which can then be used for training a detector model. In this context, we investigate two approaches for automatically labeling target data: pseudo-labeling using a supervised detector and automatic labeling using an untrained detector (that can be applied out of the box without any training). We adopt a training framework for optimizing detector models using automatic labeling procedures. This framework encompasses different training sets/modes and multi-round automatic labeling strategies. We conduct our analyses on the publicly available WILDTRACK and MultiviewX datasets. We show that, by using the automatic labeling approach based on an untrained detector, we can obtain better results than by directly using the untrained detector or a detector trained with an existing labeled source dataset. It achieved a MODA about 4% and 1% better than the best existing unlabeled method when using WILDTRACK and MultiviewX as target datasets, respectively.
Submitted 8 August, 2023;
originally announced August 2023.
-
High-Level Context Representation for Emotion Recognition in Images
Authors:
Willams de Lima Costa,
Estefania Talavera Martinez,
Lucas Silva Figueiredo,
Veronica Teichrieb
Abstract:
Emotion recognition is the task of classifying perceived emotions in people. Previous works have utilized various nonverbal cues to extract features from images and correlate them to emotions. Of these cues, situational context is particularly crucial in emotion perception since it can directly influence the emotion of a person. In this paper, we propose an approach for high-level context representation extraction from images. The model relies on a single cue and a single encoding stream to correlate this representation with emotions. Our model competes with the state-of-the-art, achieving an mAP of 0.3002 on the EMOTIC dataset while also being capable of execution on consumer-grade hardware at approximately 90 frames per second. Overall, our approach is more efficient than previous models and can be easily deployed to address real-world problems related to emotion recognition.
Submitted 5 May, 2023;
originally announced May 2023.
-
Multi-Cue Adaptive Emotion Recognition Network
Authors:
Willams Costa,
David Macêdo,
Cleber Zanchettin,
Lucas S. Figueiredo,
Veronica Teichrieb
Abstract:
Expressing and identifying emotions through facial and physical expressions is a significant part of social interaction. Emotion recognition is an essential task in computer vision due to its various applications and mainly for allowing a more natural interaction between humans and machines. The common approaches for emotion recognition focus on analyzing facial expressions and require the automatic localization of the face in the image. Although these methods can correctly classify emotion in controlled scenarios, such techniques are limited when dealing with unconstrained daily interactions. We propose a new deep learning approach for emotion recognition based on adaptive multi-cues that extract information from context and body poses, which humans commonly use in social interaction and communication. We compare the proposed approach with state-of-the-art approaches on the CAER-S dataset, evaluating different components in a pipeline that reached an accuracy of 89.30%.
Submitted 14 December, 2021; v1 submitted 3 November, 2021;
originally announced November 2021.
-
STN PLAD: A Dataset for Multi-Size Power Line Assets Detection in High-Resolution UAV Images
Authors:
André Luiz Buarque Vieira-e-Silva,
Heitor Felix,
Thiago de Menezes Chaves,
Francisco Paulo Magalhães Simões,
Veronica Teichrieb,
Michel Mozinho dos Santos,
Hemir da Cunha Santiago,
Virginia Adélia Cordeiro Sgotti,
Henrique Baptista Duffles Teixeira Lott Neto
Abstract:
Many power line companies are using UAVs to perform their inspection processes instead of putting their workers at risk by making them climb high voltage power line towers, for instance. A crucial task for the inspection is to detect and classify assets in the power transmission lines. However, public data related to power line assets are scarce, preventing a faster evolution of this area. This work proposes the Power Line Assets Dataset, containing high-resolution and real-world images of multiple high-voltage power line components. It has 2,409 annotated objects divided into five classes: transmission tower, insulator, spacer, tower plate, and Stockbridge damper, which vary in size (resolution), orientation, illumination, angulation, and background. This work also presents an evaluation with popular deep object detection methods, showing considerable room for improvement. The STN PLAD dataset is publicly available at https://github.com/andreluizbvs/PLAD.
Submitted 2 September, 2021; v1 submitted 17 August, 2021;
originally announced August 2021.
-
A fluid simulation system based on the MPS method
Authors:
André Luiz Buarque Vieira-e-Silva,
Caio José dos Santos Brito,
Francisco Paulo Magalhães Simões,
Veronica Teichrieb
Abstract:
Fluid flow simulation is a highly active area with applications in a wide range of engineering problems and interactive systems. Meshless methods like the Moving Particle Semi-implicit (MPS) are a great alternative to deal efficiently with large deformations and free-surface flow. However, mesh-based approaches can achieve higher numerical precision than particle-based techniques, at a performance cost. This paper presents a numerically stable and parallelized system that benefits from advances in the literature and parallel computing to obtain an adaptable MPS method. The proposed technique can simulate liquids using different approaches, such as two ways to calculate the particles' pressure, turbulent flow, and multiphase interaction. The method is evaluated under traditional test cases, presenting results comparable to recent techniques. This work integrates the previously mentioned advances into a single solution, which can switch on improvements, such as better momentum conservation and fewer spurious pressure oscillations, through a graphical interface. The code is entirely open-source under the GPLv3 free software license. The GPU-accelerated code reached speedups ranging from 3 to 43 times, depending on the total number of particles. The simulation runs at one fps for a case with approximately 200,000 particles. Code: https://github.com/andreluizbvs/VoxarMPS
Submitted 19 August, 2021; v1 submitted 4 May, 2021;
originally announced May 2021.
-
Generalizable Multi-Camera 3D Pedestrian Detection
Authors:
João Paulo Lima,
Rafael Roberto,
Lucas Figueiredo,
Francisco Simões,
Veronica Teichrieb
Abstract:
We present a multi-camera 3D pedestrian detection method that does not need to train using data from the target scene. We estimate pedestrian location on the ground plane using a novel heuristic based on human body poses and person bounding boxes from an off-the-shelf monocular detector. We then project these locations onto the world ground plane and fuse them with a new formulation of a clique cover problem. We also propose an optional step for exploiting pedestrian appearance during fusion by using a domain-generalizable person re-identification model. We evaluated the proposed approach on the challenging WILDTRACK dataset. It obtained a MODA of 0.569 and an F-score of 0.78, superior to state-of-the-art generalizable detection techniques.
Submitted 12 April, 2021;
originally announced April 2021.
-
Squeezed Deep 6DoF Object Detection Using Knowledge Distillation
Authors:
Heitor Felix,
Walber M. Rodrigues,
David Macêdo,
Francisco Simões,
Adriano L. I. Oliveira,
Veronica Teichrieb,
Cleber Zanchettin
Abstract:
The detection of objects considering a 6DoF pose is a common requirement to build virtual and augmented reality applications. It is usually a complex task which requires real-time processing and high precision results for adequate user experience. Recently, different deep learning techniques have been proposed to detect objects in 6DoF in RGB images. However, they rely on high complexity networks, requiring a computational power that prevents them from working on mobile devices. In this paper, we propose an approach to reduce the complexity of 6DoF detection networks while maintaining accuracy. We used Knowledge Distillation to teach portable Convolutional Neural Networks (CNNs) to learn from a real-time 6DoF detection CNN. The proposed method allows real-time applications using only RGB images while decreasing the hardware requirements. We used the LINEMOD dataset to evaluate the proposed method, and the experimental results show that the proposed method reduces the memory requirement by almost 99% in comparison to the original architecture, at the cost of reducing the accuracy by half in one of the metrics. Code is available at https://github.com/heitorcfelix/singleshot6Dpose.
Submitted 29 May, 2020; v1 submitted 30 March, 2020;
originally announced March 2020.
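The knowledge-distillation idea named in the abstract above can be illustrated with a small sketch. The abstract does not say which distillation loss the authors used, so this example assumes the common soft-target formulation on class logits (a temperature-softened softmax compared with a KL divergence). The function names, the temperature value, and the example logits are all hypothetical and not taken from the paper or its repository.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by temperature squared.

    Assumption: a generic soft-target loss, not necessarily the loss used in the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]   # hypothetical teacher outputs
    student = [3.0, 2.0, 1.0]   # hypothetical student outputs
    print(distillation_loss(student, teacher))
    print(distillation_loss(teacher, teacher))
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge, which is the signal a smaller network would be trained to minimize in this kind of setup.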
-
A Standalone Markerless 3D Tracker for Handheld Augmented Reality
Authors:
Joao Paulo Lima,
Veronica Teichrieb,
Judith Kelner
Abstract:
This paper presents an implementation of a markerless tracking technique targeted to the Windows Mobile Pocket PC platform. The primary aim of this work is to allow the development of standalone augmented reality applications for handheld devices based on natural feature tracking. In order to achieve this goal, a subset of two computer vision libraries was ported to the Pocket PC platform. They were also adapted to use fixed-point math, with the purpose of improving the overall performance of the routines. The port of these libraries opens up the possibility of having other computer vision tasks being executed on mobile platforms. A model-based tracking approach that relies on edge information was adopted. Since it does not require high processing power, it is suitable for constrained devices such as handhelds. The OpenGL ES graphics library was used to perform computer vision tasks, taking advantage of existing graphics hardware acceleration. An augmented reality application was created using the implemented technique, and evaluations were done regarding tracking performance and accuracy.
Submitted 12 February, 2009;
originally announced February 2009.
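The fixed-point adaptation mentioned in the abstract above replaces floating-point arithmetic with scaled integers, which helps on devices without a floating-point unit. A minimal sketch follows, assuming a Q16.16 layout (16 fractional bits); the abstract does not state which format the ported libraries use, and all names here are hypothetical.

```python
FRAC_BITS = 16          # assumed Q16.16 format: 16 fractional bits
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to its nearest fixed-point integer representation."""
    return int(round(x * ONE))


def to_float(a: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return a / ONE


def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values; the raw product carries 32 fractional bits."""
    return (a * b) >> FRAC_BITS


def fx_div(a: int, b: int) -> int:
    """Divide two fixed-point values, pre-shifting the numerator to keep precision."""
    return (a << FRAC_BITS) // b


if __name__ == "__main__":
    a, b = to_fixed(1.5), to_fixed(2.0)
    print(to_float(fx_mul(a, b)))
    print(to_float(fx_div(a, b)))
```

Python integers never overflow, so this sketch hides one practical concern: on a 32-bit device the intermediate product in `fx_mul` needs a 64-bit temporary or a reordered shift to avoid overflow.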