-
PointNetPGAP-SLC: A 3D LiDAR-based Place Recognition Approach with Segment-level Consistency Training for Mobile Robots in Horticulture
Authors:
T. Barros,
L. Garrote,
P. Conde,
M. J. Coombes,
C. Liu,
C. Premebida,
U. J. Nunes
Abstract:
This paper addresses robotic place recognition in horticultural environments using 3D-LiDAR technology and deep learning. Three main contributions are proposed: (i) a novel model called PointNetPGAP, which combines a global average pooling aggregator and a pairwise feature interaction aggregator; (ii) a Segment-Level Consistency (SLC) model, used only during training, with the goal of augmenting t…
▽ More
This paper addresses robotic place recognition in horticultural environments using 3D-LiDAR technology and deep learning. Three main contributions are proposed: (i) a novel model called PointNetPGAP, which combines a global average pooling aggregator and a pairwise feature interaction aggregator; (ii) a Segment-Level Consistency (SLC) model, used only during training, with the goal of augmenting the contrastive loss with a context-specific training signal to enhance descriptors; and (iii) a novel dataset named HORTO-3DLM featuring sequences from orchards and strawberry plantations. The experimental evaluation, conducted on the new HORTO-3DLM dataset, compares PointNetPGAP at the sequence- and segment-level with state-of-the-art (SOTA) models, including OverlapTransformer, PointNetVLAD, and LOGG3D. Additionally, all models were trained and evaluated using the SLC. Empirical results obtained through a cross-validation evaluation protocol demonstrate the superiority of PointNetPGAP compared to existing SOTA models. PointNetPGAP emerges as the best model in retrieving the top-1 candidate, outperforming PointNetVLAD (the second-best model). Moreover, when comparing the impact of training with the SLC model, performance increased on four out of the five evaluated models, indicating that adding a context-specific signal to the contrastive loss leads to improved descriptors.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Causality from Bottom to Top: A Survey
Authors:
Abraham Itzhak Weinberg,
Cristiano Premebida,
Diego Resende Faria
Abstract:
Causality has become a fundamental approach for explaining the relationships between events, phenomena, and outcomes in various fields of study. It has invaded various fields and applications, such as medicine, healthcare, economics, finance, fraud detection, cybersecurity, education, public policy, recommender systems, anomaly detection, robotics, control, sociology, marketing, and advertising. I…
▽ More
Causality has become a fundamental approach for explaining the relationships between events, phenomena, and outcomes in various fields of study. It has invaded various fields and applications, such as medicine, healthcare, economics, finance, fraud detection, cybersecurity, education, public policy, recommender systems, anomaly detection, robotics, control, sociology, marketing, and advertising. In this paper, we survey its development over the past five decades, shedding light on the differences between causality and other approaches, as well as the preconditions for using it. Furthermore, the paper illustrates how causality interacts with new approaches such as Artificial Intelligence (AI), Generative AI (GAI), Machine and Deep Learning, Reinforcement Learning (RL), and Fuzzy Logic. We study the impact of causality on various fields, its contribution, and its interaction with state-of-the-art approaches. Additionally, the paper exemplifies the trustworthiness and explainability of causality models. We offer several ways to evaluate causality models and discuss future directions.
△ Less
Submitted 17 March, 2024;
originally announced March 2024.
-
Reducing the False Positive Rate Using Bayesian Inference in Autonomous Driving Perception
Authors:
Gledson Melotti,
Johann J. S. Bastos,
Bruno L. S. da Silva,
Tiago Zanotelli,
Cristiano Premebida
Abstract:
Object recognition is a crucial step in perception systems for autonomous and intelligent vehicles, as evidenced by the numerous research works in the topic. In this paper, object recognition is explored by using multisensory and multimodality approaches, with the intention of reducing the false positive rate (FPR). The reduction of the FPR becomes increasingly important in perception systems sinc…
▽ More
Object recognition is a crucial step in perception systems for autonomous and intelligent vehicles, as evidenced by the numerous research works in the topic. In this paper, object recognition is explored by using multisensory and multimodality approaches, with the intention of reducing the false positive rate (FPR). The reduction of the FPR becomes increasingly important in perception systems since the misclassification of an object can potentially cause accidents. In particular, this work presents a strategy through Bayesian inference to reduce the FPR considering the likelihood function as a cumulative distribution function from Gaussian kernel density estimations, and the prior probabilities as cumulative functions of normalized histograms. The validation of the proposed methodology is performed on the KITTI dataset using deep networks (DenseNet, NasNet, and EfficientNet), and recent 3D point cloud networks (PointNet, and PintNet++), by considering three object-categories (cars, cyclists, pedestrians) and the RGB and LiDAR sensor modalities.
△ Less
Submitted 22 October, 2023; v1 submitted 9 September, 2023;
originally announced October 2023.
-
A Theoretical and Practical Framework for Evaluating Uncertainty Calibration in Object Detection
Authors:
Pedro Conde,
Rui L. Lopes,
Cristiano Premebida
Abstract:
The proliferation of Deep Neural Networks has resulted in machine learning systems becoming increasingly more present in various real-world applications. Consequently, there is a growing demand for highly reliable models in many domains, making the problem of uncertainty calibration pivotal when considering the future of deep learning. This is especially true when considering object detection syst…
▽ More
The proliferation of Deep Neural Networks has resulted in machine learning systems becoming increasingly more present in various real-world applications. Consequently, there is a growing demand for highly reliable models in many domains, making the problem of uncertainty calibration pivotal when considering the future of deep learning. This is especially true when considering object detection systems, that are commonly present in safety-critical applications such as autonomous driving, robotics and medical diagnosis. For this reason, this work presents a novel theoretical and practical framework to evaluate object detection systems in the context of uncertainty calibration. This encompasses a new comprehensive formulation of this concept through distinct formal definitions, and also three novel evaluation metrics derived from such theoretical foundation. The robustness of the proposed uncertainty calibration metrics is shown through a series of representative experiments.
△ Less
Submitted 18 March, 2024; v1 submitted 1 September, 2023;
originally announced September 2023.
-
Multispectral Image Segmentation in Agriculture: A Comprehensive Study on Fusion Approaches
Authors:
Nuno Cunha,
Tiago Barros,
Mário Reis,
Tiago Marta,
Cristiano Premebida,
Urbano J. Nunes
Abstract:
Multispectral imagery is frequently incorporated into agricultural tasks, providing valuable support for applications such as image segmentation, crop monitoring, field robotics, and yield estimation. From an image segmentation perspective, multispectral cameras can provide rich spectral information, hel** with noise reduction and feature extraction. As such, this paper concentrates on the use o…
▽ More
Multispectral imagery is frequently incorporated into agricultural tasks, providing valuable support for applications such as image segmentation, crop monitoring, field robotics, and yield estimation. From an image segmentation perspective, multispectral cameras can provide rich spectral information, hel** with noise reduction and feature extraction. As such, this paper concentrates on the use of fusion approaches to enhance the segmentation process in agricultural applications. More specifically, in this work, we compare different fusion approaches by combining RGB and NDVI as inputs for crop row detection, which can be useful in autonomous robots operating in the field. The inputs are used individually as well as combined at different times of the process (early and late fusion) to perform classical and DL-based semantic segmentation. In this study, two agriculture-related datasets are subjected to analysis using both deep learning (DL)-based and classical segmentation methodologies. The experiments reveal that classical segmentation methods, utilizing techniques such as edge detection and thresholding, can effectively compete with DL-based algorithms, particularly in tasks requiring precise foreground-background separation. This suggests that traditional methods retain their efficacy in certain specialized applications within the agricultural domain. Moreover, among the fusion strategies examined, late fusion emerges as the most robust approach, demonstrating superiority in adaptability and effectiveness across varying segmentation scenarios. The dataset and code is available at https://github.com/Cybonic/MISAgriculture.git.
△ Less
Submitted 31 July, 2023;
originally announced August 2023.
-
TReR: A Lightweight Transformer Re-Ranking Approach for 3D LiDAR Place Recognition
Authors:
Tiago Barros,
Luís Garrote,
Martin Aleksandrov,
Cristiano Premebida,
Urbano J. Nunes
Abstract:
Autonomous driving systems often require reliable loop closure detection to guarantee reduced localization drift. Recently, 3D LiDAR-based localization methods have used retrieval-based place recognition to find revisited places efficiently. However, when deployed in challenging real-world scenarios, the place recognition models become more complex, which comes at the cost of high computational de…
▽ More
Autonomous driving systems often require reliable loop closure detection to guarantee reduced localization drift. Recently, 3D LiDAR-based localization methods have used retrieval-based place recognition to find revisited places efficiently. However, when deployed in challenging real-world scenarios, the place recognition models become more complex, which comes at the cost of high computational demand. This work tackles this problem from an information-retrieval perspective, adopting a first-retrieve-then-re-ranking paradigm, where an initial loop candidate ranking, generated from a 3D place recognition model, is re-ordered by a proposed lightweight transformer-based re-ranking approach (TReR). The proposed approach relies on global descriptors only, being agnostic to the place recognition model. The experimental evaluation, conducted on the KITTI Odometry dataset, where we compared TReR with s.o.t.a. re-ranking approaches such as alphaQE and SGV, indicate the robustness and efficiency when compared to alphaQE while offering a good trade-off between robustness and efficiency when compared to SGV.
△ Less
Submitted 29 May, 2023;
originally announced May 2023.
-
Approaching Test Time Augmentation in the Context of Uncertainty Calibration for Deep Neural Networks
Authors:
Pedro Conde,
Tiago Barros,
Rui L. Lopes,
Cristiano Premebida,
Urbano J. Nunes
Abstract:
With the rise of Deep Neural Networks, machine learning systems are nowadays ubiquitous in a number of real-world applications, which bears the need for highly reliable models. This requires a thorough look not only at the accuracy of such systems, but also at their predictive uncertainty. Hence, we propose a novel technique (with two different variations, named M-ATTA and V-ATTA) based on test ti…
▽ More
With the rise of Deep Neural Networks, machine learning systems are nowadays ubiquitous in a number of real-world applications, which bears the need for highly reliable models. This requires a thorough look not only at the accuracy of such systems, but also at their predictive uncertainty. Hence, we propose a novel technique (with two different variations, named M-ATTA and V-ATTA) based on test time augmentation, to improve the uncertainty calibration of deep models for image classification. By leveraging na adaptive weighting system, M/V-ATTA improves uncertainty calibration without affecting the model's accuracy. The performance of these techniques is evaluated by considering diverse metrics related to uncertainty calibration, demonstrating their robustness. Empirical results, obtained on CIFAR-10, CIFAR-100, Aerial Image Dataset, as well as in two different scenarios under distribution-shift, indicate that the proposed methods outperform several state-of-the-art post-hoc calibration techniques. Furthermore, the methods proposed also show improvements in terms of predictive entropy on out-of-distribution samples. Code for M/V-ATTA available at: https://github.com/pedrormconde/MV-ATTA
△ Less
Submitted 18 March, 2024; v1 submitted 11 April, 2023;
originally announced April 2023.
-
ORCHNet: A Robust Global Feature Aggregation approach for 3D LiDAR-based Place recognition in Orchards
Authors:
T. Barros,
L. Garrote,
P. Conde,
M. J. Coombes,
C. Liu,
C. Premebida,
U. J. Nunes
Abstract:
Robust and reliable place recognition and loop closure detection in agricultural environments is still an open problem. In particular, orchards are a difficult case study due to structural similarity across the entire field. In this work, we address the place recognition problem in orchards resorting to 3D LiDAR data, which is considered a key modality for robustness. Hence, we propose ORCHNet, a…
▽ More
Robust and reliable place recognition and loop closure detection in agricultural environments is still an open problem. In particular, orchards are a difficult case study due to structural similarity across the entire field. In this work, we address the place recognition problem in orchards resorting to 3D LiDAR data, which is considered a key modality for robustness. Hence, we propose ORCHNet, a deep-learning-based approach that maps 3D-LiDAR scans to global descriptors. Specifically, this work proposes a new global feature aggregation approach, which fuses multiple aggregation methods into a robust global descriptor. ORCHNet is evaluated on real-world data collected in orchards, comprising data from the summer and autumn seasons. To assess the robustness, we compare ORCHNet with state-of-the-art aggregation approaches on data from the same season and across seasons. Moreover, we additionally evaluate the proposed approach as part of a localization framework, where ORCHNet is used as a loop closure detector. The empirical results indicate that, on the place recognition task, ORCHNet outperforms the remaining approaches, and is also more robust across seasons. As for the localization, the edge cases where the path goes through the trees are solved when integrating ORCHNet as a loop detector, showing the potential applicability of the proposed approach in this task. The code will be publicly available at:\url{https://github.com/Cybonic/ORCHNet.git}
△ Less
Submitted 6 February, 2024; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Survey on Deep Fuzzy Systems in regression applications: a view on interpretability
Authors:
Jorge S. S. Júnior,
Jérôme Mendes,
Francisco Souza,
Cristiano Premebida
Abstract:
Regression problems have been more and more embraced by deep learning (DL) techniques. The increasing number of papers recently published in this domain, including surveys and reviews, shows that deep regression has captured the attention of the community due to efficiency and good accuracy in systems with high-dimensional data. However, many DL methodologies have complex structures that are not r…
▽ More
Regression problems have been more and more embraced by deep learning (DL) techniques. The increasing number of papers recently published in this domain, including surveys and reviews, shows that deep regression has captured the attention of the community due to efficiency and good accuracy in systems with high-dimensional data. However, many DL methodologies have complex structures that are not readily transparent to human users. Accessing the interpretability of these models is an essential factor for addressing problems in sensitive areas such as cyber-security systems, medical, financial surveillance, and industrial processes. Fuzzy logic systems (FLS) are inherently interpretable models, well known in the literature, capable of using nonlinear representations for complex systems through linguistic terms with membership degrees mimicking human thought. Within an atmosphere of explainable artificial intelligence, it is necessary to consider a trade-off between accuracy and interpretability for develo** intelligent models. This paper aims to investigate the state-of-the-art on existing methodologies that combine DL and FLS, namely deep fuzzy systems, to address regression problems, configuring a topic that is currently not sufficiently explored in the literature and thus deserves a comprehensive survey.
△ Less
Submitted 9 September, 2022;
originally announced September 2022.
-
High-Order Conditional Mutual Information Maximization for dealing with High-Order Dependencies in Feature Selection
Authors:
Francisco Souza,
Cristiano Premebida,
Rui Araújo
Abstract:
This paper presents a novel feature selection method based on the conditional mutual information (CMI). The proposed High Order Conditional Mutual Information Maximization (HOCMIM) incorporates high order dependencies into the feature selection procedure and has a straightforward interpretation due to its bottom-up derivation. The HOCMIM is derived from the CMI's chain expansion and expressed as a…
▽ More
This paper presents a novel feature selection method based on the conditional mutual information (CMI). The proposed High Order Conditional Mutual Information Maximization (HOCMIM) incorporates high order dependencies into the feature selection procedure and has a straightforward interpretation due to its bottom-up derivation. The HOCMIM is derived from the CMI's chain expansion and expressed as a maximization optimization problem. The maximization problem is solved using a greedy search procedure, which speeds up the entire feature selection process. The experiments are run on a set of benchmark datasets (20 in total). The HOCMIM is compared with eighteen state-of-the-art feature selection algorithms, from the results of two supervised learning classifiers (Support Vector Machine and K-Nearest Neighbor). The HOCMIM achieves the best results in terms of accuracy and shows to be faster than high order feature selection counterparts.
△ Less
Submitted 24 August, 2022; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Reducing Overconfidence Predictions for Autonomous Driving Perception
Authors:
Gledson Melotti,
Cristiano Premebida,
Jordan J. Bird,
Diego R. Faria,
Nuno Gonçalves
Abstract:
In state-of-the-art deep learning for object recognition, SoftMax and Sigmoid functions are most commonly employed as the predictor outputs. Such layers often produce overconfident predictions rather than proper probabilistic scores, which can thus harm the decision-making of `critical' perception systems applied in autonomous driving and robotics. Given this, the experiments in this work propose…
▽ More
In state-of-the-art deep learning for object recognition, SoftMax and Sigmoid functions are most commonly employed as the predictor outputs. Such layers often produce overconfident predictions rather than proper probabilistic scores, which can thus harm the decision-making of `critical' perception systems applied in autonomous driving and robotics. Given this, the experiments in this work propose a probabilistic approach based on distributions calculated out of the Logit layer scores of pre-trained networks. We demonstrate that Maximum Likelihood (ML) and Maximum a-Posteriori (MAP) functions are more suitable for probabilistic interpretations than SoftMax and Sigmoid-based predictions for object recognition. We explore distinct sensor modalities via RGB images and LiDARs (RV: range-view) data from the KITTI and Lyft Level-5 datasets, where our approach shows promising performance compared to the usual SoftMax and Sigmoid layers, with the benefit of enabling interpretable probabilistic predictions. Another advantage of the approach introduced in this paper is that the ML and MAP functions can be implemented in existing trained networks, that is, the approach benefits from the output of the Logit layer of pre-trained networks. Thus, there is no need to carry out a new training phase since the ML and MAP functions are used in the test/prediction phase.
△ Less
Submitted 11 May, 2022; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Probabilistic Approach for Road-Users Detection
Authors:
G. Melotti,
W. Lu,
P. Conde,
D. Zhao,
A. Asvadi,
N. Gonçalves,
C. Premebida
Abstract:
Object detection in autonomous driving applications implies that the detection and tracking of semantic objects are commonly native to urban driving environments, as pedestrians and vehicles. One of the major challenges in state-of-the-art deep-learning based object detection are false positives which occur with overconfident scores. This is highly undesirable in autonomous driving and other criti…
▽ More
Object detection in autonomous driving applications implies that the detection and tracking of semantic objects are commonly native to urban driving environments, as pedestrians and vehicles. One of the major challenges in state-of-the-art deep-learning based object detection are false positives which occur with overconfident scores. This is highly undesirable in autonomous driving and other critical robotic-perception domains because of safety concerns. This paper proposes an approach to alleviate the problem of overconfident predictions by introducing a novel probabilistic layer to deep object detection networks in testing. The suggested approach avoids the traditional Sigmoid or Softmax prediction layer which often produces overconfident predictions. It is demonstrated that the proposed technique reduces overconfidence in the false positives without degrading the performance on the true positives. The approach is validated on the 2D-KITTI objection detection through the YOLOV4 and SECOND (Lidar-based detector). The proposed approach enables interpretable probabilistic predictions without the requirement of re-training the network and therefore is very practical.
△ Less
Submitted 21 April, 2023; v1 submitted 2 December, 2021;
originally announced December 2021.
-
Multispectral Vineyard Segmentation: A Deep Learning approach
Authors:
T. Barros,
P. Conde,
G. Gonçalves,
C. Premebida,
M. Monteiro,
C. S. S. Ferreira,
U. J. Nunes
Abstract:
Digital agriculture has evolved significantly over the last few years due to the technological developments in automation and computational intelligence applied to the agricultural sector, including vineyards which are a relevant crop in the Mediterranean region. In this work, a study is presented of semantic segmentation for vine detection in real-world vineyards by exploring state-of-the-art dee…
▽ More
Digital agriculture has evolved significantly over the last few years due to the technological developments in automation and computational intelligence applied to the agricultural sector, including vineyards which are a relevant crop in the Mediterranean region. In this work, a study is presented of semantic segmentation for vine detection in real-world vineyards by exploring state-of-the-art deep segmentation networks and conventional unsupervised methods. Camera data have been collected on vineyards using an Unmanned Aerial System (UAS) equipped with a dual imaging sensor payload, namely a high-definition RGB camera and a five-band multispectral and thermal camera. Extensive experiments using deep-segmentation networks and unsupervised methods have been performed on multimodal datasets representing four distinct vineyards located in the central region of Portugal. The reported results indicate that SegNet, U-Net, and ModSegNet have equivalent overall performance in vine segmentation. The results also show that multimodality slightly improves the performance of vine segmentation, but the NIR spectrum alone generally is sufficient on most of the datasets. Furthermore, results suggest that high-definition RGB images produce equivalent or higher performance than any lower resolution multispectral band combination. Lastly, Deep Learning (DL) networks have higher overall performance than classical methods. The code and dataset are publicly available at https://github.com/Cybonic/DL_vineyard_segmentation_study.git
△ Less
Submitted 1 March, 2022; v1 submitted 2 August, 2021;
originally announced August 2021.
-
Place recognition survey: An update on deep learning approaches
Authors:
Tiago Barros,
Ricardo Pereira,
Luís Garrote,
Cristiano Premebida,
Urbano J. Nunes
Abstract:
Autonomous Vehicles (AV) are becoming more capable of navigating in complex environments with dynamic and changing conditions. A key component that enables these intelligent vehicles to overcome such conditions and become more autonomous is the sophistication of the perception and localization systems. As part of the localization system, place recognition has benefited from recent developments in…
▽ More
Autonomous Vehicles (AV) are becoming more capable of navigating in complex environments with dynamic and changing conditions. A key component that enables these intelligent vehicles to overcome such conditions and become more autonomous is the sophistication of the perception and localization systems. As part of the localization system, place recognition has benefited from recent developments in other perception tasks such as place categorization or object recognition, namely with the emergence of deep learning (DL) frameworks. This paper surveys recent approaches and methods used in place recognition, particularly those based on deep learning. The contributions of this work are twofold: surveying recent sensors such as 3D LiDARs and RADARs, applied in place recognition; and categorizing the various DL-based place recognition works into supervised, unsupervised, semi-supervised, parallel, and hierarchical categories. First, this survey introduces key place recognition concepts to contextualize the reader. Then, sensor characteristics are addressed. This survey proceeds by elaborating on the various DL-based works, presenting summaries for each framework. Some lessons learned from this survey include: the importance of NetVLAD for supervised end-to-end learning; the advantages of unsupervised approaches in place recognition, namely for cross-domain applications; or the increasing tendency of recent works to seek, not only for higher performance but also for higher efficiency.
△ Less
Submitted 1 March, 2022; v1 submitted 19 June, 2021;
originally announced June 2021.
-
AttDLNet: Attention-based DL Network for 3D LiDAR Place Recognition
Authors:
Tiago Barros,
Luís Garrote,
Ricardo Pereira,
Cristiano Premebida,
Urbano J. Nunes
Abstract:
LiDAR-based place recognition is one of the key components of SLAM and global localization in autonomous vehicles and robotics applications. With the success of DL approaches in learning useful information from 3D LiDARs, place recognition has also benefited from this modality, which has led to higher re-localization and loop-closure detection performance, particularly, in environments with signif…
▽ More
LiDAR-based place recognition is one of the key components of SLAM and global localization in autonomous vehicles and robotics applications. With the success of DL approaches in learning useful information from 3D LiDARs, place recognition has also benefited from this modality, which has led to higher re-localization and loop-closure detection performance, particularly, in environments with significant changing conditions. Despite the progress in this field, the extraction of proper and efficient descriptors from 3D LiDAR data that are invariant to changing conditions and orientation is still an unsolved challenge. To address this problem, this work proposes a novel 3D LiDAR-based deep learning network (named AttDLNet) that uses a range-based proxy representation for point clouds and an attention network with stacked attention layers to selectively focus on long-range context and inter-feature relationships. The proposed network is trained and validated on the KITTI dataset and an ablation study is presented to assess the novel attention network. Results show that adding attention to the network improves performance, leading to efficient loop closures, and outperforming an established 3D LiDAR-based place recognition approach. From the ablation study, results indicate that the middle encoder layers have the highest mean performance, while deeper layers are more robust to orientation change. The code is publicly available at https://github.com/Cybonic/AttDLNet
△ Less
Submitted 4 January, 2023; v1 submitted 17 June, 2021;
originally announced June 2021.
-
Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines
Authors:
Jordan J. Bird,
Diego R. Faria,
Cristiano Premebida,
Anikó Ekárt,
George Vogiatzis
Abstract:
The novelty of this study consists in a multi-modality approach to scene classification, where image and audio complement each other in a process of deep late fusion. The approach is demonstrated on a difficult classification problem, consisting of two synchronised and balanced datasets of 16,000 data objects, encompassing 4.4 hours of video of 8 environments with varying degrees of similarity. We…
▽ More
The novelty of this study consists in a multi-modality approach to scene classification, where image and audio complement each other in a process of deep late fusion. The approach is demonstrated on a difficult classification problem, consisting of two synchronised and balanced datasets of 16,000 data objects, encompassing 4.4 hours of video of 8 environments with varying degrees of similarity. We first extract video frames and accompanying audio at one second intervals. The image and the audio datasets are first classified independently, using a fine-tuned VGG16 and an evolutionary optimised deep neural network, with accuracies of 89.27% and 93.72%, respectively. This is followed by late fusion of the two neural networks to enable a higher order function, leading to accuracy of 96.81% in this multi-modality classifier with synchronised video frames and audio clips. The tertiary neural network implemented for late fusion outperforms classical state-of-the-art classifiers by around 3% when the two primary networks are considered as feature generators. We show that situations where a single-modality may be confused by anomalous data points are now corrected through an emerging higher order integration. Prominent examples include a water feature in a city misclassified as a river by the audio classifier alone and a densely crowded street misclassified as a forest by the image classifier alone. Both are examples which are correctly classified by our multi-modality approach.
△ Less
Submitted 11 July, 2020;
originally announced July 2020.
-
LSTM and GPT-2 Synthetic Speech Transfer Learning for Speaker Recognition to Overcome Data Scarcity
Authors:
Jordan J. Bird,
Diego R. Faria,
Anikó Ekárt,
Cristiano Premebida,
Pedro P. S. Ayrosa
Abstract:
In speech recognition problems, data scarcity often poses an issue due to the willingness of humans to provide large amounts of data for learning and classification. In this work, we take a set of 5 spoken Harvard sentences from 7 subjects and consider their MFCC attributes. Using character level LSTMs (supervised learning) and OpenAI's attention-based GPT-2 models, synthetic MFCCs are generated b…
▽ More
In speech recognition problems, data scarcity often poses an issue due to the willingness of humans to provide large amounts of data for learning and classification. In this work, we take a set of 5 spoken Harvard sentences from 7 subjects and consider their MFCC attributes. Using character level LSTMs (supervised learning) and OpenAI's attention-based GPT-2 models, synthetic MFCCs are generated by learning from the data provided on a per-subject basis. A neural network is trained to classify the data against a large dataset of Flickr8k speakers and is then compared to a transfer learning network performing the same task but with an initial weight distribution dictated by learning from the synthetic data generated by the two models. The best result for all of the 7 subjects were networks that had been exposed to synthetic data, the model pre-trained with LSTM-produced data achieved the best result 3 times and the GPT-2 equivalent 5 times (since one subject had their best result from both models at a draw). Through these results, we argue that speaker classification can be improved by utilising a small amount of user data but with exposure to synthetically-generated MFCCs which then allow the networks to achieve near maximum classification scores.
△ Less
Submitted 3 July, 2020; v1 submitted 1 July, 2020;
originally announced July 2020.
-
Probabilistic Object Classification using CNN ML-MAP layers
Authors:
G. Melotti,
C. Premebida,
J. J. Bird,
D. R. Faria,
N. Gonçalves
Abstract:
Deep networks are currently the state-of-the-art for sensory perception in autonomous driving and robotics. However, deep models often generate overconfident predictions precluding proper probabilistic interpretation which we argue is due to the nature of the SoftMax layer. To reduce the overconfidence without compromising the classification performance, we introduce a CNN probabilistic approach b…
▽ More
Deep networks are currently the state-of-the-art for sensory perception in autonomous driving and robotics. However, deep models often generate overconfident predictions precluding proper probabilistic interpretation which we argue is due to the nature of the SoftMax layer. To reduce the overconfidence without compromising the classification performance, we introduce a CNN probabilistic approach based on distributions calculated in the network's Logit layer. The approach enables Bayesian inference by means of ML and MAP layers. Experiments with calibrated and the proposed prediction layers are carried out on object classification using data from the KITTI database. Results are reported for camera ($RGB$) and LiDAR (range-view) modalities, where the new approach shows promising performance compared to SoftMax.
△ Less
Submitted 24 August, 2020; v1 submitted 29 May, 2020;
originally announced May 2020.
-
High-resolution LIDAR-based Depth Map** using Bilateral Filter
Authors:
C. Premebida,
L. Garrote,
A. Asvadi,
A. Pedro Ribeiro,
U. Nunes
Abstract:
High resolution depth-maps, obtained by upsampling sparse range data from a 3D-LIDAR, find applications in many fields ranging from sensory perception to semantic segmentation and object detection. Upsampling is often based on combining data from a monocular camera to compensate the low-resolution of a LIDAR. This paper, on the other hand, introduces a novel framework to obtain dense depth-map sol…
▽ More
High resolution depth-maps, obtained by upsampling sparse range data from a 3D-LIDAR, find applications in many fields ranging from sensory perception to semantic segmentation and object detection. Upsampling is often based on combining data from a monocular camera to compensate the low-resolution of a LIDAR. This paper, on the other hand, introduces a novel framework to obtain dense depth-map solely from a single LIDAR point cloud; which is a research direction that has been barely explored. The formulation behind the proposed depth-map** process relies on local spatial interpolation, using sliding-window (mask) technique, and on the Bilateral Filter (BF) where the variable of interest, the distance from the sensor, is considered in the interpolation problem. In particular, the BF is conveniently modified to perform depth-map upsampling such that the edges (foreground-background discontinuities) are better preserved by means of a proposed method which influences the range-based weighting term. Other methods for spatial upsampling are discussed, evaluated and compared in terms of different error measures. This paper also researches the role of the mask's size in the performance of the implemented methods. Quantitative and qualitative results from experiments on the KITTI Database, using LIDAR point clouds only, show very satisfactory performance of the approach introduced in this work.
△ Less
Submitted 17 June, 2016;
originally announced June 2016.