
Visual place recognition for aerial imagery: A survey

Authors: Ivan Moskalenko, Anastasiia Kornilova, Gonzalo Ferrer

Abstract: Aerial imagery and its direct application to visual localization is an essential problem for many Robotics and Computer Vision tasks. While Global Navigation Satellite Systems (GNSS) are the standard default solution for the aerial localization problem, they are subject to a number of limitations, such as signal instability or solution unreliability, that make this option less desirable. Consequently, visual geolocalization is emerging as a viable alternative. However, adapting the Visual Place Recognition (VPR) task to aerial imagery presents significant challenges, including weather variations and repetitive patterns. Current VPR reviews largely neglect the specific context of aerial data. This paper introduces a methodology tailored to evaluating VPR techniques specifically in the domain of aerial imagery, providing a comprehensive assessment of various methods and their performance. Beyond comparing VPR methods, we also demonstrate the importance of selecting appropriate zoom and overlap levels when constructing map tiles in order to maximize the efficiency of VPR algorithms on aerial imagery. The code is available in our GitHub repository: https://github.com/prime-slam/aero-vloc.
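VPR evaluations of this kind are commonly reported as Recall@N: a query counts as localized if a correct map tile appears among its N most similar database descriptors. The Python sketch below illustrates that scoring with cosine similarity over global descriptors. The toy descriptors, tile names and the notion of a "correct" tile are illustrative assumptions, not the survey's exact protocol (which additionally varies zoom and overlap when building the tiles).

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two global image descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_n(queries: List[List[float]],
                tiles: Dict[str, List[float]],
                ground_truth: List[Set[str]],
                n: int) -> float:
    """Fraction of queries whose n most similar map tiles include a correct one."""
    hits = 0
    for query, correct in zip(queries, ground_truth):
        ranked = sorted(tiles, key=lambda t: cosine(query, tiles[t]), reverse=True)
        if correct & set(ranked[:n]):
            hits += 1
    return hits / len(queries)


# Toy example: three map tiles and two aerial queries with 2-D descriptors.
tiles = {"tile_a": [1.0, 0.0], "tile_b": [0.0, 1.0], "tile_c": [0.7, 0.7]}
queries = [[0.9, 0.1], [0.1, 0.9]]
ground_truth = [{"tile_a"}, {"tile_c"}]
print(recall_at_n(queries, tiles, ground_truth, n=1))  # 0.5
print(recall_at_n(queries, tiles, ground_truth, n=2))  # 1.0
```

In a real pipeline the descriptors would come from a VPR network, and the brute-force ranking would typically be replaced by a nearest-neighbour index.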

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.02162

Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models

Authors: Mohamad Al Mdfaa, Raghad Salameh, Sergey Zagoruyko, Gonzalo Ferrer

Abstract: In the field of robotics and computer vision, efficient and accurate semantic mapping remains a significant challenge due to the growing demand for intelligent machines that can comprehend and interact with complex environments. Conventional panoptic mapping methods, however, are limited by predefined semantic classes, which makes them ineffective at handling novel or unforeseen objects. In response to this limitation, we introduce the Unified Promptable Panoptic Mapping (UPPM) method. UPPM utilizes recent advances in foundation models to enable real-time, on-demand label generation using natural language prompts. By incorporating a dynamic labeling strategy into traditional panoptic mapping techniques, UPPM provides significant improvements in adaptability and versatility while maintaining high performance in map reconstruction. We demonstrate our approach on real-world and simulated datasets. Results show that UPPM can accurately reconstruct scenes and segment objects while generating rich semantic labels through natural language interactions. A series of ablation experiments validated the advantages of foundation-model-based labeling over fixed label sets.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2307.01069

Shi-NeSS: Detecting Good and Stable Keypoints with a Neural Stability Score

Authors: Konstantin Pakulev, Alexander Vakhitov, Gonzalo Ferrer

Abstract: Learning a feature point detector is challenging both because of the ambiguity in the definition of a keypoint and, correspondingly, because of the need for specially prepared ground-truth labels for such points. In our work, we address both issues by combining a hand-crafted Shi detector with a neural network. We build on the principled and localized keypoints provided by the Shi detector and select among them using a keypoint stability score regressed by the neural network, the Neural Stability Score (NeSS). Our method is therefore named Shi-NeSS, since it combines the Shi detector with the properties of the keypoint stability score, and for training it only requires sets of images, with no dataset pre-labeling or reconstructed correspondence labels. We evaluate Shi-NeSS on HPatches, ScanNet, MegaDepth and IMC-PT, demonstrating state-of-the-art performance and good generalization on downstream tasks.
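The Shi (Shi-Tomasi) detector scores a pixel by the smaller eigenvalue of the local structure tensor, so corners score high while edges and flat regions score near zero. The Python sketch below computes that score and then re-ranks the candidates with an arbitrary score function. In Shi-NeSS that role is played by the learned NeSS, which is not reproduced here, so the `score_fn` in the usage lines is a hypothetical placeholder.

```python
import math
from typing import Callable, List, Tuple

Image = List[List[float]]


def shi_tomasi_score(img: Image, y: int, x: int, r: int = 1) -> float:
    """Smaller eigenvalue of the structure tensor summed over a (2r+1)x(2r+1) window."""
    a = b = c = 0.0
    for v in range(y - r, y + r + 1):
        for u in range(x - r, x + r + 1):
            ix = (img[v][u + 1] - img[v][u - 1]) / 2.0  # central difference in x
            iy = (img[v + 1][u] - img[v - 1][u]) / 2.0  # central difference in y
            a += ix * ix
            b += ix * iy
            c += iy * iy
    # Closed-form minimum eigenvalue of the symmetric matrix [[a, b], [b, c]].
    return (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b * b)


def shi_candidates(img: Image, threshold: float, r: int = 1) -> List[Tuple[int, int]]:
    """All interior pixels whose Shi score exceeds `threshold`."""
    h, w = len(img), len(img[0])
    m = r + 1  # margin so every gradient lookup stays inside the image
    return [(y, x) for y in range(m, h - m) for x in range(m, w - m)
            if shi_tomasi_score(img, y, x, r) > threshold]


def select_keypoints(cands: List[Tuple[int, int]],
                     score_fn: Callable[[int, int], float], k: int) -> List[Tuple[int, int]]:
    """Keep the k candidates ranked highest by an external (e.g. learned) score."""
    return sorted(cands, key=lambda p: score_fn(*p), reverse=True)[:k]


# Toy image: a bright square occupying the bottom-right part of a 9x9 frame.
img = [[1.0 if (y >= 4 and x >= 4) else 0.0 for x in range(9)] for y in range(9)]
cands = shi_candidates(img, threshold=0.1)
# Hypothetical stand-in for a learned stability score: prefer points near (4, 4).
print(select_keypoints(cands, lambda y, x: -abs(y - 4) - abs(x - 4), k=1))  # [(4, 4)]
```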

Submitted 3 July, 2023; originally announced July 2023.

Comments: 10 pages, 4 figures

arXiv:2305.02859

Social Robot Navigation through Constrained Optimization: a Comparative Study of Uncertainty-based Objectives and Constraints

Authors: Timur Akhtyamov, Aleksandr Kashirin, Aleksey Postnikov, Gonzalo Ferrer

Abstract: This work studies how uncertainty estimation in human motion prediction can be embedded into constrained optimization techniques, such as Model Predictive Control (MPC), for social robot navigation. We propose several cost objectives and constraint functions that are derived from the uncertainty of predicted pedestrian positions, are related to the probability of collision, and can be applied to the MPC; all the variants are compared in challenging scenes with multiple agents. The main question this paper tries to answer is: what are the most important uncertainty-based criteria for social MPC? To that end, we evaluate the proposed approaches with several social navigation metrics in an extensive set of scenarios of varying complexity in reproducible synthetic environments. The main outcome of our study is a foundation for a practical guide on when and how to use uncertainty-aware approaches for social robot navigation, and on which criteria are most effective.
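One common way to turn prediction uncertainty into an MPC constraint is a Mahalanobis-distance bound: the robot must stay outside the k-sigma ellipse of each predicted pedestrian position. The Python sketch below shows that single criterion for a 2-D Gaussian prediction. It is an illustrative member of the family of uncertainty-based constraints compared in the paper, not necessarily one of its exact variants, and the choice k = 2 is arbitrary.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]
Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]


def mahalanobis_sq(p: Vec2, mu: Vec2, cov: Cov2) -> float:
    """Squared Mahalanobis distance of p from a 2-D Gaussian N(mu, cov)."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    # Inverse of the symmetric 2x2 covariance applied to (dx, dy).
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def ellipse_mass(k: float) -> float:
    """Probability that a 2-D Gaussian sample lies within Mahalanobis radius k."""
    return 1.0 - math.exp(-k * k / 2.0)


def constraint_value(p: Vec2, mu: Vec2, cov: Cov2, k: float = 2.0) -> float:
    """g(p) >= 0 means the robot is outside the k-sigma uncertainty ellipse."""
    return mahalanobis_sq(p, mu, cov) - k * k


mu, cov = (0.0, 0.0), ((1.0, 0.0), (0.0, 4.0))  # predicted pedestrian, sigma_y = 2 m
print(constraint_value((3.0, 0.0), mu, cov))     # 5.0 (feasible)
print(constraint_value((0.0, 3.0), mu, cov))     # -1.75 (violates the constraint)
print(round(ellipse_mass(2.0), 4))               # 0.8647
```

In an MPC, g(p_t) >= 0 would be imposed at every step t of the horizon and for every predicted agent; the covariance typically grows along the horizon, which makes the constraint more conservative for later steps.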

Submitted 17 July, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

arXiv:2304.05342

TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain

Authors: Alexey I. Boyko, Anastasiia Kornilova, Rahim Tariverdizadeh, Mirfarid Musavian, Larisa Markeeva, Ivan Oseledets, Gonzalo Ferrer

Abstract: This paper addresses the following research question: "can one compress a detailed 3D representation and use it directly for point cloud registration?". Map compression of the scene can be achieved by the tensor train (TT) decomposition of the signed distance function (SDF) representation, where the amount of data reduction is regulated by the so-called TT-ranks. Using this representation, we propose an algorithm, TT-SDF2PC, that directly registers a point cloud (PC) to the compressed SDF by efficiently calculating its derivatives in the TT domain, saving computation and memory. We compare TT-SDF2PC with state-of-the-art (SOTA) local and global registration methods on a synthetic dataset and a real dataset and show on-par performance while requiring significantly fewer resources.
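To make the role of the TT-ranks concrete: a TT representation of a d-dimensional grid stores d cores of shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1, so for a 3D SDF its size grows linearly in the grid side length instead of cubically. The Python sketch below only counts stored values for an illustrative 512^3 grid and a few rank choices; it does not compute the decomposition, and the resolution and ranks are assumptions rather than values from the paper.

```python
from typing import Sequence


def dense_size(shape: Sequence[int]) -> int:
    """Number of values in a full (uncompressed) grid."""
    size = 1
    for n in shape:
        size *= n
    return size


def tt_size(shape: Sequence[int], ranks: Sequence[int]) -> int:
    """Number of values in a tensor train with cores of shape (r_{k-1}, n_k, r_k)."""
    if len(ranks) != len(shape) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("expected ranks (1, r_1, ..., r_{d-1}, 1)")
    return sum(ranks[k] * shape[k] * ranks[k + 1] for k in range(len(shape)))


shape = (512, 512, 512)  # illustrative SDF grid resolution
for r in (8, 32, 128):   # illustrative TT-ranks
    ranks = (1, r, r, 1)
    print(r, tt_size(shape, ranks), dense_size(shape) / tt_size(shape, ranks))
# The first line printed is: 8 40960 3276.8
```

Lower ranks give stronger compression but a coarser approximation of the SDF; this is the accuracy/memory trade-off that the TT-ranks regulate.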

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2304.01055

Eigen-Factors an Alternating Optimization for Back-end Plane SLAM of 3D Point Clouds

Authors: Gonzalo Ferrer, Dmitrii Iarosh, Anastasiia Kornilova

Abstract: Modern depth sensors can generate a huge number of 3D points in a few seconds, to be later processed by Localization and Mapping algorithms. Ideally, these algorithms should efficiently handle large Point Clouds, under the assumption that using more points implies more available information. Eigen Factors (EF) is a new algorithm that solves SLAM by using planes as the main geometric primitive. To do so, EF exhaustively calculates the error of all points at complexity $O(1)$, thanks to the Summation matrix $S$ of homogeneous points. The EF solution is highly efficient: i) the state variables are only the sensor poses (the trajectory), while the plane parameters are estimated beforehand in closed form, and ii) EF alternating optimization uses a Newton-Raphson method with a direct analytical calculation of the gradient and the Hessian, which turns out to be a block diagonal matrix. Since we need to differentiate over eigenvalues and matrix elements, we have developed an intuitive methodology to calculate partial derivatives in the manifold of rigid body transformations $SE(3)$, which could be applied to unrelated problems that require analytical derivatives of a certain complexity. We evaluate EF and other state-of-the-art plane SLAM back-end algorithms in a synthetic environment. The evaluation is extended to the ICL (RGBD) and KITTI (LiDAR) datasets. Code is publicly available at https://github.com/prime-slam/EF-plane-SLAM.
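The $O(1)$ claim rests on a standard identity: for a plane $\pi = [n; d]$ with unit normal $n$ and homogeneous points $\tilde p = [x, y, z, 1]$, the sum of squared point-to-plane distances equals $\pi^T S \pi$ with $S = \sum \tilde p \tilde p^T$. Once $S$ is accumulated, evaluating the error no longer touches individual points. The Python sketch below checks this identity against a brute-force sum; EF's eigenvalue-based cost and its $SE(3)$ derivatives are not reproduced, and the sample points are made up.

```python
from typing import List, Sequence

Mat4 = List[List[float]]


def summation_matrix(points: Sequence[Sequence[float]]) -> Mat4:
    """S = sum of outer products of homogeneous points [x, y, z, 1]."""
    S = [[0.0] * 4 for _ in range(4)]
    for x, y, z in points:
        h = (x, y, z, 1.0)
        for i in range(4):
            for j in range(4):
                S[i][j] += h[i] * h[j]
    return S


def plane_cost(S: Mat4, plane: Sequence[float]) -> float:
    """pi^T S pi: sum of squared point-to-plane distances, independent of #points."""
    return sum(plane[i] * S[i][j] * plane[j] for i in range(4) for j in range(4))


def plane_cost_bruteforce(points: Sequence[Sequence[float]], plane: Sequence[float]) -> float:
    """Reference implementation that loops over every point."""
    nx, ny, nz, d = plane
    return sum((nx * x + ny * y + nz * z + d) ** 2 for x, y, z in points)


points = [(0.0, 0.0, 1.1), (1.0, 0.0, 0.9), (0.0, 2.0, 1.0), (3.0, 1.0, 1.2)]
plane = (0.0, 0.0, 1.0, -1.0)  # the plane z = 1
S = summation_matrix(points)
print(plane_cost(S, plane), plane_cost_bruteforce(points, plane))  # both about 0.06
```

Because every point of a scan is moved by the same pose $T$, the summation matrix of the transformed points is $T S T^T$, so it can be updated during optimization without revisiting the raw points.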

Submitted 4 September, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

arXiv:2303.05162

EVOLIN Benchmark: Evaluation of Line Detection and Association

Authors: Kirill Ivanov, Gonzalo Ferrer, Anastasiia Kornilova

Abstract: Lines are interesting geometrical features commonly seen in indoor and urban environments. A complete benchmark is missing in which lines from a sequential stream of images can be evaluated at all stages: line detection, line association and pose error. To fill this gap, we present a complete and exhaustive benchmark for visual lines in a SLAM front-end, both for RGB and RGBD, providing a plethora of complementary metrics. We have also labelled data from well-known SLAM datasets in order to have poses and accurately annotated lines all in one place. In particular, we have evaluated 17 line detection algorithms, 5 line association methods and the resulting pose error for aligning a pair of frames with several detector-association combinations. We have packaged all methods and evaluation metrics and made them publicly available on the web page https://prime-slam.github.io/evolin/.

Submitted 31 July, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

arXiv:2303.05123

Dominating Set Database Selection for Visual Place Recognition

Authors: Anastasiia Kornilova, Ivan Moskalenko, Timofei Pushkin, Fakhriddin Tojiboev, Rahim Tariverdizadeh, Gonzalo Ferrer

Abstract: This paper presents an approach for creating a visual place recognition (VPR) database for localization in indoor environments from RGBD scanning sequences. The proposed approach is formulated as a minimization problem in terms of a dominating set algorithm for a graph constructed from spatial information, and is referred to as DominatingSet. Our algorithm shows better scene coverage than other methodologies used for database creation. We also demonstrate that, using DominatingSet, the database can be up to 250-1400 times smaller than the original scanning sequence while maintaining a recall rate of more than 80% on testing sequences. We evaluated our algorithm on the 7-scenes and BundleFusion datasets and on an additionally recorded sequence in a highly repetitive office setting. In addition, the database selection can produce weakly-supervised labels for fine-tuning neural place recognition algorithms to particular settings, further improving their accuracy. The paper also presents a fully automated pipeline for VPR database creation from RGBD scanning sequences, as well as a set of metrics for VPR database evaluation. The code and released data are available on our web page: https://prime-slam.github.io/place-recognition-db/
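The abstract frames database selection as a dominating set problem on a graph built from spatial information. A minimal Python sketch, assuming frames are connected when their camera positions lie within a fixed radius (a hypothetical construction) and using the classic greedy approximation rather than whatever solver the authors use, could look like this:

```python
import math
from typing import Dict, List, Sequence, Set


def build_graph(positions: Sequence[Sequence[float]], radius: float) -> Dict[int, Set[int]]:
    """Connect frames whose camera positions are within `radius` of each other."""
    n = len(positions)
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_dominating_set(adj: Dict[int, Set[int]]) -> List[int]:
    """Repeatedly keep the frame that covers the most still-uncovered frames."""
    uncovered = set(adj)
    chosen: List[int] = []
    while uncovered:
        # Ties are broken towards the smaller frame index.
        best = max(adj, key=lambda v: (len(({v} | adj[v]) & uncovered), -v))
        chosen.append(best)
        uncovered -= {best} | adj[best]
    return chosen


positions = [(float(i), 0.0) for i in range(10)]  # 10 frames along a corridor, 1 m apart
graph = build_graph(positions, radius=1.0)
print(greedy_dominating_set(graph))  # [1, 4, 7, 8]
```

Every frame of the sequence is then either in the database or adjacent to a database frame, which is the coverage property that motivates the dominating set formulation.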

Submitted 21 January, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

arXiv:2301.07433

DDPEN: Trajectory Optimisation With Sub Goal Generation Model

Authors: Aleksander Gamayunov, Aleksey Postnikov, Gonzalo Ferrer

Abstract: Differential dynamic programming (DDP) is a widely used and powerful trajectory optimization technique; however, due to its internal structure, it is not exempt from local minima. In this paper, we present Differential Dynamic Programming with Escape Network (DDPEN), a novel approach to avoid DDP local minima by adding a term to the optimization criteria that points towards the direction in which the robot should move in order to escape local minima. To produce these directions, we propose to use a deep model that takes as input the map of the environment, in the form of a costmap, together with the desired goal position. The model produces possible future directions that lead to the goal while avoiding local minima, and it can run in real time. The model is trained on a synthetic dataset, and the overall system is evaluated in the Gazebo simulator. In this work we show that our proposed method allows the trajectory optimization algorithm to avoid local minima and successfully execute a 278 m long trajectory with various convex and nonconvex obstacles.

Submitted 18 January, 2023; originally announced January 2023.

Comments: 4 pages, 6 figures, IROS2022 Workshop: Artificial Intelligence for Social Robots Interacting with Humans in the Real World [intellect4hri]

arXiv:2209.08895

Best Axes Composition Extended: Multiple Gyroscopes and Accelerometers Data Fusion to Reduce Systematic Error

Authors: Marsel Faizullin, Gonzalo Ferrer

Abstract: Multiple rigidly attached Inertial Measurement Unit (IMU) sensors provide a richer flow of data compared to a single IMU. State-of-the-art methods follow a probabilistic model of IMU measurements based on the random nature of errors combined under a Bayesian framework. However, affordable low-grade IMUs additionally suffer from systematic errors due to imperfections that are not covered by their corresponding probabilistic model. In this paper, we propose a method, the Best Axes Composition (BAC), for combining Multiple IMU (MIMU) sensor data for accurate 3D-pose estimation that takes into account both random and systematic errors by dynamically choosing the best IMU axes from the set of all available axes. We evaluate our approach on our MIMU visual-inertial sensor and compare the performance of the method with a purely probabilistic state-of-the-art approach to MIMU data fusion. We show that BAC outperforms the latter and achieves up to 20% accuracy improvement for both orientation and position estimation in open loop, but needs proper treatment to keep the obtained gain.

Submitted 19 September, 2022; originally announced September 2022.

Comments: Accepted to Robotics and Autonomous Systems journal. arXiv admin note: substantial text overlap with arXiv:2107.02632

arXiv:2208.01421

T4DT: Tensorizing Time for Learning Temporal 3D Visual Data

Authors: Mikhail Usvyatsov, Rafael Ballester-Rippoll, Lina Bashaeva, Konrad Schindler, Gonzalo Ferrer, Ivan Oseledets

Abstract: Unlike 2D raster images, there is no single dominant representation for 3D visual data processing. Different formats like point clouds, meshes, or implicit functions each have their strengths and weaknesses. Still, grid representations such as signed distance functions have attractive properties also in 3D. In particular, they offer constant-time random access and are eminently suitable for modern machine learning. Unfortunately, the storage size of a grid grows exponentially with its dimension. Hence they often exceed memory limits even at moderate resolution. This work proposes using low-rank tensor formats, including the Tucker, tensor train, and quantics tensor train decompositions, to compress time-varying 3D data. Our method iteratively computes, voxelizes, and compresses each frame's truncated signed distance function and applies tensor rank truncation to condense all frames into a single, compressed tensor that represents the entire 4D scene. We show that low-rank tensor compression is extremely compact for storing and querying time-varying signed distance functions. It significantly reduces the memory footprint of 4D scenes while remarkably preserving their geometric quality. Unlike existing iterative learning-based approaches like DeepSDF and NeRF, our method uses a closed-form algorithm with theoretical guarantees.

Submitted 5 October, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

arXiv:2206.14442

Conditioned Human Trajectory Prediction using Iterative Attention Blocks

Authors: Aleksey Postnikov, Aleksander Gamayunov, Gonzalo Ferrer

Abstract: Human motion prediction is key to understanding social environments, with direct applications in robotics, surveillance, etc. We present a simple yet effective trajectory prediction model aimed at predicting pedestrian positions in urban-like environments, conditioned on the environment: the map and the surrounding agents. Our model is a neural-based architecture that can run several layers of attention blocks and transformers in an iterative sequential fashion, allowing it to capture the important features in the environment that improve prediction. We show that, without explicitly introducing social masks, dynamical models, social pooling layers, or complicated graph-like structures, it is possible to produce results on par with SoTA models, which makes our approach easily extendable and configurable depending on the data available. We report results similar to those of SoTA models on publicly available and extensively used datasets, using the unimodal prediction metrics ADE and FDE.
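ADE and FDE have standard definitions for a single (unimodal) predicted trajectory: ADE averages the Euclidean displacement over all predicted time steps, and FDE takes it at the final step. A minimal Python sketch for one agent, assuming prediction and ground truth are already aligned step by step (benchmarks then average these values over all agents and scenes):

```python
import math
from typing import List, Tuple

Trajectory = List[Tuple[float, float]]


def ade(pred: Trajectory, gt: Trajectory) -> float:
    """Average Displacement Error: mean L2 distance over all predicted time steps."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("prediction and ground truth must have the same non-zero length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred: Trajectory, gt: Trajectory) -> float:
    """Final Displacement Error: L2 distance at the last predicted time step."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("prediction and ground truth must have the same non-zero length")
    return math.dist(pred[-1], gt[-1])


gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(ade(pred, gt), fde(pred, gt))  # 1.0 2.0
```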

Submitted 29 June, 2022; originally announced June 2022.

arXiv:2204.10211

SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis

Authors: Anastasiia Kornilova, Marsel Faizullin, Konstantin Pakulev, Andrey Sadkov, Denis Kukushkin, Azat Akhmetyanov, Timur Akhtyamov, Hekmat Taherinejad, Gonzalo Ferrer

Abstract: We present a dataset of 1000 video sequences of human portraits recorded in real and uncontrolled conditions by using a handheld smartphone accompanied by an external high-quality depth camera. The collected dataset contains 200 people captured in different poses and locations, and its main purpose is to bridge the gap between raw measurements obtained from a smartphone and downstream applications, such as state estimation, 3D reconstruction, view synthesis, etc. The sensors employed in data collection are the smartphone's camera and Inertial Measurement Unit (IMU), and an external Azure Kinect DK depth camera, software-synchronized with sub-millisecond precision to the smartphone system. During the recording, the smartphone flash is used to provide a periodic secondary source of lighting. An accurate mask of the foremost person is provided, and its impact on camera alignment accuracy is reported. For evaluation purposes, we compare multiple state-of-the-art camera alignment methods by using a Motion Capture system. We provide a smartphone visual-inertial benchmark for portrait capturing, where we report results for multiple methods and motivate further use of the provided trajectories, available in the dataset, in view synthesis and 3D reconstruction tasks.

Submitted 21 April, 2022; originally announced April 2022.

Comments: Accepted to CVPR'2022

arXiv:2204.05799

EVOPS Benchmark: Evaluation of Plane Segmentation from RGBD and LiDAR Data

Authors: Anastasiia Kornilova, Dmitrii Iarosh, Denis Kukushkin, Nikolai Goncharov, Pavel Mokeev, Arthur Saliou, Gonzalo Ferrer

Abstract: This paper provides the EVOPS dataset for plane segmentation from 3D data, both from RGBD images and LiDAR point clouds. We have designed two annotation methodologies (RGBD and LiDAR) running on well-known and widely used datasets for SLAM evaluation, and we have provided a complete set of benchmarking tools including point, plane and segmentation metrics. The data includes a total of 10K RGBD and 7K LiDAR frames over different selected scenes with high-quality segmented planes. The experiments report the quality of SOTA methods for RGBD plane segmentation on our annotated data. We also provide a learnable baseline for plane segmentation in LiDAR point clouds. All labeled data and benchmark tools used have been made publicly available at https://evops.netlify.app/.
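Segmentation metrics of this kind are usually built on per-plane intersection over union (IoU) between the point sets of predicted and ground-truth planes. The Python sketch below computes a simple "mean best IoU": each ground-truth plane is scored by its best-overlapping prediction. This is a generic illustration, not necessarily one of the exact metrics shipped with EVOPS, and it assumes label 0 marks points that belong to no plane.

```python
from typing import Dict, List, Set


def planes_from_labels(labels: List[int], unlabeled: int = 0) -> Dict[int, Set[int]]:
    """Group point indices by plane label, skipping points not assigned to a plane."""
    planes: Dict[int, Set[int]] = {}
    for idx, lab in enumerate(labels):
        if lab != unlabeled:
            planes.setdefault(lab, set()).add(idx)
    return planes


def iou(a: Set[int], b: Set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_best_iou(gt_labels: List[int], pred_labels: List[int]) -> float:
    """Average, over ground-truth planes, of the IoU with the best-matching prediction."""
    gt = planes_from_labels(gt_labels)
    pred = planes_from_labels(pred_labels)
    scores = [max((iou(g, p) for p in pred.values()), default=0.0) for g in gt.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Per-point labels for 8 points; label values of the two sets need not agree.
gt_labels = [1, 1, 1, 1, 2, 2, 2, 0]
pred_labels = [5, 5, 5, 0, 7, 7, 7, 7]
print(mean_best_iou(gt_labels, pred_labels))  # 0.75
```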

Submitted 24 August, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

Comments: Accepted to IROS'2022

arXiv:2112.04350

Transformer based trajectory prediction

Authors: Aleksey Postnikov, Aleksander Gamayunov, Gonzalo Ferrer

Abstract: To plan a safe and efficient route, an autonomous vehicle should anticipate the future motions of other agents around it. Motion prediction is an extremely challenging task that has recently gained significant attention from the research community. In this work, we present a simple yet strong baseline for uncertainty-aware motion prediction based purely on transformer neural networks, which has shown its effectiveness under domain change. While being easy to implement, the proposed approach achieves competitive performance and ranks 1st in the 2021 Shifts Vehicle Motion Prediction Competition.

Submitted 8 December, 2021; originally announced December 2021.

arXiv:2111.03552

doi 10.1109/JSEN.2022.3150973

SmartDepthSync: Open Source Synchronized Video Recording System of Smartphone RGB and Depth Camera Range Image Frames with Sub-millisecond Precision

Authors: Marsel Faizullin, Anastasiia Kornilova, Azat Akhmetyanov, Konstantin Pakulev, Andrey Sadkov, Gonzalo Ferrer

Abstract: Nowadays, smartphones can produce a synchronized (synced) stream of high-quality data, including RGB images, inertial measurements, and other data. Therefore, smartphones are becoming appealing sensor systems in the robotics community. Unfortunately, there is still a need for external supporting sensing hardware, such as a depth camera precisely synced with the smartphone sensors. In this paper, we propose a hardware-software recording system that has a heterogeneous structure and contains a smartphone and an external depth camera for recording visual, depth, and inertial data that are mutually synchronized. The system is synced at the time and frame levels: every RGB image frame from the smartphone camera is exposed at the same moment as a depth camera frame, with sub-millisecond precision. We provide a method and a tool for sync performance evaluation that can be applied to any pair of depth and RGB cameras. Our system can be replicated, modified, or extended using our open-source materials.
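A simple offline check of frame-level sync is to pair every RGB frame with the nearest depth frame and inspect the timestamp offsets. The Python sketch below does this with binary search. It assumes both streams are already expressed in a common clock in seconds, and it is only an illustration, not the evaluation method or tool provided by the authors; the timestamps are made up.

```python
import bisect
from typing import List, Tuple


def nearest_offsets(rgb_ts: List[float], depth_ts: List[float]) -> List[float]:
    """For each RGB timestamp, signed offset (s) to the nearest depth timestamp."""
    depth = sorted(depth_ts)
    offsets = []
    for t in rgb_ts:
        i = bisect.bisect_left(depth, t)
        neighbours = depth[max(i - 1, 0): i + 1]  # candidates just before and after t
        nearest = min(neighbours, key=lambda d: abs(d - t))
        offsets.append(nearest - t)
    return offsets


def sync_stats(offsets: List[float]) -> Tuple[float, float]:
    """Mean absolute and maximum absolute offset, in seconds."""
    abs_off = [abs(o) for o in offsets]
    return sum(abs_off) / len(abs_off), max(abs_off)


rgb = [0.0, 0.0333, 0.0667]
depth = [0.0002, 0.0334, 0.0663]
mean_abs, max_abs = sync_stats(nearest_offsets(rgb, depth))
print(f"mean {mean_abs * 1e3:.3f} ms, max {max_abs * 1e3:.3f} ms")
```

Note that such a check only measures agreement between recorded timestamps; verifying that exposures actually coincide requires an external reference, which is why dedicated evaluation methods are needed.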

Submitted 13 September, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

Comments: IEEE Sensors Journal paper

arXiv:2109.02965

CovarianceNet: Conditional Generative Model for Correct Covariance Prediction in Human Motion Prediction

Authors: Aleksey Postnikov, Aleksander Gamayunov, Gonzalo Ferrer

Abstract: The correct characterization of uncertainty when predicting human motion is as important as the accuracy of the prediction itself. We present a new method to correctly predict the uncertainty associated with the predicted distribution of future trajectories. Our approach, CovarianceNet, is based on a Conditional Generative Model with Gaussian latent variables in order to predict the parameters of a bivariate Gaussian distribution. The combination of CovarianceNet with a motion prediction model results in a hybrid approach that outputs a unimodal distribution. We show how some state-of-the-art methods in motion prediction become overconfident when predicting uncertainty, according to our proposed metric and as validated on the ETH dataset \cite{pellegrini2009you}. CovarianceNet correctly predicts uncertainty, which makes our method suitable for applications that use predicted distributions, e.g., planning or decision making.
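Overconfidence of a predicted bivariate Gaussian can be illustrated with a coverage test: if the predicted covariance is calibrated, the ground truth falls inside the 1-sigma ellipse (squared Mahalanobis distance at most 1) with probability $1 - e^{-1/2} \approx 0.393$, and an overconfident model covers noticeably less. The Python sketch below checks this on synthetic samples. It is a generic calibration check, not the metric proposed in the paper, and the standard deviations and correlation are arbitrary.

```python
import math
import random
from typing import List, Tuple

Vec2 = Tuple[float, float]
Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]


def bivariate_cov(sx: float, sy: float, rho: float) -> Cov2:
    """Covariance of a bivariate Gaussian from standard deviations and correlation."""
    c = rho * sx * sy
    return ((sx * sx, c), (c, sy * sy))


def mahalanobis_sq(p: Vec2, mu: Vec2, cov: Cov2) -> float:
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def coverage(samples: List[Tuple[Vec2, Vec2, Cov2]], k: float = 1.0) -> float:
    """Fraction of (ground truth, predicted mean, predicted cov) inside the k-sigma ellipse."""
    inside = sum(1 for gt, mu, cov in samples if mahalanobis_sq(gt, mu, cov) <= k * k)
    return inside / len(samples)


def expected_coverage(k: float = 1.0) -> float:
    """Coverage of a calibrated 2-D Gaussian: 1 - exp(-k^2 / 2)."""
    return 1.0 - math.exp(-k * k / 2.0)


# Ground truth drawn from the true distribution, scored against a calibrated
# prediction and against an overconfident one (standard deviations halved).
random.seed(0)
sx, sy, rho = 0.5, 0.3, 0.4
calibrated, overconfident = [], []
for _ in range(20000):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    gt = (sx * z1, sy * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    calibrated.append((gt, (0.0, 0.0), bivariate_cov(sx, sy, rho)))
    overconfident.append((gt, (0.0, 0.0), bivariate_cov(sx / 2, sy / 2, rho)))
print(round(expected_coverage(), 3))  # 0.393
print(coverage(calibrated), coverage(overconfident))
```

With the standard deviations halved, the expected coverage drops to $1 - e^{-1/8} \approx 0.12$, which is the kind of gap that reveals an overconfident predictor.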

Submitted 7 September, 2021; originally announced September 2021.

arXiv:2108.01654

Comparison of modern open-source visual SLAM approaches

Authors: Dinar Sharafutdinov, Mark Griguletskii, Pavel Kopanev, Mikhail Kurenkov, Gonzalo Ferrer, Aleksey Burkov, Aleksei Gonnochenko, Dzmitry Tsetserukou

Abstract: SLAM is one of the most fundamental areas of research in robotics and computer vision. State-of-the-art solutions have advanced significantly in terms of accuracy and stability. Unfortunately, not all approaches are available as open-source solutions that are free to use. The results of some of them are difficult to reproduce, and there is a lack of comparison on common datasets. In our work, we carry out a comparative analysis of state-of-the-art open-source methods. We assess the algorithms based on accuracy, computational performance, robustness, and fault tolerance. Moreover, we present a comparison of datasets as well as an analysis of algorithms from a practical point of view. The findings of the work raise several crucial questions for SLAM researchers.
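Accuracy in SLAM comparisons is commonly reported as the absolute trajectory error (ATE) RMSE over time-associated positions. The Python sketch below uses only a translation alignment (matching centroids) to keep it short; this is a simplification, since standard evaluation tools align with a full rigid or similarity transform, and the paper's exact protocol is not stated here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(pts: Sequence[Point]) -> Point:
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n, sum(p[2] for p in pts) / n)


def ate_rmse(est: Sequence[Point], gt: Sequence[Point], align_translation: bool = True) -> float:
    """RMSE of position errors between time-associated estimated and ground-truth poses."""
    if len(est) != len(gt) or not gt:
        raise ValueError("trajectories must be non-empty and associated one-to-one")
    shift = (0.0, 0.0, 0.0)
    if align_translation:
        # Translation-only alignment: match centroids (no rotation or scale).
        ce, cg = centroid(est), centroid(gt)
        shift = (cg[0] - ce[0], cg[1] - ce[1], cg[2] - ce[2])
    sq_errors = [
        math.dist((e[0] + shift[0], e[1] + shift[1], e[2] + shift[2]), g) ** 2
        for e, g in zip(est, gt)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0), (7.0, 0.0, 0.0)]  # same path, offset by 5 m
print(ate_rmse(est, gt))                           # 0.0
print(ate_rmse(est, gt, align_translation=False))  # 5.0
```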

Submitted 4 February, 2023; v1 submitted 3 August, 2021; originally announced August 2021.

Comments: Preprint, 19 pages

arXiv:2107.02632 [pdf, other]

Best Axes Composition: Multiple Gyroscopes IMU Sensor Fusion to Reduce Systematic Error

Authors: Marsel Faizullin, Gonzalo Ferrer

Abstract: In this paper, we propose an algorithm that combines multiple low-cost Inertial Measurement Unit (IMU) sensors to calculate 3D orientations accurately. Our approach takes into account the inherent and non-negligible systematic error in the gyroscope model and provides a solution based on the error observed during previous instants of time. Our algorithm, the Best Axes Composition (BAC), dynamically chooses the best-fitting axes among the IMUs to improve estimation performance. We compare our approach with a probabilistic Multiple IMU (MIMU) approach and validate our algorithm on a dataset we collected. As a result, as few as two IMUs suffice to significantly improve accuracy, while other MIMU approaches need a higher number of sensors to achieve the same results.
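The core idea of picking, per axis, the IMU whose previously observed error is smallest can be sketched as below. This is a hypothetical simplification, not the BAC algorithm itself: it assumes all IMUs report angular rates in a common body frame and that a per-axis error estimate is already available (here tracked with a hypothetical exponential moving average); the abstract does not describe how those errors are obtained.

```python
import numpy as np

def select_best_axes(readings, past_errors):
    """Compose one 3-axis gyroscope reading from several IMUs.

    readings:    (n_imus, 3) angular rates at the current instant, assumed
                 to be expressed in a common body frame.
    past_errors: (n_imus, 3) error magnitudes observed per IMU axis at
                 previous instants (their source is an assumption).
    Returns the composed (3,) reading and the chosen IMU index per axis.
    """
    readings = np.asarray(readings, dtype=float)
    past_errors = np.asarray(past_errors, dtype=float)
    if readings.ndim != 2 or readings.shape[1] != 3 or readings.shape != past_errors.shape:
        raise ValueError("expected two arrays of shape (n_imus, 3)")
    best = np.argmin(np.abs(past_errors), axis=0)  # best IMU for x, y, z
    composed = readings[best, np.arange(3)]        # take each axis from its best IMU
    return composed, best

def update_errors(past_errors, new_errors, alpha=0.1):
    """Hypothetical exponential moving average of absolute per-axis error."""
    past = np.asarray(past_errors, dtype=float)
    new = np.abs(np.asarray(new_errors, dtype=float))
    return (1.0 - alpha) * past + alpha * new
```

Because the choice is made independently per axis, two IMUs can already outperform either one alone whenever their error profiles differ across axes, which is consistent with the abstract's claim about needing few sensors.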

Submitted 22 July, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

Comments: Accepted for the 10th European Conference on Mobile Robots (ECMR 2021)

arXiv:2107.02625 [pdf, other]

Open-Source LiDAR Time Synchronization System by Mimicking GNSS-clock

Authors: Marsel Faizullin, Anastasiia Kornilova, Gonzalo Ferrer

Abstract: Data fusion algorithms that employ LiDAR measurements, such as Visual-LiDAR, LiDAR-Inertial, or Multiple LiDAR Odometry and simultaneous localization and mapping (SLAM), rely on precise timestamping schemes that grant synchronicity to data from the LiDAR and other sensors. Poor synchronization performance, due to an incorrect timestamping procedure, may negatively affect the algorithms' state estimation results. To provide highly accurate and precise synchronization between the sensors, we introduce an open-source hardware-software system for time synchronization between a LiDAR and other sensors. It exploits a dedicated hardware LiDAR time synchronization interface by providing it with an emulated GNSS clock, so no physical GNSS receiver is needed. The emulator is based on a general-purpose microcontroller and, due to its concise hardware and software architecture, can be easily modified or extended to synchronize sets of different sensors such as cameras, inertial measurement units (IMUs), wheel encoders, other LiDARs, etc. In the paper, we provide an example of such a system with synchronized LiDAR and IMU sensors. We evaluated the accuracy and precision of the sensors' synchronization and report 1-microsecond performance. We compared our results with timestamping provided by ROS software and by a LiDAR inner clocking scheme to underline clear advantages over these two baseline methods.
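Many LiDARs that accept GNSS time do so through a PPS pulse plus an NMEA `$GPRMC` sentence on a serial line. Assuming that interface (the abstract does not name the message format), the software side of a GNSS-clock emulator mostly amounts to formatting such a sentence with a valid checksum: the XOR of every character between `$` and `*`, written as two hexadecimal digits. The position fields below are dummy zeros, the time has whole-second resolution, and generating the PPS pulse on the microcontroller is outside the scope of this sketch.

```python
from datetime import datetime, timezone

def nmea_checksum(body: str) -> str:
    """XOR of all characters between '$' and '*', as two uppercase hex digits."""
    checksum = 0
    for ch in body:
        checksum ^= ord(ch)
    return f"{checksum:02X}"

def gprmc_sentence(t: datetime) -> str:
    """Build a $GPRMC sentence carrying UTC time and date; position is dummy."""
    t = t.astimezone(timezone.utc)
    fields = [
        "GPRMC",
        t.strftime("%H%M%S"),   # UTC time hhmmss
        "A",                    # status: A = valid
        "0000.0000", "N",       # dummy latitude
        "00000.0000", "E",      # dummy longitude
        "0.0", "0.0",           # speed over ground (knots), course
        t.strftime("%d%m%y"),   # UTC date ddmmyy
        "", "",                 # magnetic variation and its direction (empty)
    ]
    body = ",".join(fields)
    return f"${body}*{nmea_checksum(body)}\r\n"
```

In such a setup, the sentence would be sent shortly after each PPS edge so the receiving device can label that edge with the encoded second; the exact timing requirements depend on the LiDAR model.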

Submitted 13 September, 2022; v1 submitted 6 July, 2021; originally announced July 2021.

Comments: Accepted to IEEE ISPCS 2022 Conference (International Symposium on Precision Clock Synchronization for Measurement, Control and Communication)

arXiv:2107.00987 [pdf, other]

Sub-millisecond Video Synchronization of Multiple Android Smartphones

Authors: Azat Akhmetyanov, Anastasiia Kornilova, Marsel Faizullin, David Pozo, Gonzalo Ferrer

Abstract: This paper addresses the problem of building an affordable, easy-to-set-up synchronized multi-view camera system, which is in demand for many Computer Vision and Robotics applications in highly dynamic environments. In our work, we propose a solution to this problem: a publicly available Android application for synchronized video recording on multiple smartphones with sub-millisecond accuracy. We present a generalized mathematical model of timestamping for Android smartphones and demonstrate its applicability on 47 different physical devices. We also estimate the time drift parameter for those smartphones, which is less than 1.2 ms per minute for most of the considered devices; this makes a smartphone camera system a worthy analog of professional multi-view systems. Finally, we demonstrate the Android app's performance on a camera system built from Android smartphones, quantitatively on a setup with lights and qualitatively on a panorama stitching task.
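For scale: a drift of 1.2 ms per minute corresponds to a relative clock-rate error of 1.2e-3 / 60 = 2e-5 (20 ppm), so an uncompensated 10-minute recording would accumulate about 12 ms of offset. The sketch below estimates that drift by fitting a linear clock model, reference_time ≈ slope · local_time + offset, to paired timestamps. The linear model is an assumption for illustration and is not necessarily the paper's generalized timestamping model.

```python
import numpy as np

def fit_clock_model(local_s, reference_s):
    """Least-squares fit of reference_time ~= slope * local_time + offset (seconds)."""
    slope, offset = np.polyfit(np.asarray(local_s, dtype=float),
                               np.asarray(reference_s, dtype=float), 1)
    return float(slope), float(offset)

def drift_ms_per_minute(slope):
    """Convert a relative clock rate into accumulated drift, in ms per minute."""
    return (slope - 1.0) * 60.0 * 1000.0
```

With a fitted slope of 1 + 2e-5, `drift_ms_per_minute` returns 1.2, matching the bound quoted in the abstract.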

Submitted 26 August, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: Accepted to the IEEE Sensors 2021 conference as a lecture presentation

arXiv:2106.11351 [pdf, other]

doi 10.1109/ECMR50962.2021.9568822

Be your own Benchmark: No-Reference Trajectory Metric on Registered Point Clouds

Authors: Anastasiia Kornilova, Gonzalo Ferrer

Abstract: This paper addresses the problem of assessing trajectory quality when no ground truth poses are available or when their accuracy is insufficient for the specific task, for example, small-scale mapping in outdoor scenes. In our work, we propose a no-reference metric, the Mutually Orthogonal Metric (MOM), that estimates the quality of the map built from registered point clouds via the trajectory poses. MOM strongly correlates with the full-reference trajectory metric Relative Pose Error, making it a trajectory benchmarking tool for setups where 3D sensing technologies are employed. We provide a mathematical foundation for this correlation and confirm it statistically in synthetic environments. Furthermore, since our metric uses a subset of points from mutually orthogonal surfaces, we provide an algorithm for extracting such a subset and evaluate its performance in the synthetic CARLA environment and on the KITTI dataset. The code of the proposed metric is publicly available as a pip package.

Submitted 12 August, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

Comments: Accepted for the 10th European Conference on Mobile Robots (ECMR 2021)

arXiv:2012.09963 [pdf, other]

Relightable 3D Head Portraits from a Smartphone Video

Authors: Artem Sevastopolsky, Savva Ignatiev, Gonzalo Ferrer, Evgeny Burnaev, Victor Lempitsky

Abstract: In this work, we present a system for creating a relightable 3D portrait of a human head. Our neural pipeline operates on a sequence of frames captured by a smartphone camera with the flash blinking (a flash/no-flash sequence). A coarse point cloud, reconstructed via structure-from-motion software and multi-view denoising, is then used as a geometric proxy. Afterwards, a deep rendering network is trained to regress dense albedo, normals, and environmental lighting maps for arbitrary new viewpoints. Effectively, the proxy geometry and the rendering network constitute a relightable 3D portrait model, which can be synthesized from an arbitrary viewpoint and under arbitrary lighting, e.g., directional light, point light, or an environment map. The model is fitted to the sequence of frames with human face-specific priors that enforce the plausibility of the albedo-lighting decomposition, and it operates at interactive frame rates. We evaluate the performance of the method under varying lighting conditions and at extrapolated viewpoints, and compare it with existing relighting methods.

Submitted 17 December, 2020; originally announced December 2020.

arXiv:2011.00594 [pdf, other]

Random Fourier Features based SLAM

Authors: Yermek Kapushev, Anastasia Kishkun, Gonzalo Ferrer, Evgeny Burnaev

Abstract: This work is dedicated to simultaneous continuous-time trajectory estimation and mapping based on Gaussian Processes (GP). State-of-the-art GP-based models for Simultaneous Localization and Mapping (SLAM) are computationally efficient but can only be used with a restricted class of kernel functions. This paper presents an algorithm based on GP with a Random Fourier Features (RFF) approximation for SLAM without such kernel restrictions. The advantages of RFF for continuous-time SLAM are that we can consider a broader class of kernels while keeping the computational complexity at a reasonably low level by operating in the Fourier feature space. The accuracy-speed trade-off can be controlled by the number of features. Our experimental results on synthetic and real-world benchmarks demonstrate the cases in which our approach provides better results than the current state of the art.
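The standard RFF construction for the squared-exponential (RBF) kernel samples frequencies from the kernel's Gaussian spectral density and phases uniformly, so that z(x)·z(y) ≈ k(x, y). The sketch below shows that construction and a weight-space GP regression that solves a D × D system (D = number of features) instead of an n × n one; this is where the accuracy-speed trade-off controlled by the number of features comes from. The RBF kernel, the 1D time input, and the function names are illustrative assumptions; the abstract does not say which kernels or solver the paper uses.

```python
import numpy as np

def make_rff(input_dim, num_features, lengthscale, rng):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    # Frequencies from the kernel's spectral density (Gaussian), phases uniform.
    W = rng.normal(0.0, 1.0 / lengthscale, size=(num_features, input_dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def features(X):
        # Accepts shape (n, input_dim); 1D time stamps are reshaped to (n, 1).
        X = np.asarray(X, dtype=float).reshape(-1, input_dim)
        return np.sqrt(2.0 / num_features) * np.cos(X @ W.T + b)

    return features

def rbf_kernel(X, Y, lengthscale):
    """Exact RBF kernel matrix, used only to check the approximation."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-0.5 * sq / lengthscale**2)

def rff_posterior_mean(phi_train, y, phi_test, noise_var):
    """Weight-space GP regression with prior w ~ N(0, I): a D x D solve."""
    d = phi_train.shape[1]
    A = phi_train.T @ phi_train + noise_var * np.eye(d)
    w = np.linalg.solve(A, phi_train.T @ y)
    return phi_test @ w
```

Raising `num_features` tightens the kernel approximation at the cost of a larger D × D system, while the cost in the number of measurements n stays linear.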

Submitted 6 September, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

arXiv:2009.04299 [pdf, other]

HSFM-$Σ$nn: Combining a Feedforward Motion Prediction Network and Covariance Prediction

Authors: A. Postnikov, A. Gamayunov, G. Ferrer

Abstract: In this paper, we propose a new method for motion prediction: HSFM-$Σ$nn. Our method combines two different approaches: a feedforward network whose layers are model-based transition functions using the HSFM (Headed Social Force Model), and a Neural Network (NN) on each of these layers for covariance prediction. We compare our method with classical methods for covariance estimation, showing their limitations. We also compare it with a learning-based approach, social-LSTM, showing that our method is more precise and efficient.

Submitted 9 September, 2020; originally announced September 2020.

arXiv:1609.01176 [pdf, other]

The Player Kernel: Learning Team Strengths Based on Implicit Player Contributions

Authors: Lucas Maystre, Victor Kristof, Antonio J. González Ferrer, Matthias Grossglauser

Abstract: In this work, we draw attention to a connection between skill-based models of game outcomes and Gaussian process classification models. The Gaussian process perspective enables a) a principled way of dealing with uncertainty and b) rich models, specified through kernel functions. Using this connection, we tackle the problem of predicting outcomes of football matches between national teams. We develop a player kernel that relates any two football matches through the players lined up on the field. This makes it possible to share knowledge gained from observing matches between clubs (available in large quantities) and matches between national teams (available only in limited quantities). We evaluate our approach on the Euro 2008, 2012 and 2016 final tournaments.
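One way such a kernel can arise, sketched here under stated assumptions rather than as the paper's exact definition: if each player's skill is an independent N(0, σ²) variable and a match's latent outcome is the home lineup's total skill minus the away lineup's, then the covariance between two matches is σ² times the dot product of their signed lineup vectors. A club match and a national-team match therefore become correlated exactly through the players they share. The function names below are hypothetical.

```python
from typing import Dict, Iterable

def lineup_vector(home: Iterable[str], away: Iterable[str]) -> Dict[str, int]:
    """Sparse match representation: +1 for each home player, -1 for each away player."""
    vec: Dict[str, int] = {}
    for player in home:
        vec[player] = vec.get(player, 0) + 1
    for player in away:
        vec[player] = vec.get(player, 0) - 1
    return vec

def player_kernel(m1: Dict[str, int], m2: Dict[str, int], skill_var: float = 1.0) -> float:
    """Covariance of two matches' latent outcomes, assuming independent
    N(0, skill_var) player skills and outcome = home skill sum - away skill sum."""
    return skill_var * sum(v * m2.get(p, 0) for p, v in m1.items())
```

Matches with no players in common get a covariance of zero, so under this simplified model only shared players transfer information between club and national-team data.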

Submitted 5 September, 2016; originally announced September 2016.

arXiv:1602.08158 [pdf, ps, other]

Associative Memories and Human-Robot Social Interaction

Authors: Gabriel J. Ferrer

Abstract: In this position paper, we discuss how the use of a cognitive architecture based on unsupervised clustering (the Kohonen Self-Organizing Map) enables us to meet our goals of efficient action selection in a mobile robot. This architecture provides several opportunities for human-robot interaction, and we discuss how its features facilitate these interactions.
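For readers unfamiliar with the Kohonen Self-Organizing Map, the sketch below shows the textbook training loop: find the best-matching unit (BMU) for each input, then pull it and its grid neighbors toward the input with a learning rate and a Gaussian neighborhood width that both shrink over time. The linear decay schedules and parameter defaults are illustrative choices, not taken from the paper. Associating an action with each map node, so that action selection reduces to a BMU lookup, is one plausible reading of the abstract and is stated here only as an assumption.

```python
import numpy as np

def som_train(data, grid_shape, epochs, lr0=0.5, sigma0=None, seed=0):
    """Train a Kohonen SOM on `data` (n, dim); returns (rows * cols, dim) prototypes."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    dim = data.shape[1]
    # Initialize prototypes uniformly inside the data's bounding box.
    weights = rng.uniform(data.min(axis=0), data.max(axis=0), size=(rows * cols, dim))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighborhood width
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            grid_d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma**2))  # Gaussian neighborhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def best_matching_unit(weights, x):
    """Index of the prototype closest to input `x`."""
    return int(np.argmin(np.sum((weights - np.asarray(x, dtype=float)) ** 2, axis=1)))
```

Because training is unsupervised, the map organizes sensor inputs into clusters without labels; any mapping from nodes to robot actions would be layered on top of `best_matching_unit`.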

Submitted 25 February, 2016; originally announced February 2016.

Comments: Presented at the 2nd Workshop on Cognitive Architectures for Social Human-Robot Interaction 2016 (arXiv:1602.01868)

Report number: CogArch4sHRI/2016/03

Showing 1–27 of 27 results for author: Ferrer, G