doi 10.1109/ICAR58858.2023.10407017

Exploring Deep Reinforcement Learning for Robust Target Tracking using Micro Aerial Vehicles

Authors: Alberto Dionigi, Mirko Leomanni, Alessandro Saviolo, Giuseppe Loianno, Gabriele Costante

Abstract: The capability to autonomously track a non-cooperative target is a key technological requirement for micro aerial vehicles. In this paper, we propose an output feedback control scheme based on deep reinforcement learning for controlling a micro aerial vehicle to persistently track a flying target while maintaining visual contact. The proposed method leverages relative position data for control, relaxing the assumption of having access to full state information, which is typical of related approaches in the literature. Moreover, we exploit classical robustness indicators in the learning process through domain randomization to increase the robustness of the learned policy. Experimental results validate the proposed approach for target tracking, demonstrating high performance and robustness with respect to mass mismatches and control delays. The resulting nonlinear controller significantly outperforms a standard model-based design in numerous off-nominal scenarios.

Submitted 7 February, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

Journal ref: 2023 21st International Conference on Advanced Robotics (ICAR)

arXiv:2308.16874 [pdf, other]

doi 10.1109/LRA.2024.3385700

D-VAT: End-to-End Visual Active Tracking for Micro Aerial Vehicles

Authors: Alberto Dionigi, Simone Felicioni, Mirko Leomanni, Gabriele Costante

Abstract: Visual active tracking is a growing research topic in robotics due to its key role in applications such as human assistance, disaster recovery, and surveillance. In contrast to passive tracking, active tracking approaches combine vision and control capabilities to detect and actively track the target. Most of the work in this area focuses on ground robots, while the very few contributions on aerial platforms still pose important design constraints that limit their applicability. To overcome these limitations, in this paper we propose D-VAT, a novel end-to-end visual active tracking methodology based on deep reinforcement learning that is tailored to micro aerial vehicle platforms. The D-VAT agent computes the vehicle thrust and angular velocity commands needed to track the target by directly processing monocular camera measurements. We show that the proposed approach allows for precise and collision-free tracking operations, outperforming different state-of-the-art baselines on simulated environments which differ significantly from those encountered during training. Moreover, we demonstrate a smooth real-world transition to a quadrotor platform with mixed-reality.

Submitted 7 April, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

Journal ref: IEEE Robotics and Automation Letters 2024

arXiv:2303.08181 [pdf, other]

doi 10.1109/ICRA48891.2023.10160726

GaPT: Gaussian Process Toolkit for Online Regression with Application to Learning Quadrotor Dynamics

Authors: Francesco Crocetti, Jeffrey Mao, Alessandro Saviolo, Gabriele Costante, Giuseppe Loianno

Abstract: Gaussian Processes (GPs) are expressive models for capturing signal statistics and expressing prediction uncertainty. As a result, the robotics community has gathered interest in leveraging these methods for inference, planning, and control. Unfortunately, despite providing a closed-form inference solution, GPs are non-parametric models that typically scale cubically with the dataset size, making them difficult to use, especially onboard Size, Weight, and Power (SWaP) constrained aerial robots. In addition, the integration of popular libraries with GPs for different kernels is not trivial. In this paper, we propose GaPT, a novel toolkit that converts GPs to their state space form and performs regression in linear time. GaPT is designed to be highly compatible with several optimizers popular in robotics. We thoroughly validate the proposed approach for learning quadrotor dynamics on both single and multiple input GP settings. GaPT accurately captures the system behavior in multiple flight regimes and operating conditions, including those producing highly nonlinear effects such as aerodynamic forces and rotor interactions. Moreover, the results demonstrate the superior computational performance of GaPT compared to a classical GP inference approach on both single and multi-input settings, especially when considering a large number of data points, enabling real-time regression speed on embedded platforms used on SWaP-constrained aerial robots.

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted for ICRA 2023

Journal ref: 2023 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2212.06519 [pdf, other]

doi 10.1109/MN55117.2022.9887703

Development of a Cooperative Localization System using a UWB Network and BLE Technology

Authors: Valerio Brunacci, Alessio De Angelis, Gabriele Costante

Abstract: This paper presents the development of a system able to estimate the 2D relative position of nodes in a wireless network, based on distance measurements between the nodes. The system uses ultra wide band ranging technology and the Bluetooth Low Energy protocol to acquire data. Furthermore, a nonlinear least squares problem is formulated and solved numerically for estimating the relative positions of the nodes. The localization performance of the system is validated by experimental tests, demonstrating the capability of measuring the relative position of a network comprised of 4 nodes with an accuracy of the order of 3 cm and an update rate of 10 Hz. This shows the feasibility of applying the proposed system for multi-robot cooperative localization and formation control scenarios.

Submitted 13 December, 2022; originally announced December 2022.

Comments: 6 pages, 12 figures, 2022 IEEE International Symposium on Measurements & Networking (M&N)
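
Illustrative sketch (not taken from the paper): the abstract above describes estimating 2D relative node positions from pairwise distance measurements by solving a nonlinear least squares problem numerically. The example below shows one way such a problem could be set up for a 4-node network. The node coordinates, the initial guess, the gauge choice (node 0 at the origin, node 1 on the x axis), and the use of plain gradient descent are all assumptions made for illustration; the abstract does not state which solver the authors use.

```python
import math
from itertools import combinations


def estimate_positions(ranges, initial, steps=20000, lr=0.05):
    """Minimise 0.5 * sum((||p_i - p_j|| - d_ij)^2) by gradient descent."""
    pos = [list(p) for p in initial]
    # Gauge fixing (an assumption): node 0 is the origin and node 1 lies on
    # the x axis, which removes the free translation and rotation.
    pos[0] = [0.0, 0.0]
    pos[1][1] = 0.0
    for _ in range(steps):
        grad = [[0.0, 0.0] for _ in pos]
        for (i, j), d in ranges.items():
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0:
                continue  # gradient is undefined for coincident nodes
            residual = dist - d
            gx = residual * dx / dist
            gy = residual * dy / dist
            grad[i][0] += gx
            grad[i][1] += gy
            grad[j][0] -= gx
            grad[j][1] -= gy
        grad[0] = [0.0, 0.0]  # keep node 0 at the origin
        grad[1][1] = 0.0      # keep node 1 on the x axis
        for p, g in zip(pos, grad):
            p[0] -= lr * g[0]
            p[1] -= lr * g[1]
    return [tuple(p) for p in pos]


# Hypothetical ground truth (metres), used only to synthesise noise-free ranges.
truth = [(0.0, 0.0), (2.0, 0.0), (2.2, 1.4), (-0.3, 1.8)]
ranges = {(i, j): math.dist(truth[i], truth[j])
          for i, j in combinations(range(4), 2)}

# Hypothetical initial guess, deliberately offset from the truth.
guess = [(0.0, 0.0), (1.8, 0.0), (2.0, 1.2), (-0.1, 1.6)]
estimate = estimate_positions(ranges, guess)

# Largest mismatch between measured and reconstructed ranges.
max_range_error = max(abs(math.dist(estimate[i], estimate[j]) - d)
                      for (i, j), d in ranges.items())
```

Pairwise ranges determine the layout only up to translation, rotation, and reflection, which is why the sketch pins two nodes and relies on the initial guess to select the mirror image. In practice a Gauss-Newton or Levenberg-Marquardt solver would likely converge in far fewer iterations than gradient descent.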

arXiv:2211.10336 [pdf, other]

doi 10.1109/MED51440.2021.9480241

Tire-road friction estimation and uncertainty assessment to improve electric aircraft braking system

Authors: Francesco Crocetti, G. Costante, M. L. Fravolini, P. Valigi

Abstract: The accurate online estimation of the road-friction coefficient is an essential feature for any advanced brake control system. In this study, a data-driven scheme based on an MLP neural network is proposed to estimate the optimum friction coefficient as a function of windowed slip-friction measurements. A stochastic NN weights drop-out mechanism is used to estimate online the confidence interval of the estimated best friction coefficient, thus providing a characterization of the epistemic uncertainty associated with the NN block. Open loop and closed loop simulations of the landing phase of an aircraft on an unknown surface are used to show the potential and efficacy of the proposed robust friction estimation approach.

Submitted 14 November, 2022; originally announced November 2022.
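
Illustrative sketch (not taken from the paper): the abstract above describes using a stochastic drop-out mechanism at inference time to obtain a confidence interval for a neural network estimate. The example below shows the general idea with a tiny MLP: run many stochastic forward passes and summarise their spread. The weights, the 2-element input, the drop probability, and the normal-approximation interval are all hypothetical, and the sketch drops hidden units rather than individual weights, which may differ from the authors' mechanism.

```python
import random
import statistics

# Hypothetical fixed weights for a tiny 2-4-1 MLP (not from the paper).
W1 = [[0.8, -0.5], [0.3, 0.9], [-0.6, 0.4], [0.5, 0.5]]
B1 = [0.1, -0.2, 0.05, 0.0]
W2 = [0.7, -0.4, 0.6, 0.3]
B2 = 0.2


def forward(x, rng, p_drop):
    """One stochastic forward pass with dropout kept active at inference."""
    out = B2
    for w_row, b, w_out in zip(W1, B1, W2):
        h = max(0.0, sum(w * xi for w, xi in zip(w_row, x)) + b)  # ReLU
        if rng.random() < p_drop:
            h = 0.0              # this hidden unit is dropped
        else:
            h /= (1.0 - p_drop)  # inverted-dropout rescaling
        out += w_out * h
    return out


def mc_dropout_interval(x, n_samples=500, p_drop=0.2, z=1.96, seed=0):
    """Return the sample mean and a normal-approximation interval."""
    rng = random.Random(seed)
    samples = [forward(x, rng, p_drop) for _ in range(n_samples)]
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    return mean, (mean - z * std, mean + z * std)


# Hypothetical input standing in for a window of slip-friction measurements.
mean, (low, high) = mc_dropout_interval((0.4, 0.1))
```

The width of the interval reflects how much the prediction changes as units are randomly dropped, which serves as a rough proxy for epistemic uncertainty. Setting `p_drop=0.0` makes every pass identical, so the interval collapses to a single point.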

arXiv:2211.02558 [pdf, other]

doi 10.1109/MED48518.2020.9182792

A Data-Driven Slip Estimation Approach for Effective Braking Control under Varying Road Conditions

Authors: F. Crocetti, G. Costante, M. L. Fravolini, P. Valigi

Abstract: The performances of braking control systems for robotic platforms, e.g., assisted and autonomous vehicles, airplanes and drones, are deeply influenced by the road-tire friction experienced during the maneuver. Therefore, the availability of accurate estimation algorithms is of major importance in the development of advanced control schemes. The focus of this paper is on the estimation problem. In particular, a novel estimation algorithm is proposed, based on a multi-layer neural network. The training is based on a synthetic data set, derived from a widely used friction model. The open loop performances of the proposed algorithm are evaluated in a number of simulated scenarios. Moreover, different control schemes are used to test the closed loop scenario, where the estimated optimal slip is used as the set-point. The experimental results and the comparison with a model based baseline show that the proposed approach can provide an effective best slip estimation.

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2105.04900 [pdf]

doi 10.1038/s41598-023-38400-6

Forecasting consumer confidence through semantic network analysis of online news

Authors: A. Fronzetti Colladon, F. Grippa, B. Guardabascio, G. Costante, F. Ravazzolo

Abstract: This research studies the impact of online news on social and economic consumer perceptions through semantic network analysis. Using over 1.8 million online articles on Italian media covering four years, we calculate the semantic importance of specific economic-related keywords to see if words appearing in the articles could anticipate consumers' judgments about the economic situation and the Consumer Confidence Index. We use an innovative approach to analyze big textual data, combining methods and tools of text mining and social network analysis. Results show a strong predictive power for the judgments about the current households and national situation. Our indicator offers a complementary approach to estimating consumer confidence, lessening the limitations of traditional survey-based methods.

Submitted 21 July, 2023; v1 submitted 11 May, 2021; originally announced May 2021.

ACM Class: J.4

Journal ref: Scientific Reports 13, 11785 (2023)

arXiv:2102.05067 [pdf, other]

The Role of the Input in Natural Language Video Description

Authors: Silvia Cascianelli, Gabriele Costante, Alessandro Devo, Thomas A. Ciarfuglia, Paolo Valigi, Mario L. Fravolini

Abstract: Natural Language Video Description (NLVD) has recently received strong interest in the Computer Vision, Natural Language Processing (NLP), Multimedia, and Autonomous Robotics communities. The State-of-the-Art (SotA) approaches obtained remarkable results when tested on the benchmark datasets. However, those approaches generalize poorly to new datasets. In addition, none of the existing works focus on the processing of the input to the NLVD systems, which is both visual and textual. This work presents an extensive study dealing with the role of the visual input, evaluated with respect to the overall NLP performance. This is achieved by performing data augmentation of the visual component, applying common transformations to model camera distortions, noise, lighting, and camera positioning that are typical in real-world operative scenarios. A t-SNE based analysis is proposed to evaluate the effects of the considered transformations on the overall visual data distribution. This study considers the English subset of the Microsoft Research Video Description (MSVD) dataset, which is commonly used for NLVD. It was observed that this dataset contains a relevant amount of syntactic and semantic errors. These errors have been amended manually, and the new version of the dataset (called MSVD-v2) is used in the experimentation. The MSVD-v2 dataset is released to help gain insight into the NLVD problem.

Submitted 9 February, 2021; originally announced February 2021.

Comments: In IEEE Transactions on Multimedia

Journal ref: IEEE Transactions on Multimedia, 22(1), 271-283 (2019)

arXiv:2009.13475 [pdf, other]

Enhancing Continuous Control of Mobile Robots for End-to-End Visual Active Tracking

Authors: Alessandro Devo, Alberto Dionigi, Gabriele Costante

Abstract: In the last decades, visual target tracking has been one of the primary research interests of the Robotics research community. The recent advances in Deep Learning technologies have made the exploitation of visual tracking approaches effective and possible in a wide variety of applications, ranging from automotive to surveillance and human assistance. However, the majority of the existing works focus exclusively on passive visual tracking, i.e., tracking elements in sequences of images by assuming that no actions can be taken to adapt the camera position to the motion of the tracked entity. On the contrary, in this work, we address visual active tracking, in which the tracker has to actively search for and track a specified target. Current State-of-the-Art approaches use Deep Reinforcement Learning (DRL) techniques to address the problem in an end-to-end manner. However, two main problems arise: i) most of the contributions focus only on discrete action spaces, and the ones that consider continuous control do not achieve the same level of performance; and ii) if not properly tuned, DRL models can be challenging to train, resulting in considerably slow learning progress and poor final performance. To address these challenges, we propose a novel DRL-based visual active tracking system that provides continuous action policies. To accelerate training and improve the overall performance, we introduce additional objective functions and a Heuristic Trajectory Generator (HTG) to facilitate learning. Through extensive experimentation, we show that our method can reach and surpass the performance of other State-of-the-Art approaches, and demonstrate that, even if trained exclusively in simulation, it can successfully perform visual active tracking even in real scenarios.

Submitted 28 September, 2020; originally announced September 2020.

arXiv:1803.05387 [pdf, other]

Towards Monocular Digital Elevation Model (DEM) Estimation by Convolutional Neural Networks - Application on Synthetic Aperture Radar Images

Authors: Gabriele Costante, Thomas A. Ciarfuglia, Filippo Biondi

Abstract: Synthetic aperture radar (SAR) interferometry (InSAR) is performed using repeat-pass geometry. The InSAR technique is used to estimate the topographic reconstruction of the earth surface. The main problem of the range-Doppler focusing technique is the nature of the two-dimensional SAR result, affected by the layover indetermination. In order to resolve this problem, a minimum of two sensor acquisitions, separated by a baseline and extended in the cross-slant-range, are needed. However, given their multi-temporal nature, these techniques are vulnerable to variations in atmosphere and Earth environment parameters, in addition to physical platform instabilities. Furthermore, either two radars are needed or an interferometric cycle is required (that spans from days to weeks), which makes real time DEM estimation impossible. In this work, the authors propose a novel experimental alternative to the InSAR method that uses single-pass acquisitions, using a data driven approach implemented by Deep Neural Networks. We propose a fully Convolutional Neural Network (CNN) Encoder-Decoder architecture, training it on radar images in order to estimate DEMs from single pass image acquisitions. Our results on a set of Sentinel images show that this method is able to learn to some extent the statistical properties of the DEM. The results of this exploratory analysis are encouraging and open the way to the solution of the single-pass DEM estimation problem with data driven approaches.

Submitted 14 March, 2018; originally announced March 2018.

Comments: Accepted for publication in Proceedings of the 12th European Conference on Synthetic Aperture Radar

arXiv:1709.08480 [pdf, other]

doi 10.1109/LRA.2018.2800083

J-MOD$^{2}$: Joint Monocular Obstacle Detection and Depth Estimation

Authors: Michele Mancini, Gabriele Costante, Paolo Valigi, Thomas A. Ciarfuglia

Abstract: In this work, we propose an end-to-end deep architecture that jointly learns to detect obstacles and estimate their depth for MAV flight applications. Most of the existing approaches either rely on Visual SLAM systems or on depth estimation models to build 3D maps and detect obstacles. However, for the task of avoiding obstacles this level of complexity is not required. Recent works have proposed multi task architectures to both perform scene understanding and depth estimation. We follow their track and propose a specific architecture to jointly estimate depth and obstacles, without the need to compute a global map, but maintaining compatibility with a global SLAM system if needed. The network architecture is devised to exploit the joint information of the obstacle detection task, that produces more reliable bounding boxes, with the depth estimation one, increasing the robustness of both to scenario changes. We call this architecture J-MOD$^{2}$. We test the effectiveness of our approach with experiments on sequences with different appearance and focal lengths and compare it to SotA multi task methods that jointly perform semantic segmentation and depth estimation. In addition, we show the integration in a full system using a set of simulated navigation experiments where a MAV explores an unknown scenario and plans safe trajectories by using our detection model.

Submitted 13 December, 2017; v1 submitted 25 September, 2017; originally announced September 2017.

Journal ref: IEEE Robotics and Automation Letters, July 2018

arXiv:1709.06019 [pdf, other]

doi 10.1109/LRA.2018.2803211

LS-VO: Learning Dense Optical Subspace for Robust Visual Odometry Estimation

Authors: Gabriele Costante, Thomas A. Ciarfuglia

Abstract: This work proposes a novel deep network architecture to solve the camera Ego-Motion estimation problem. A motion estimation network generally learns features similar to Optical Flow (OF) fields starting from sequences of images. This OF can be described by a lower dimensional latent space. Previous research has shown how to find linear approximations of this space. We propose to use an Auto-Encoder network to find a non-linear representation of the OF manifold. In addition, we propose to learn the latent space jointly with the estimation task, so that the learned OF features become a more robust description of the OF input. We call this novel architecture LS-VO. The experiments show that LS-VO achieves a considerable increase in performance with respect to baselines, while the number of parameters of the estimation network only slightly increases.

Submitted 12 December, 2017; v1 submitted 18 September, 2017; originally announced September 2017.

arXiv:1607.06349 [pdf, ps, other]

Fast Robust Monocular Depth Estimation for Obstacle Detection with Fully Convolutional Networks

Authors: Michele Mancini, Gabriele Costante, Paolo Valigi, Thomas A. Ciarfuglia

Abstract: Obstacle Detection is a central problem for any robotic system, and critical for autonomous systems that travel at high speeds in unpredictable environments. This is often achieved through scene depth estimation, by various means. When fast motion is considered, the detection range must be long enough to allow for safe avoidance and path planning. Current solutions often make assumptions on the motion of the vehicle that limit their applicability, or work at very limited ranges due to intrinsic constraints. We propose a novel appearance-based Object Detection system that is able to detect obstacles at very long range and at a very high speed (~300Hz), without making assumptions on the type of motion. We achieve these results using a Deep Neural Network approach trained on real and synthetic images and trading some depth accuracy for fast, robust and consistent operation. We show how photo-realistic synthetic images are able to solve the problem of training set dimension and variety typical of machine learning approaches, and how our system is robust to massive blurring of test images.

Submitted 21 July, 2016; originally announced July 2016.

Comments: Accepted for publication in the Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016)

arXiv:1605.04151 [pdf, other]

Perception-aware Path Planning

Authors: Gabriele Costante, Christian Forster, Jeffrey Delmerico, Paolo Valigi, Davide Scaramuzza

Abstract: In this paper, we give a double twist to the problem of planning under uncertainty. State-of-the-art planners seek to minimize the localization uncertainty by only considering the geometric structure of the scene. In this paper, we argue that motion planning for vision-controlled robots should be perception aware in that the robot should also favor texture-rich areas to minimize the localization uncertainty during a goal-reaching task. Thus, we describe how to optimally incorporate the photometric information (i.e., texture) of the scene, in addition to the geometric one, to compute the uncertainty of vision-based localization during path planning. To avoid the caveats of feature-based localization systems (i.e., dependence on feature type and user-defined thresholds), we use dense, direct methods. This allows us to compute the localization uncertainty directly from the intensity values of every pixel in the image. We also describe how to compute trajectories online, considering also scenarios with no prior knowledge about the map. The proposed framework is general and can easily be adapted to different robotic platforms and scenarios. The effectiveness of our approach is demonstrated with extensive experiments in both simulated and real-world environments using a vision-controlled micro aerial vehicle.

Submitted 10 February, 2017; v1 submitted 13 May, 2016; originally announced May 2016.

Comments: 16 pages, 20 figures, revised version. Conditionally accepted for IEEE Transactions on Robotics

Showing 1–14 of 14 results for author: Costante, G