-
Exploring Deep Reinforcement Learning for Robust Target Tracking using Micro Aerial Vehicles
Authors:
Alberto Dionigi,
Mirko Leomanni,
Alessandro Saviolo,
Giuseppe Loianno,
Gabriele Costante
Abstract:
The capability to autonomously track a non-cooperative target is a key technological requirement for micro aerial vehicles. In this paper, we propose an output feedback control scheme based on deep reinforcement learning for controlling a micro aerial vehicle to persistently track a flying target while maintaining visual contact. The proposed method leverages relative position data for control, relaxing the assumption of having access to full state information, which is typical of related approaches in the literature. Moreover, we exploit classical robustness indicators in the learning process through domain randomization to increase the robustness of the learned policy. Experimental results validate the proposed approach for target tracking, demonstrating high performance and robustness with respect to mass mismatches and control delays. The resulting nonlinear controller significantly outperforms a standard model-based design in numerous off-nominal scenarios.
Submitted 7 February, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
-
D-VAT: End-to-End Visual Active Tracking for Micro Aerial Vehicles
Authors:
Alberto Dionigi,
Simone Felicioni,
Mirko Leomanni,
Gabriele Costante
Abstract:
Visual active tracking is a growing research topic in robotics due to its key role in applications such as human assistance, disaster recovery, and surveillance. In contrast to passive tracking, active tracking approaches combine vision and control capabilities to detect and actively track the target. Most of the work in this area focuses on ground robots, while the very few contributions on aerial platforms still pose important design constraints that limit their applicability. To overcome these limitations, in this paper we propose D-VAT, a novel end-to-end visual active tracking methodology based on deep reinforcement learning that is tailored to micro aerial vehicle platforms. The D-VAT agent computes the vehicle thrust and angular velocity commands needed to track the target by directly processing monocular camera measurements. We show that the proposed approach allows for precise and collision-free tracking operations, outperforming different state-of-the-art baselines on simulated environments which differ significantly from those encountered during training. Moreover, we demonstrate a smooth real-world transition to a quadrotor platform with mixed-reality.
Submitted 7 April, 2024; v1 submitted 31 August, 2023;
originally announced August 2023.
-
GaPT: Gaussian Process Toolkit for Online Regression with Application to Learning Quadrotor Dynamics
Authors:
Francesco Crocetti,
Jeffrey Mao,
Alessandro Saviolo,
Gabriele Costante,
Giuseppe Loianno
Abstract:
Gaussian Processes (GPs) are expressive models for capturing signal statistics and expressing prediction uncertainty. As a result, the robotics community has shown growing interest in leveraging these methods for inference, planning, and control. Unfortunately, despite providing a closed-form inference solution, GPs are non-parametric models that typically scale cubically with the dataset size, which makes them difficult to use, especially onboard Size, Weight, and Power (SWaP) constrained aerial robots. In addition, the integration of popular libraries with GPs for different kernels is not trivial. In this paper, we propose GaPT, a novel toolkit that converts GPs to their state space form and performs regression in linear time. GaPT is designed to be highly compatible with several optimizers popular in robotics. We thoroughly validate the proposed approach for learning quadrotor dynamics on both single and multiple input GP settings. GaPT accurately captures the system behavior in multiple flight regimes and operating conditions, including those producing highly nonlinear effects such as aerodynamic forces and rotor interactions. Moreover, the results demonstrate the superior computational performance of GaPT compared to a classical GP inference approach on both single and multi-input settings, especially when considering a large number of data points, enabling real-time regression speed on embedded platforms used on SWaP-constrained aerial robots.
Submitted 14 March, 2023;
originally announced March 2023.
-
Development of a Cooperative Localization System using a UWB Network and BLE Technology
Authors:
Valerio Brunacci,
Alessio De Angelis,
Gabriele Costante
Abstract:
This paper presents the development of a system able to estimate the 2D relative position of nodes in a wireless network, based on distance measurements between the nodes. The system uses ultra wide band ranging technology and the Bluetooth Low Energy protocol to acquire data. Furthermore, a nonlinear least squares problem is formulated and solved numerically for estimating the relative positions of the nodes. The localization performance of the system is validated by experimental tests, demonstrating the capability of measuring the relative position of a network comprised of 4 nodes with an accuracy of the order of 3 cm and an update rate of 10 Hz. This shows the feasibility of applying the proposed system for multi-robot cooperative localization and formation control scenarios.
Submitted 13 December, 2022;
originally announced December 2022.
-
Tire-road friction estimation and uncertainty assessment to improve electric aircraft braking system
Authors:
Francesco Crocetti,
G. Costante,
M. L. Fravolini,
P. Valigi
Abstract:
The accurate online estimation of the road-friction coefficient is an essential feature for any advanced brake control system. In this study, a data-driven scheme based on an MLP neural network is proposed to estimate the optimum friction coefficient as a function of windowed slip-friction measurements. A stochastic NN weight drop-out mechanism is used to estimate online the confidence interval of the estimated best friction coefficient, thus providing a characterization of the epistemic uncertainty associated with the NN block. Open-loop and closed-loop simulations of the landing phase of an aircraft on an unknown surface are used to show the potential and efficacy of the proposed robust friction estimation approach.
Submitted 14 November, 2022;
originally announced November 2022.
-
A Data-Driven Slip Estimation Approach for Effective Braking Control under Varying Road Conditions
Authors:
F. Crocetti,
G. Costante,
M. L. Fravolini,
P. Valigi
Abstract:
The performance of braking control systems for robotic platforms, e.g., assisted and autonomous vehicles, airplanes and drones, is deeply influenced by the road-tire friction experienced during the maneuver. Therefore, the availability of accurate estimation algorithms is of major importance in the development of advanced control schemes. The focus of this paper is on the estimation problem. In particular, a novel estimation algorithm is proposed, based on a multi-layer neural network. The training is based on a synthetic data set, derived from a widely used friction model. The open-loop performance of the proposed algorithm is evaluated in a number of simulated scenarios. Moreover, different control schemes are used to test the closed-loop scenario, where the estimated optimal slip is used as the set-point. The experimental results and the comparison with a model-based baseline show that the proposed approach can provide an effective estimate of the best slip.
Submitted 4 November, 2022;
originally announced November 2022.
-
Forecasting consumer confidence through semantic network analysis of online news
Authors:
A. Fronzetti Colladon,
F. Grippa,
B. Guardabascio,
G. Costante,
F. Ravazzolo
Abstract:
This research studies the impact of online news on social and economic consumer perceptions through semantic network analysis. Using over 1.8 million online articles on Italian media covering four years, we calculate the semantic importance of specific economic-related keywords to see if words appearing in the articles could anticipate consumers' judgments about the economic situation and the Consumer Confidence Index. We use an innovative approach to analyze big textual data, combining methods and tools of text mining and social network analysis. Results show a strong predictive power for the judgments about the current households and national situation. Our indicator offers a complementary approach to estimating consumer confidence, lessening the limitations of traditional survey-based methods.
Submitted 21 July, 2023; v1 submitted 11 May, 2021;
originally announced May 2021.
-
The Role of the Input in Natural Language Video Description
Authors:
Silvia Cascianelli,
Gabriele Costante,
Alessandro Devo,
Thomas A. Ciarfuglia,
Paolo Valigi,
Mario L. Fravolini
Abstract:
Natural Language Video Description (NLVD) has recently received strong interest in the Computer Vision, Natural Language Processing (NLP), Multimedia, and Autonomous Robotics communities. The State-of-the-Art (SotA) approaches obtained remarkable results when tested on the benchmark datasets. However, those approaches generalize poorly to new datasets. In addition, none of the existing works focus on the processing of the input to the NLVD systems, which is both visual and textual. This work presents an extensive study of the role of the visual input, evaluated with respect to the overall NLP performance. This is achieved by performing data augmentation of the visual component, applying common transformations to model camera distortions, noise, lighting, and camera positioning that are typical of real-world operating scenarios. A t-SNE based analysis is proposed to evaluate the effects of the considered transformations on the overall visual data distribution. This study considers the English subset of the Microsoft Research Video Description (MSVD) dataset, which is commonly used for NLVD. It was observed that this dataset contains a considerable number of syntactic and semantic errors. These errors have been corrected manually, and the new version of the dataset (called MSVD-v2) is used in the experiments. The MSVD-v2 dataset is released to help gain insight into the NLVD problem.
Submitted 9 February, 2021;
originally announced February 2021.
-
Enhancing Continuous Control of Mobile Robots for End-to-End Visual Active Tracking
Authors:
Alessandro Devo,
Alberto Dionigi,
Gabriele Costante
Abstract:
In the last decades, visual target tracking has been one of the primary research interests of the Robotics research community. The recent advances in Deep Learning technologies have made the exploitation of visual tracking approaches effective and possible in a wide variety of applications, ranging from automotive to surveillance and human assistance. However, the majority of the existing works focus exclusively on passive visual tracking, i.e., tracking elements in sequences of images by assuming that no actions can be taken to adapt the camera position to the motion of the tracked entity. On the contrary, in this work, we address visual active tracking, in which the tracker has to actively search for and track a specified target. Current State-of-the-Art approaches use Deep Reinforcement Learning (DRL) techniques to address the problem in an end-to-end manner. However, two main problems arise: i) most of the contributions focus only on discrete action spaces, and the ones that consider continuous control do not achieve the same level of performance; and ii) if not properly tuned, DRL models can be challenging to train, resulting in considerably slow learning progress and poor final performance. To address these challenges, we propose a novel DRL-based visual active tracking system that provides continuous action policies. To accelerate training and improve the overall performance, we introduce additional objective functions and a Heuristic Trajectory Generator (HTG) to facilitate learning. Through extensive experimentation, we show that our method can match and surpass the performance of other State-of-the-Art approaches, and demonstrate that, even if trained exclusively in simulation, it can successfully perform visual active tracking in real scenarios.
Submitted 28 September, 2020;
originally announced September 2020.
-
Towards Monocular Digital Elevation Model (DEM) Estimation by Convolutional Neural Networks - Application on Synthetic Aperture Radar Images
Authors:
Gabriele Costante,
Thomas A. Ciarfuglia,
Filippo Biondi
Abstract:
Synthetic aperture radar (SAR) interferometry (InSAR) is performed using repeat-pass geometry. The InSAR technique is used to estimate the topographic reconstruction of the Earth's surface. The main problem of the range-Doppler focusing technique is the nature of the two-dimensional SAR result, affected by the layover indetermination. In order to resolve this problem, a minimum of two sensor acquisitions, separated by a baseline and extended in the cross-slant-range, are needed. However, given their multi-temporal nature, these techniques are vulnerable to variations in atmospheric and Earth environment parameters, in addition to physical platform instabilities. Furthermore, either two radars are needed or an interferometric cycle is required (that spans from days to weeks), which makes real-time DEM estimation impossible. In this work, the authors propose a novel experimental alternative to the InSAR method that uses single-pass acquisitions, using a data-driven approach implemented by Deep Neural Networks. We propose a fully Convolutional Neural Network (CNN) Encoder-Decoder architecture, training it on radar images in order to estimate DEMs from single-pass image acquisitions. Our results on a set of Sentinel images show that this method is able to learn to some extent the statistical properties of the DEM. The results of this exploratory analysis are encouraging and open the way to solving the single-pass DEM estimation problem with data-driven approaches.
Submitted 14 March, 2018;
originally announced March 2018.
-
J-MOD$^{2}$: Joint Monocular Obstacle Detection and Depth Estimation
Authors:
Michele Mancini,
Gabriele Costante,
Paolo Valigi,
Thomas A. Ciarfuglia
Abstract:
In this work, we propose an end-to-end deep architecture that jointly learns to detect obstacles and estimate their depth for MAV flight applications. Most of the existing approaches either rely on Visual SLAM systems or on depth estimation models to build 3D maps and detect obstacles. However, for the task of avoiding obstacles this level of complexity is not required. Recent works have proposed multi-task architectures to perform both scene understanding and depth estimation. We follow their track and propose a specific architecture to jointly estimate depth and obstacles, without the need to compute a global map, but maintaining compatibility with a global SLAM system if needed. The network architecture is devised to exploit the joint information of the obstacle detection task, which produces more reliable bounding boxes, with the depth estimation one, increasing the robustness of both to scenario changes. We call this architecture J-MOD$^{2}$. We test the effectiveness of our approach with experiments on sequences with different appearance and focal lengths and compare it to SotA multi-task methods that jointly perform semantic segmentation and depth estimation. In addition, we show the integration in a full system using a set of simulated navigation experiments where a MAV explores an unknown scenario and plans safe trajectories by using our detection model.
Submitted 13 December, 2017; v1 submitted 25 September, 2017;
originally announced September 2017.
-
LS-VO: Learning Dense Optical Subspace for Robust Visual Odometry Estimation
Authors:
Gabriele Costante,
Thomas A. Ciarfuglia
Abstract:
This work proposes a novel deep network architecture to solve the camera Ego-Motion estimation problem. A motion estimation network generally learns features similar to Optical Flow (OF) fields starting from sequences of images. This OF can be described by a lower dimensional latent space. Previous research has shown how to find linear approximations of this space. We propose to use an Auto-Encoder network to find a non-linear representation of the OF manifold. In addition, we propose to learn the latent space jointly with the estimation task, so that the learned OF features become a more robust description of the OF input. We call this novel architecture LS-VO.
The experiments show that LS-VO achieves a considerable increase in performance with respect to baselines, while the number of parameters of the estimation network only slightly increases.
Submitted 12 December, 2017; v1 submitted 18 September, 2017;
originally announced September 2017.
-
Fast Robust Monocular Depth Estimation for Obstacle Detection with Fully Convolutional Networks
Authors:
Michele Mancini,
Gabriele Costante,
Paolo Valigi,
Thomas A. Ciarfuglia
Abstract:
Obstacle Detection is a central problem for any robotic system, and critical for autonomous systems that travel at high speeds in unpredictable environments. This is often achieved through scene depth estimation, by various means. When fast motion is considered, the detection range must be long enough to allow for safe avoidance and path planning. Current solutions often make assumptions on the motion of the vehicle that limit their applicability, or work at very limited ranges due to intrinsic constraints. We propose a novel appearance-based Object Detection system that is able to detect obstacles at very long range and at a very high speed (~300 Hz), without making assumptions on the type of motion. We achieve these results using a Deep Neural Network approach trained on real and synthetic images and trading some depth accuracy for fast, robust and consistent operation. We show how photo-realistic synthetic images are able to solve the problem of training set dimension and variety typical of machine learning approaches, and how our system is robust to massive blurring of test images.
Submitted 21 July, 2016;
originally announced July 2016.
-
Perception-aware Path Planning
Authors:
Gabriele Costante,
Christian Forster,
Jeffrey Delmerico,
Paolo Valigi,
Davide Scaramuzza
Abstract:
In this paper, we give a double twist to the problem of planning under uncertainty. State-of-the-art planners seek to minimize the localization uncertainty by only considering the geometric structure of the scene. In this paper, we argue that motion planning for vision-controlled robots should be perception aware in that the robot should also favor texture-rich areas to minimize the localization uncertainty during a goal-reaching task. Thus, we describe how to optimally incorporate the photometric information (i.e., texture) of the scene, in addition to the geometric one, to compute the uncertainty of vision-based localization during path planning. To avoid the caveats of feature-based localization systems (i.e., dependence on feature type and user-defined thresholds), we use dense, direct methods. This allows us to compute the localization uncertainty directly from the intensity values of every pixel in the image. We also describe how to compute trajectories online, including in scenarios with no prior knowledge about the map. The proposed framework is general and can easily be adapted to different robotic platforms and scenarios. The effectiveness of our approach is demonstrated with extensive experiments in both simulated and real-world environments using a vision-controlled micro aerial vehicle.
Submitted 10 February, 2017; v1 submitted 13 May, 2016;
originally announced May 2016.