-
Show and Grasp: Few-shot Semantic Segmentation for Robot Gras** through Zero-shot Foundation Models
Authors:
Leonardo Barcellona,
Alberto Bacchin,
Matteo Terreran,
Emanuele Menegatti,
Stefano Ghidoni
Abstract:
The ability of a robot to pick an object, known as robot gras**, is crucial for several applications, such as assembly or sorting. In such tasks, selecting the right target to pick is as essential as inferring a correct configuration of the gripper. A common solution to this problem relies on semantic segmentation models, which often show poor generalization to unseen objects and require conside…
▽ More
The ability of a robot to pick an object, known as robot gras**, is crucial for several applications, such as assembly or sorting. In such tasks, selecting the right target to pick is as essential as inferring a correct configuration of the gripper. A common solution to this problem relies on semantic segmentation models, which often show poor generalization to unseen objects and require considerable time and massive data to be trained. To reduce the need for large datasets, some gras** pipelines exploit few-shot semantic segmentation models, which are capable of recognizing new classes given a few examples. However, this often comes at the cost of limited performance and fine-tuning is required to be effective in robot gras** scenarios. In this work, we propose to overcome all these limitations by combining the impressive generalization capability reached by foundation models with a high-performing few-shot classifier, working as a score function to select the segmentation that is closer to the support set. The proposed model is designed to be embedded in a grasp synthesis pipeline. The extensive experiments using one or five examples show that our novel approach overcomes existing performance limitations, improving the state of the art both in few-shot semantic segmentation on the Graspnet-1B (+10.5% mIoU) and Ocid-grasp (+1.6% AP) datasets, and real-world few-shot grasp synthesis (+21.7% grasp accuracy). The project page is available at: https://leobarcellona.github.io/showandgrasp.github.io/
△ Less
Submitted 19 April, 2024;
originally announced April 2024.
-
A Graph-based Optimization Framework for Hand-Eye Calibration for Multi-Camera Setups
Authors:
Daniele Evangelista,
Emilio Olivastri,
Davide Allegro,
Emanuele Menegatti,
Alberto Pretto
Abstract:
Hand-eye calibration is the problem of estimating the spatial transformation between a reference frame, usually the base of a robot arm or its gripper, and the reference frame of one or multiple cameras. Generally, this calibration is solved as a non-linear optimization problem, what instead is rarely done is to exploit the underlying graph structure of the problem itself. Actually, the problem of…
▽ More
Hand-eye calibration is the problem of estimating the spatial transformation between a reference frame, usually the base of a robot arm or its gripper, and the reference frame of one or multiple cameras. Generally, this calibration is solved as a non-linear optimization problem, what instead is rarely done is to exploit the underlying graph structure of the problem itself. Actually, the problem of hand-eye calibration can be seen as an instance of the Simultaneous Localization and Map** (SLAM) problem. Inspired by this fact, in this work we present a pose-graph approach to the hand-eye calibration problem that extends a recent state-of-the-art solution in two different ways: i) by formulating the solution to eye-on-base setups with one camera; ii) by covering multi-camera robotic setups. The proposed approach has been validated in simulation against standard hand-eye calibration methods. Moreover, a real application is shown. In both scenarios, the proposed approach overcomes all alternative methods. We release with this paper an open-source implementation of our graph-based optimization framework for multi-camera setups.
△ Less
Submitted 28 July, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Pushing the Limits of Learning-based Traversability Analysis for Autonomous Driving on CPU
Authors:
Daniel Fusaro,
Emilio Olivastri,
Daniele Evangelista,
Marco Imperoli,
Emanuele Menegatti,
Alberto Pretto
Abstract:
Self-driving vehicles and autonomous ground robots require a reliable and accurate method to analyze the traversability of the surrounding environment for safe navigation. This paper proposes and evaluates a real-time machine learning-based Traversability Analysis method that combines geometric features with appearance-based features in a hybrid approach based on a SVM classifier. In particular, w…
▽ More
Self-driving vehicles and autonomous ground robots require a reliable and accurate method to analyze the traversability of the surrounding environment for safe navigation. This paper proposes and evaluates a real-time machine learning-based Traversability Analysis method that combines geometric features with appearance-based features in a hybrid approach based on a SVM classifier. In particular, we show that integrating a new set of geometric and visual features and focusing on important implementation details enables a noticeable boost in performance and reliability. The proposed approach has been compared with state-of-the-art Deep Learning approaches on a public dataset of outdoor driving scenarios. It reaches an accuracy of 89.2% in scenarios of varying complexity, demonstrating its effectiveness and robustness. The method runs fully on CPU and reaches comparable results with respect to the other methods, operates faster, and requires fewer hardware resources.
△ Less
Submitted 7 June, 2022;
originally announced June 2022.
-
People Tracking in Panoramic Video for Guiding Robots
Authors:
Alberto Bacchin,
Filippo Berno,
Emanuele Menegatti,
Alberto Pretto
Abstract:
A guiding robot aims to effectively bring people to and from specific places within environments that are possibly unknown to them. During this operation the robot should be able to detect and track the accompanied person, trying never to lose sight of her/him. A solution to minimize this event is to use an omnidirectional camera: its 360° Field of View (FoV) guarantees that any framed object cann…
▽ More
A guiding robot aims to effectively bring people to and from specific places within environments that are possibly unknown to them. During this operation the robot should be able to detect and track the accompanied person, trying never to lose sight of her/him. A solution to minimize this event is to use an omnidirectional camera: its 360° Field of View (FoV) guarantees that any framed object cannot leave the FoV if not occluded or very far from the sensor. However, the acquired panoramic videos introduce new challenges in perception tasks such as people detection and tracking, including the large size of the images to be processed, the distortion effects introduced by the cylindrical projection and the periodic nature of panoramic images. In this paper, we propose a set of targeted methods that allow to effectively adapt to panoramic videos a standard people detection and tracking pipeline originally designed for perspective cameras. Our methods have been implemented and tested inside a deep learning-based people detection and tracking framework with a commercial 360° camera. Experiments performed on datasets specifically acquired for guiding robot applications and on a real service robot show the effectiveness of the proposed approach over other state-of-the-art systems. We release with this paper the acquired and annotated datasets and the open-source implementation of our method.
△ Less
Submitted 6 June, 2022;
originally announced June 2022.
-
Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks
Authors:
Alessandro Saviolo,
Matteo Bonotto,
Daniele Evangelista,
Marco Imperoli,
Jacopo Lazzaro,
Emanuele Menegatti,
Alberto Pretto
Abstract:
This paper presents a new framework for human body part segmentation based on Deep Convolutional Neural Networks trained using only synthetic data. The proposed approach achieves cutting-edge results without the need of training the models with real annotated data of human body parts. Our contributions include a data generation pipeline, that exploits a game engine for the creation of the syntheti…
▽ More
This paper presents a new framework for human body part segmentation based on Deep Convolutional Neural Networks trained using only synthetic data. The proposed approach achieves cutting-edge results without the need of training the models with real annotated data of human body parts. Our contributions include a data generation pipeline, that exploits a game engine for the creation of the synthetic data used for training the network, and a novel pre-processing module, that combines edge response maps and adaptive histogram equalization to guide the network to learn the shape of the human body parts ensuring robustness to changes in the illumination conditions. For selecting the best candidate architecture, we perform exhaustive tests on manually annotated images of real human body limbs. We further compare our method against several high-end commercial segmentation tools on the body parts segmentation task. The results show that our method outperforms the other models by a significant margin. Finally, we present an ablation study to validate our pre-processing module. With this paper, we release an implementation of the proposed approach along with the acquired datasets.
△ Less
Submitted 7 June, 2022; v1 submitted 2 February, 2021;
originally announced February 2021.
-
Receding Horizon Task and Motion Planning in Changing Environments
Authors:
Nicola Castaman,
Enrico Pagello,
Emanuele Menegatti,
Alberto Pretto
Abstract:
Complex manipulation tasks require careful integration of symbolic reasoning and motion planning. This problem, commonly referred to as Task and Motion Planning (TAMP), is even more challenging if the workspace is non-static, e.g. due to human interventions and perceived with noisy non-ideal sensors. This work proposes an online approximated TAMP method that combines a geometric reasoning module a…
▽ More
Complex manipulation tasks require careful integration of symbolic reasoning and motion planning. This problem, commonly referred to as Task and Motion Planning (TAMP), is even more challenging if the workspace is non-static, e.g. due to human interventions and perceived with noisy non-ideal sensors. This work proposes an online approximated TAMP method that combines a geometric reasoning module and a motion planner with a standard task planner in a receding horizon fashion. Our approach iteratively solves a reduced planning problem over a receding window of a limited number of future actions during the implementation of the actions. Thus, only the first action of the horizon is actually scheduled at each iteration, then the window is moved forward, and the problem is solved again. This procedure allows to naturally take into account potential changes in the scene while ensuring good runtime performance. We validate our approach within extensive experiments in a simulated environment. We showed that our approach is able to deal with unexpected changes in the environment while ensuring comparable performance with respect to other recent TAMP approaches in solving traditional static benchmarks. We release with this paper the open-source implementation of our method.
△ Less
Submitted 29 August, 2021; v1 submitted 7 September, 2020;
originally announced September 2020.
-
Quaternion Equivariant Capsule Networks for 3D Point Clouds
Authors:
Yongheng Zhao,
Tolga Birdal,
Jan Eric Lenssen,
Emanuele Menegatti,
Leonidas Guibas,
Federico Tombari
Abstract:
We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations, as well as invariant to permutations of the input points. The operator receives a sparse set of local reference frames, computed from an input point cloud and establishes end-to-end transformation equivariance through a novel dynamic routing procedure on quaternions. Further, we theoret…
▽ More
We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations, as well as invariant to permutations of the input points. The operator receives a sparse set of local reference frames, computed from an input point cloud and establishes end-to-end transformation equivariance through a novel dynamic routing procedure on quaternions. Further, we theoretically connect dynamic routing between capsules to the well-known Weiszfeld algorithm, a scheme for solving \emph{iterative re-weighted least squares} (IRLS) problems with provable convergence properties. It is shown that such group dynamic routing can be interpreted as robust IRLS rotation averaging on capsule votes, where information is routed based on the final inlier scores. Based on our operator, we build a capsule network that disentangles geometry from pose, paving the way for more informative descriptors and a structured latent space. Our architecture allows joint object classification and orientation estimation without explicit supervision of rotations. We validate our algorithm empirically on common benchmark datasets.
△ Less
Submitted 23 August, 2020; v1 submitted 27 December, 2019;
originally announced December 2019.
-
Real-time Tracking-by-Detection of Human Motion in RGB-D Camera Networks
Authors:
Alessandro Malaguti,
Marco Carraro,
Mattia Guidolin,
Luca Tagliapietra,
Emanuele Menegatti,
Stefano Ghidoni
Abstract:
This paper presents a novel real-time tracking system capable of improving body pose estimation algorithms in distributed camera networks. The first stage of our approach introduces a linear Kalman filter operating at the body joints level, used to fuse single-view body poses coming from different detection nodes of the network and to ensure temporal consistency between them. The second stage, ins…
▽ More
This paper presents a novel real-time tracking system capable of improving body pose estimation algorithms in distributed camera networks. The first stage of our approach introduces a linear Kalman filter operating at the body joints level, used to fuse single-view body poses coming from different detection nodes of the network and to ensure temporal consistency between them. The second stage, instead, refines the Kalman filter estimates by fitting a hierarchical model of the human body having constrained link sizes in order to ensure the physical consistency of the tracking. The effectiveness of the proposed approach is demonstrated through a broad experimental validation, performed on a set of sequences whose ground truth references are generated by a commercial marker-based motion capture system. The obtained results show how the proposed system outperforms the considered state-of-the-art approaches, granting accurate and reliable estimates. Moreover, the developed methodology constrains neither the number of persons to track, nor the number, position, synchronization, frame-rate, and manufacturer of the RGB-D cameras used. Finally, the real-time performances of the system are of paramount importance for a large number of real-world applications.
△ Less
Submitted 28 July, 2019;
originally announced July 2019.
-
Fast human motion prediction for human-robot collaboration with wearable interfaces
Authors:
Stefano Tortora,
Stefano Michieletto,
Francesca Stival,
Emanuele Menegatti
Abstract:
In this paper, we aim at improving human motion prediction during human-robot collaboration in industrial facilities by exploiting contributions from both physical and physiological signals. Improved human-machine collaboration could prove useful in several areas, while it is crucial for interacting robots to understand human movement as soon as possible to avoid accidents and injuries. In this pe…
▽ More
In this paper, we aim at improving human motion prediction during human-robot collaboration in industrial facilities by exploiting contributions from both physical and physiological signals. Improved human-machine collaboration could prove useful in several areas, while it is crucial for interacting robots to understand human movement as soon as possible to avoid accidents and injuries. In this perspective, we propose a novel human-robot interface capable to anticipate the user intention while performing reaching movements on a working bench in order to plan the action of a collaborative robot. The proposed interface can find many applications in the Industry 4.0 framework, where autonomous and collaborative robots will be an essential part of innovative facilities. A motion intention prediction and a motion direction prediction levels have been developed to improve detection speed and accuracy. A Gaussian Mixture Model (GMM) has been trained with IMU and EMG data following an evidence accumulation approach to predict reaching direction. Novel dynamic stop** criteria have been proposed to flexibly adjust the trade-off between early anticipation and accuracy according to the application. The output of the two predictors has been used as external inputs to a Finite State Machine (FSM) to control the behaviour of a physical robot according to user's action or inaction. Results show that our system outperforms previous methods, achieving a real-time classification accuracy of $94.3\pm2.9\%$ after $160.0msec\pm80.0msec$ from movement onset.
△ Less
Submitted 28 May, 2019;
originally announced May 2019.
-
Proceedings of the Workshop on Social Robots in Therapy: Focusing on Autonomy and Ethical Challenges
Authors:
Pablo G. Esteban,
Daniel Hernández García,
Hee Rin Lee,
Pauline Chevalier,
Paul Baxter,
Cindy L. Bethel,
Jainendra Shukla,
Joan Oliver,
Domènec Puig,
Jason R. Wilson,
Linda Tickle-Degnen,
Madeleine Bartlett,
Tony Belpaeme,
Serge Thill,
Kim Baraka,
Francisco S. Melo,
Manuela Veloso,
David Becerra,
Maja Matarić,
Eduard Fosch-Villaronga,
Jordi Albo-Canals,
Gloria Beraldo,
Emanuele Menegatti,
Valentina De Tommasi,
Roberto Mancin
, et al. (13 additional authors not shown)
Abstract:
Robot-Assisted Therapy (RAT) has successfully been used in HRI research by including social robots in health-care interventions by virtue of their ability to engage human users both social and emotional dimensions. Research projects on this topic exist all over the globe in the USA, Europe, and Asia. All of these projects have the overall ambitious goal to increase the well-being of a vulnerable p…
▽ More
Robot-Assisted Therapy (RAT) has successfully been used in HRI research by including social robots in health-care interventions by virtue of their ability to engage human users both social and emotional dimensions. Research projects on this topic exist all over the globe in the USA, Europe, and Asia. All of these projects have the overall ambitious goal to increase the well-being of a vulnerable population. Typical work in RAT is performed using remote controlled robots; a technique called Wizard-of-Oz (WoZ). The robot is usually controlled, unbeknownst to the patient, by a human operator. However, WoZ has been demonstrated to not be a sustainable technique in the long-term. Providing the robots with autonomy (while remaining under the supervision of the therapist) has the potential to lighten the therapists burden, not only in the therapeutic session itself but also in longer-term diagnostic tasks. Therefore, there is a need for exploring several degrees of autonomy in social robots used in therapy. Increasing the autonomy of robots might also bring about a new set of challenges. In particular, there will be a need to answer new ethical questions regarding the use of robots with a vulnerable population, as well as a need to ensure ethically-compliant robot behaviours. Therefore, in this workshop we want to gather findings and explore which degree of autonomy might help to improve health-care interventions and how we can overcome the ethical challenges inherent to it.
△ Less
Submitted 18 December, 2018;
originally announced December 2018.
-
Brain-Computer Interface meets ROS: A robotic approach to mentally drive telepresence robots
Authors:
Gloria Beraldo,
Morris Antonello,
Andrea Cimolato,
Emanuele Menegatti,
Luca Tonin
Abstract:
This paper shows and evaluates a novel approach to integrate a non-invasive Brain-Computer Interface (BCI) with the Robot Operating System (ROS) to mentally drive a telepresence robot. Controlling a mobile device by using human brain signals might improve the quality of life of people suffering from severe physical disabilities or elderly people who cannot move anymore. Thus, the BCI user is able…
▽ More
This paper shows and evaluates a novel approach to integrate a non-invasive Brain-Computer Interface (BCI) with the Robot Operating System (ROS) to mentally drive a telepresence robot. Controlling a mobile device by using human brain signals might improve the quality of life of people suffering from severe physical disabilities or elderly people who cannot move anymore. Thus, the BCI user is able to actively interact with relatives and friends located in different rooms thanks to a video streaming connection to the robot. To facilitate the control of the robot via BCI, we explore new ROS-based algorithms for navigation and obstacle avoidance, making the system safer and more reliable. In this regard, the robot can exploit two maps of the environment, one for localization and one for navigation, and both can be used also by the BCI user to watch the position of the robot while it is moving. As demonstrated by the experimental results, the user's cognitive workload is reduced, decreasing the number of commands necessary to complete the task and hel** him/her to keep attention for longer periods of time.
△ Less
Submitted 4 April, 2019; v1 submitted 5 December, 2017;
originally announced December 2017.
-
RUR53: an Unmanned Ground Vehicle for Navigation, Recognition and Manipulation
Authors:
Nicola Castaman,
Elisa Tosello,
Morris Antonello,
Nicola Bagarello,
Silvia Gandin,
Marco Carraro,
Matteo Munaro,
Roberto Bortoletto,
Stefano Ghidoni,
Emanuele Menegatti,
Enrico Pagello
Abstract:
This paper proposes RUR53: an Unmanned Ground Vehicle able to autonomously navigate through, identify, and reach areas of interest; and there recognize, localize, and manipulate work tools to perform complex manipulation tasks. The proposed contribution includes a modular software architecture where each module solves specific sub-tasks and that can be easily enlarged to satisfy new requirements.…
▽ More
This paper proposes RUR53: an Unmanned Ground Vehicle able to autonomously navigate through, identify, and reach areas of interest; and there recognize, localize, and manipulate work tools to perform complex manipulation tasks. The proposed contribution includes a modular software architecture where each module solves specific sub-tasks and that can be easily enlarged to satisfy new requirements. Included indoor and outdoor tests demonstrate the capability of the proposed system to autonomously detect a target object (a panel) and precisely dock in front of it while avoiding obstacles. They show it can autonomously recognize and manipulate target work tools (i.e., wrenches and valve stems) to accomplish complex tasks (i.e., use a wrench to rotate a valve stem). A specific case study is described where the proposed modular architecture lets easy switch to a semi-teleoperated mode. The paper exhaustively describes description of both the hardware and software setup of RUR53, its performance when tests at the 2017 Mohamed Bin Zayed International Robotics Challenge, and the lessons we learned when participating at this competition, where we ranked third in the Gran Challenge in collaboration with the Czech Technical University in Prague, the University of Pennsylvania, and the University of Lincoln (UK).
△ Less
Submitted 3 October, 2020; v1 submitted 23 November, 2017;
originally announced November 2017.
-
Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks
Authors:
Marco Carraro,
Matteo Munaro,
Jeff Burke,
Emanuele Menegatti
Abstract:
This paper proposes a novel system to estimate and track the 3D poses of multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D pose of each person is computed by a central node which receives the single-view outcomes from each camera of the network. Each single-view outcome is computed by using a CNN for 2D pose estimation and extending the resulting skeletons to 3D by means…
▽ More
This paper proposes a novel system to estimate and track the 3D poses of multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D pose of each person is computed by a central node which receives the single-view outcomes from each camera of the network. Each single-view outcome is computed by using a CNN for 2D pose estimation and extending the resulting skeletons to 3D by means of the sensor depth. The proposed system is marker-less, multi-person, independent of background and does not make any assumption on people appearance and initial pose. The system provides real-time outcomes, thus being perfectly suited for applications requiring user interaction. Experimental results show the effectiveness of this work with respect to a baseline multi-view approach in different scenarios. To foster research and applications based on this work, we released the source code in OpenPTrack, an open source project for RGB-D people tracking.
△ Less
Submitted 17 October, 2017;
originally announced October 2017.
-
Fast and Robust Detection of Fallen People from a Mobile Robot
Authors:
Morris Antonello,
Marco Carraro,
Marco Pierobon,
Emanuele Menegatti
Abstract:
This paper deals with the problem of detecting fallen people lying on the floor by means of a mobile robot equipped with a 3D depth sensor. In the proposed algorithm, inspired by semantic segmentation techniques, the 3D scene is over-segmented into small patches. Fallen people are then detected by means of two SVM classifiers: the first one labels each patch, while the second one captures the spat…
▽ More
This paper deals with the problem of detecting fallen people lying on the floor by means of a mobile robot equipped with a 3D depth sensor. In the proposed algorithm, inspired by semantic segmentation techniques, the 3D scene is over-segmented into small patches. Fallen people are then detected by means of two SVM classifiers: the first one labels each patch, while the second one captures the spatial relations between them. This novel approach showed to be robust and fast. Indeed, thanks to the use of small patches, fallen people in real cluttered scenes with objects side by side are correctly detected. Moreover, the algorithm can be executed on a mobile robot fitted with a standard laptop making it possible to exploit the 2D environmental map built by the robot and the multiple points of view obtained during the robot navigation. Additionally, this algorithm is robust to illumination changes since it does not rely on RGB data but on depth data. All the methods have been thoroughly validated on the IASLAB-RGBD Fallen Person Dataset, which is published online as a further contribution. It consists of several static and dynamic sequences with 15 different people and 2 different environments.
△ Less
Submitted 9 March, 2017;
originally announced March 2017.
-
Robust Intrinsic and Extrinsic Calibration of RGB-D Cameras
Authors:
Filippo Basso,
Emanuele Menegatti,
Alberto Pretto
Abstract:
Color-depth cameras (RGB-D cameras) have become the primary sensors in most robotics systems, from service robotics to industrial robotics applications. Typical consumer-grade RGB-D cameras are provided with a coarse intrinsic and extrinsic calibration that generally does not meet the accuracy requirements needed by many robotics applications (e.g., highly accurate 3D environment reconstruction an…
▽ More
Color-depth cameras (RGB-D cameras) have become the primary sensors in most robotics systems, from service robotics to industrial robotics applications. Typical consumer-grade RGB-D cameras are provided with a coarse intrinsic and extrinsic calibration that generally does not meet the accuracy requirements needed by many robotics applications (e.g., highly accurate 3D environment reconstruction and map**, high precision object recognition and localization, ...). In this paper, we propose a human-friendly, reliable and accurate calibration framework that enables to easily estimate both the intrinsic and extrinsic parameters of a general color-depth sensor couple. Our approach is based on a novel two components error model. This model unifies the error sources of RGB-D pairs based on different technologies, such as structured-light 3D cameras and time-of-flight cameras. Our method provides some important advantages compared to other state-of-the-art systems: it is general (i.e., well suited for different types of sensors), based on an easy and stable calibration protocol, provides a greater calibration accuracy, and has been implemented within the ROS robotics framework. We report detailed experimental validations and performance comparisons to support our statements.
△ Less
Submitted 19 October, 2018; v1 submitted 20 January, 2017;
originally announced January 2017.