Search | arXiv e-print repository

Multi-Camera Hand-Eye Calibration for Human-Robot Collaboration in Industrial Robotic Workcells

Authors: Davide Allegro, Matteo Terreran, Stefano Ghidoni

Abstract: In industrial scenarios, effective human-robot collaboration relies on multi-camera systems to robustly monitor human operators despite the occlusions that typically show up in a robotic workcell. In this scenario, precise localization of the person in the robot coordinate system is essential, making the hand-eye calibration of the camera network critical. This process presents significant challenges when high calibration accuracy must be achieved in a short time to minimize production downtime, and when dealing with extensive camera networks used for monitoring wide areas, such as industrial robotic workcells. Our paper introduces an innovative and robust multi-camera hand-eye calibration method, designed to optimize each camera's pose relative to both the robot's base and every other camera. This optimization integrates two types of key constraints: i) a single board-to-end-effector transformation, and ii) the relative camera-to-camera transformations. We demonstrate the superior performance of our method through comprehensive experiments employing the METRIC dataset and real-world data collected in industrial scenarios, showing notable advancements over state-of-the-art techniques even when using fewer than 10 images. Additionally, we release an open-source version of our multi-camera hand-eye calibration algorithm at https://github.com/davidea97/Multi-Camera-Hand-Eye-Calibration.git.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2404.12717 [pdf, other]

Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models

Authors: Leonardo Barcellona, Alberto Bacchin, Matteo Terreran, Emanuele Menegatti, Stefano Ghidoni

Abstract: The ability of a robot to pick an object, known as robot grasping, is crucial for several applications, such as assembly or sorting. In such tasks, selecting the right target to pick is as essential as inferring a correct configuration of the gripper. A common solution to this problem relies on semantic segmentation models, which often show poor generalization to unseen objects and require considerable time and massive data to be trained. To reduce the need for large datasets, some grasping pipelines exploit few-shot semantic segmentation models, which are capable of recognizing new classes given a few examples. However, this often comes at the cost of limited performance, and fine-tuning is required to be effective in robot grasping scenarios. In this work, we propose to overcome all these limitations by combining the impressive generalization capability reached by foundation models with a high-performing few-shot classifier, working as a score function to select the segmentation that is closest to the support set. The proposed model is designed to be embedded in a grasp synthesis pipeline. The extensive experiments using one or five examples show that our novel approach overcomes existing performance limitations, improving the state of the art both in few-shot semantic segmentation on the Graspnet-1B (+10.5% mIoU) and Ocid-grasp (+1.6% AP) datasets, and real-world few-shot grasp synthesis (+21.7% grasp accuracy). The project page is available at: https://leobarcellona.github.io/showandgrasp.github.io/

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2104.03488 [pdf]

doi 10.3390/jimaging6120143

Deep Features for training Support Vector Machine

Authors: Loris Nanni, Stefano Ghidoni, Sheryl Brahnam

Abstract: Features play a crucial role in computer vision. Initially designed to detect salient elements by means of handcrafted algorithms, features are now often learned by different layers in Convolutional Neural Networks (CNNs). This paper develops a generic computer vision system based on features extracted from trained CNNs. Multiple learned features are combined into a single structure to work on different image classification tasks. The proposed system was experimentally derived by testing several approaches for extracting features from the inner layers of CNNs and using them as inputs to SVMs that are then combined by sum rule. Dimensionality reduction techniques are used to reduce the high dimensionality of inner layers. The resulting vision system is shown to significantly boost the performance of standard CNNs across a large and diverse collection of image data sets. An ensemble of different topologies using the same approach obtains state-of-the-art results on a virus data set.

Submitted 28 June, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

arXiv:2011.11834 [pdf]

Comparisons among different stochastic selection of activation layers for convolutional neural networks for healthcare

Authors: Loris Nanni, Alessandra Lumini, Stefano Ghidoni, Gianluca Maguolo

Abstract: Classification of biological images is an important task with crucial applications in many fields, such as cell phenotype recognition, detection of cell organelles, and histopathological classification, and it might help in early medical diagnosis, allowing automatic disease classification without the need for a human expert. In this paper we classify biomedical images using ensembles of neural networks. We create this ensemble using a ResNet50 architecture and modifying its activation layers by substituting ReLUs with other functions. We select our activations from among the following: ReLU, leaky ReLU, Parametric ReLU, ELU, Adaptive Piecewise Linear Unit, S-Shaped ReLU, Swish, Mish, Mexican Linear Unit, Gaussian Linear Unit, Parametric Deformable Linear Unit, Soft Root Sign (SRS), and others. As a baseline, we used an ensemble of neural networks that only use ReLU activations. We tested our networks on several small and medium sized biomedical image datasets. Our results show that our best ensemble performs better than the naive approaches. In order to encourage the reproducibility of this work, the MATLAB code of all the experiments will be shared at https://github.com/LorisNanni.

Submitted 23 November, 2020; originally announced November 2020.

arXiv:2005.02632 [pdf, other]

Robotic Arm Control and Task Training through Deep Reinforcement Learning

Authors: Andrea Franceschetti, Elisa Tosello, Nicola Castaman, Stefano Ghidoni

Abstract: This paper proposes a detailed and extensive comparison of the Trust Region Policy Optimization and Deep Q-Network with Normalized Advantage Functions with respect to other state-of-the-art algorithms, namely Deep Deterministic Policy Gradient and Vanilla Policy Gradient. Comparisons demonstrate that the former perform better than the latter when asking robotic arms to accomplish manipulation tasks such as reaching a random target pose and picking and placing an object. Both simulated and real-world experiments are provided. Simulation lets us show the procedures that we adopted to precisely estimate the algorithms' hyper-parameters and to correctly design good policies. Real-world experiments show that our policies, if correctly trained in simulation, can be transferred and executed in a real environment with almost no changes.

Submitted 6 May, 2020; originally announced May 2020.

Comments: Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018

arXiv:1907.12112 [pdf, other]

Real-time Tracking-by-Detection of Human Motion in RGB-D Camera Networks

Authors: Alessandro Malaguti, Marco Carraro, Mattia Guidolin, Luca Tagliapietra, Emanuele Menegatti, Stefano Ghidoni

Abstract: This paper presents a novel real-time tracking system capable of improving body pose estimation algorithms in distributed camera networks. The first stage of our approach introduces a linear Kalman filter operating at the body joints level, used to fuse single-view body poses coming from different detection nodes of the network and to ensure temporal consistency between them. The second stage, instead, refines the Kalman filter estimates by fitting a hierarchical model of the human body having constrained link sizes in order to ensure the physical consistency of the tracking. The effectiveness of the proposed approach is demonstrated through a broad experimental validation, performed on a set of sequences whose ground truth references are generated by a commercial marker-based motion capture system. The obtained results show how the proposed system outperforms the considered state-of-the-art approaches, granting accurate and reliable estimates. Moreover, the developed methodology constrains neither the number of persons to track, nor the number, position, synchronization, frame-rate, and manufacturer of the RGB-D cameras used. Finally, the real-time performances of the system are of paramount importance for a large number of real-world applications.

Submitted 28 July, 2019; originally announced July 2019.

Comments: Accepted to IEEE SMC 2019

arXiv:1905.02473 [pdf]

Ensemble of Convolutional Neural Networks Trained with Different Activation Functions

Authors: Gianluca Maguolo, Loris Nanni, Stefano Ghidoni

Abstract: Activation functions play a vital role in the training of Convolutional Neural Networks. For this reason, developing efficient and well-performing functions is a crucial problem in the deep learning community. Key to these approaches is permitting reliable parameter learning while avoiding vanishing gradient problems. The goal of this work is to propose an ensemble of Convolutional Neural Networks trained using several different activation functions. Moreover, a novel activation function is here proposed for the first time. Our aim is to improve the performance of Convolutional Neural Networks in small/medium size biomedical datasets. Our results clearly show that the proposed ensemble outperforms Convolutional Neural Networks trained with standard ReLU as activation function. The proposed ensemble outperforms each tested stand-alone activation function with a p-value of 0.01; for reliable performance comparison we have tested our approach on more than 10 datasets, using two well-known Convolutional Neural Networks: Vgg16 and ResNet50. MATLAB code used here will be available at https://github.com/LorisNanni.

Submitted 21 September, 2020; v1 submitted 7 May, 2019; originally announced May 2019.

arXiv:1904.08084 [pdf]

General Purpose (GenP) Bioimage Ensemble of Handcrafted and Learned Features with Data Augmentation

Authors: L. Nanni, S. Brahnam, S. Ghidoni, G. Maguolo

Abstract: Bioimage classification plays a crucial role in many biological problems. In this work, we present a new General Purpose (GenP) ensemble that boosts performance by combining local features, dense sampling features, and deep learning approaches. First, we introduce three new methods for data augmentation based on PCA/DCT; second, we show that different data augmentation approaches can boost the performance of an ensemble of CNNs; and, finally, we propose a set of handcrafted/learned descriptors that are highly generalizable. Each handcrafted descriptor is used to train a different Support Vector Machine (SVM), and the different SVMs are combined with the ensemble of CNNs. Our method is evaluated on a diverse set of bioimage classification problems. Results demonstrate that the proposed GenP bioimage ensemble obtains state-of-the-art performance without any ad-hoc dataset tuning of parameters (thus avoiding the risk of overfitting/overtraining).

Submitted 6 July, 2021; v1 submitted 17 April, 2019; originally announced April 2019.

Comments: 27 pages, 1 figure, 5 tables, manuscript

arXiv:1807.08008 [pdf]

Ensemble of Deep Learned Features for Melanoma Classification

Authors: Loris Nanni, Alessandra Lumini, Stefano Ghidoni

Abstract: The aim of this work is to propose an ensemble of descriptors for Melanoma Classification, whose performance has been evaluated on validation and test datasets of the melanoma challenge 2018. The system proposed here achieves a strong discriminative power thanks to the combination of multiple descriptors. The proposed system represents a very simple yet effective way of boosting the performance of trained CNNs by composing multiple CNNs into an ensemble and combining scores by sum rule. Several types of ensembles are considered, with different CNN architectures along with different learning parameter sets. Moreover, CNNs are used as feature extractors: an input image is processed by a trained CNN and the response of a particular layer (usually the classification layer, but internal layers can also be employed) is treated as a descriptor for the image and used for training a set of Support Vector Machines (SVMs).

Submitted 20 July, 2018; originally announced July 2018.

arXiv:1711.08764 [pdf, other]

doi 10.1080/01691864.2020.1833752

RUR53: an Unmanned Ground Vehicle for Navigation, Recognition and Manipulation

Authors: Nicola Castaman, Elisa Tosello, Morris Antonello, Nicola Bagarello, Silvia Gandin, Marco Carraro, Matteo Munaro, Roberto Bortoletto, Stefano Ghidoni, Emanuele Menegatti, Enrico Pagello

Abstract: This paper proposes RUR53: an Unmanned Ground Vehicle able to autonomously navigate through, identify, and reach areas of interest; and there recognize, localize, and manipulate work tools to perform complex manipulation tasks. The proposed contribution includes a modular software architecture where each module solves specific sub-tasks and that can be easily extended to satisfy new requirements. Included indoor and outdoor tests demonstrate the capability of the proposed system to autonomously detect a target object (a panel) and precisely dock in front of it while avoiding obstacles. They show it can autonomously recognize and manipulate target work tools (i.e., wrenches and valve stems) to accomplish complex tasks (i.e., use a wrench to rotate a valve stem). A specific case study is described where the proposed modular architecture allows an easy switch to a semi-teleoperated mode. The paper exhaustively describes both the hardware and software setup of RUR53, its performance when tested at the 2017 Mohamed Bin Zayed International Robotics Challenge, and the lessons we learned when participating in this competition, where we ranked third in the Grand Challenge in collaboration with the Czech Technical University in Prague, the University of Pennsylvania, and the University of Lincoln (UK).

Submitted 3 October, 2020; v1 submitted 23 November, 2017; originally announced November 2017.

Comments: This article has been accepted for publication in Advanced Robotics, published by Taylor & Francis

Showing 1–10 of 10 results for author: Ghidoni, S