Search | arXiv e-print repository

Safe Real-World Autonomous Driving by Learning to Predict and Plan with a Mixture of Experts

Authors: Stefano Pini, Christian S. Perone, Aayush Ahuja, Ana Sofia Rufino Ferreira, Moritz Niendorf, Sergey Zagoruyko

Abstract: The goal of autonomous vehicles is to navigate public roads safely and comfortably. To enforce safety, traditional planning approaches rely on handcrafted rules to generate trajectories. Machine learning-based systems, on the other hand, scale with data and are able to learn more complex behaviors. However, they often ignore that agents and self-driving vehicle trajectory distributions can be leve… ▽ More The goal of autonomous vehicles is to navigate public roads safely and comfortably. To enforce safety, traditional planning approaches rely on handcrafted rules to generate trajectories. Machine learning-based systems, on the other hand, scale with data and are able to learn more complex behaviors. However, they often ignore that agents and self-driving vehicle trajectory distributions can be leveraged to improve safety. In this paper, we propose modeling a distribution over multiple future trajectories for both the self-driving vehicle and other road agents, using a unified neural network architecture for prediction and planning. During inference, we select the planning trajectory that minimizes a cost taking into account safety and the predicted probabilities. Our approach does not depend on any rule-based planners for trajectory generation or optimization, improves with more training data and is simple to implement. We extensively evaluate our method through a realistic simulator and show that the predicted trajectory distribution corresponds to different driving profiles. We also successfully deploy it on a self-driving vehicle on urban public roads, confirming that it drives safely without compromising comfort. The code for training and testing our model on a public prediction dataset and the video of the road test are available at https://woven.mobi/safepathnet △ Less

Submitted 3 November, 2022; originally announced November 2022.

arXiv:2210.02174 [pdf, other]

CW-ERM: Improving Autonomous Driving Planning with Closed-loop Weighted Empirical Risk Minimization

Authors: Eesha Kumar, Yiming Zhang, Stefano Pini, Simon Stent, Ana Ferreira, Sergey Zagoruyko, Christian S. Perone

Abstract: The imitation learning of self-driving vehicle policies through behavioral cloning is often carried out in an open-loop fashion, ignoring the effect of actions to future states. Training such policies purely with Empirical Risk Minimization (ERM) can be detrimental to real-world performance, as it biases policy networks towards matching only open-loop behavior, showing poor results when evaluated… ▽ More The imitation learning of self-driving vehicle policies through behavioral cloning is often carried out in an open-loop fashion, ignoring the effect of actions to future states. Training such policies purely with Empirical Risk Minimization (ERM) can be detrimental to real-world performance, as it biases policy networks towards matching only open-loop behavior, showing poor results when evaluated in closed-loop. In this work, we develop an efficient and simple-to-implement principle called Closed-loop Weighted Empirical Risk Minimization (CW-ERM), in which a closed-loop evaluation procedure is first used to identify training data samples that are important for practical driving performance and then we these samples to help debias the policy network. We evaluate CW-ERM in a challenging urban driving dataset and show that this procedure yields a significant reduction in collisions as well as other non-differentiable closed-loop metrics. △ Less

Submitted 11 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: v2: minor update in dataset and results (no changes in improvements or conclusions)

arXiv:2207.02519 [pdf, other]

Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps

Authors: Alessandro Simoni, Stefano Pini, Guido Borghi, Roberto Vezzani

Abstract: Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications, such as the detection of unsafe situations or the study of mutual interactions for statistical and social purposes. In this paper, we propose a non-invasive and light-invariant framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an exter… ▽ More Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications, such as the detection of unsafe situations or the study of mutual interactions for statistical and social purposes. In this paper, we propose a non-invasive and light-invariant framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera. The method can be applied to any robot without requiring hardware access to the internal states. We introduce a novel representation of the predicted pose, namely Semi-Perspective Decoupled Heatmaps (SPDH), to accurately compute 3D joint locations in world coordinates adapting efficient deep networks designed for the 2D Human Pose Estimation. The proposed approach, which takes as input a depth representation based on XYZ coordinates, can be trained on synthetic depth data and applied to real-world settings without the need for domain adaptation techniques. To this end, we present the SimBa dataset, based on both synthetic and real depth images, and use it for the experimental evaluation. Results show that the proposed approach, made of a specific depth map representation and the SPDH, overcomes the current state of the art. △ Less

Submitted 6 July, 2022; originally announced July 2022.

Comments: IROS2022 and IEEE Robotics and Automation Letters (RA-L). Accepted June, 2022

arXiv:2110.11256 [pdf, other]

Multi-Category Mesh Reconstruction From Image Collections

Authors: Alessandro Simoni, Stefano Pini, Roberto Vezzani, Rita Cucchiara

Abstract: Recently, learning frameworks have shown the capability of inferring the accurate shape, pose, and texture of an object from a single RGB image. However, current methods are trained on image collections of a single category in order to exploit specific priors, and they often make use of category-specific 3D templates. In this paper, we present an alternative approach that infers the textured mesh… ▽ More Recently, learning frameworks have shown the capability of inferring the accurate shape, pose, and texture of an object from a single RGB image. However, current methods are trained on image collections of a single category in order to exploit specific priors, and they often make use of category-specific 3D templates. In this paper, we present an alternative approach that infers the textured mesh of objects combining a series of deformable 3D models and a set of instance-specific deformation, pose, and texture. Differently from previous works, our method is trained with images of multiple object categories using only foreground masks and rough camera poses as supervision. Without specific 3D templates, the framework learns category-level models which are deformed to recover the 3D shape of the depicted object. The instance-specific deformations are predicted independently for each vertex of the learned 3D mesh, enabling the dynamic subdivision of the mesh during the training process. Experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner. Predicted shapes are smooth and can leverage from multiple steps of subdivision during the training process, obtaining comparable or state-of-the-art results on two public datasets. Models and code are publicly released. △ Less

Submitted 21 October, 2021; originally announced October 2021.

Comments: Accepted at 3DV 2021

arXiv:2106.10980 [pdf, other]

SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild

Authors: Ariel Caputo, Andrea Giachetti, Simone Soso, Deborah Pintani, Andrea D'Eusanio, Stefano Pini, Guido Borghi, Alessandro Simoni, Roberto Vezzani, Rita Cucchiara, Andrea Ranieri, Franca Giannini, Katia Lupinetti, Marina Monti, Mehran Maghoumi, Joseph J. LaViola Jr, Minh-Quan Le, Hai-Dang Nguyen, Minh-Triet Tran

Abstract: Gesture recognition is a fundamental tool to enable novel interaction paradigms in a variety of application scenarios like Mixed Reality environments, touchless public kiosks, entertainment systems, and more. Recognition of hand gestures can be nowadays performed directly from the stream of hand skeletons estimated by software provided by low-cost trackers (Ultraleap) and MR headsets (Hololens, Oc… ▽ More Gesture recognition is a fundamental tool to enable novel interaction paradigms in a variety of application scenarios like Mixed Reality environments, touchless public kiosks, entertainment systems, and more. Recognition of hand gestures can be nowadays performed directly from the stream of hand skeletons estimated by software provided by low-cost trackers (Ultraleap) and MR headsets (Hololens, Oculus Quest) or by video processing software modules (e.g. Google Mediapipe). Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario for the recognition of a wide set of heterogeneous gestures, as many benchmarks do not test online recognition and use limited dictionaries. This motivated the proposal of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild. For this contest, we created a novel dataset with heterogeneous gestures featuring different types and duration. These gestures have to be found inside sequences in an online recognition scenario. This paper presents the result of the contest, showing the performances of the techniques proposed by four research groups on the challenging task compared with a simple baseline method. △ Less

Submitted 21 June, 2021; originally announced June 2021.

Comments: 12 pages, to be published on Computers & Graphics

arXiv:2104.12546 [pdf, other]

doi 10.1016/j.procs.2022.09.112

The Effects of Air Quality on the Spread of the COVID-19 Pandemic in Italy: An Artificial Intelligence Approach

Authors: Andrea Loreggia, Anna Passarelli, Maria Silvia Pini

Abstract: The COVID-19 pandemic considerably affects public health systems around the world. The lack of knowledge about the virus, the extension of this phenomenon, and the speed of the evolution of the infection are all factors that highlight the necessity of employing new approaches to study these events. Artificial intelligence techniques may be useful in analyzing data related to areas affected by the… ▽ More The COVID-19 pandemic considerably affects public health systems around the world. The lack of knowledge about the virus, the extension of this phenomenon, and the speed of the evolution of the infection are all factors that highlight the necessity of employing new approaches to study these events. Artificial intelligence techniques may be useful in analyzing data related to areas affected by the virus. The aim of this work is to investigate any possible relationships between air quality and confirmed cases of COVID-19 in Italian districts. Specifically, we report an analysis of the correlation between daily COVID-19 cases and environmental factors, such as temperature, relative humidity, and atmospheric pollutants. Our analysis confirms a significant association of some environmental parameters with the spread of the virus. This suggests that machine learning models trained on the environmental parameters to predict the number of future infected cases may be accurate. Predictive models may be useful for hel** institutions in making decisions for protecting the population and contrasting the pandemic. △ Less

Submitted 31 August, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

Report number: Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 26th International Conference KES2022

Journal ref: Procedia Computer Science Volume 207, 2022, Pages 573-582

arXiv:1909.08996 [pdf, ps, other]

doi 10.1007/s10458-021-09504-y

Voting with Random Classifiers (VORACE): Theoretical and Experimental Analysis

Authors: Cristina Cornelio, Michele Donini, Andrea Loreggia, Maria Silvia Pini, Francesca Rossi

Abstract: In many machine learning scenarios, looking for the best classifier that fits a particular dataset can be very costly in terms of time and resources. Moreover, it can require deep knowledge of the specific domain. We propose a new technique which does not require profound expertise in the domain and avoids the commonly used strategy of hyper-parameter tuning and model selection. Our method is an i… ▽ More In many machine learning scenarios, looking for the best classifier that fits a particular dataset can be very costly in terms of time and resources. Moreover, it can require deep knowledge of the specific domain. We propose a new technique which does not require profound expertise in the domain and avoids the commonly used strategy of hyper-parameter tuning and model selection. Our method is an innovative ensemble technique that uses voting rules over a set of randomly-generated classifiers. Given a new input sample, we interpret the output of each classifier as a ranking over the set of possible classes. We then aggregate these output rankings using a voting rule, which treats them as preferences over the classes. We show that our approach obtains good results compared to the state-of-the-art, both providing a theoretical analysis and an empirical evaluation of the approach on several datasets. △ Less

Submitted 24 May, 2021; v1 submitted 18 September, 2019; originally announced September 2019.

Journal ref: Autonomous Agents and Multi-Agent Systems volume 35, Article number: 22 (2021)

arXiv:1903.01489 [pdf, other]

doi 10.1007/s11042-018-7040-z

M-VAD Names: a Dataset for Video Captioning with Naming

Authors: Stefano Pini, Marcella Cornia, Federico Bolelli, Lorenzo Baraldi, Rita Cucchiara

Abstract: Current movie captioning architectures are not capable of mentioning characters with their proper name, replacing them with a generic "someone" tag. The lack of movie description datasets with characters' visual annotations surely plays a relevant role in this shortage. Recently, we proposed to extend the M-VAD dataset by introducing such information. In this paper, we present an improved version… ▽ More Current movie captioning architectures are not capable of mentioning characters with their proper name, replacing them with a generic "someone" tag. The lack of movie description datasets with characters' visual annotations surely plays a relevant role in this shortage. Recently, we proposed to extend the M-VAD dataset by introducing such information. In this paper, we present an improved version of the dataset, namely M-VAD Names, and its semi-automatic annotation procedure. The resulting dataset contains 63k visual tracks and 34k textual mentions, all associated with character identities. To showcase the features of the dataset and quantify the complexity of the naming task, we investigate multimodal architectures to replace the "someone" tags with proper character names in existing video captions. The evaluation is further extended by testing this application on videos outside of the M-VAD Names dataset. △ Less

Submitted 4 March, 2019; originally announced March 2019.

Comments: Source Code: https://github.com/aimagelab/mvad-names-dataset - Video Demo: https://youtu.be/dOvtAXbOOH4

Journal ref: Multimedia Tools and Applications (2018)

arXiv:1812.02041 [pdf, other]

Learn to See by Events: Color Frame Synthesis from Event and RGB Cameras

Authors: Stefano Pini, Guido Borghi, Roberto Vezzani

Abstract: Event cameras are biologically-inspired sensors that gather the temporal evolution of the scene. They capture pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite having multiple advantages with respect to traditional cameras, their use is partially prevented by the limited applicability of traditional data processing and vision algorithms. To this aim… ▽ More Event cameras are biologically-inspired sensors that gather the temporal evolution of the scene. They capture pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite having multiple advantages with respect to traditional cameras, their use is partially prevented by the limited applicability of traditional data processing and vision algorithms. To this aim, we present a framework which exploits the output stream of event cameras to synthesize RGB frames, relying on an initial or a periodic set of color key-frames and the sequence of intermediate events. Differently from existing work, we propose a deep learning-based frame synthesis method, consisting of an adversarial architecture combined with a recurrent module. Qualitative results and quantitative per-pixel, perceptual, and semantic evaluation on four public datasets confirm the quality of the synthesized images. △ Less

Submitted 10 December, 2019; v1 submitted 5 December, 2018; originally announced December 2018.

Comments: Accepted as full oral at the 15th International Conference on Computer Vision Theory and Applications (VISAPP) 2020

arXiv:1805.11927 [pdf, other]

Learning to Generate Facial Depth Maps

Authors: Stefano Pini, Filippo Grazioli, Guido Borghi, Roberto Vezzani, Rita Cucchiara

Abstract: In this paper, an adversarial architecture for facial depth map estimation from monocular intensity images is presented. By following an image-to-image approach, we combine the advantages of supervised learning and adversarial training, proposing a conditional Generative Adversarial Network that effectively learns to translate intensity face images into the corresponding depth maps. Two public dat… ▽ More In this paper, an adversarial architecture for facial depth map estimation from monocular intensity images is presented. By following an image-to-image approach, we combine the advantages of supervised learning and adversarial training, proposing a conditional Generative Adversarial Network that effectively learns to translate intensity face images into the corresponding depth maps. Two public datasets, namely Biwi database and Pandora dataset, are exploited to demonstrate that the proposed model generates high-quality synthetic depth images, both in terms of visual appearance and informative content. Furthermore, we show that the model is capable of predicting distinctive facial details by testing the generated depth maps through a deep model trained on authentic depth maps for the face verification task. △ Less

Submitted 30 May, 2018; originally announced May 2018.

arXiv:1007.5120 [pdf, ps, other]

Stable marriage problems with quantitative preferences

Authors: Maria Silvia Pini, Francesca Rossi, Brent Venable, Toby Walsh

Abstract: The stable marriage problem is a well-known problem of matching men to women so that no man and woman, who are not married to each other, both prefer each other. Such a problem has a wide variety of practical applications, ranging from matching resident doctors to hospitals, to matching students to schools or more generally to any two-sided market. In the classical stable marriage problem, both me… ▽ More The stable marriage problem is a well-known problem of matching men to women so that no man and woman, who are not married to each other, both prefer each other. Such a problem has a wide variety of practical applications, ranging from matching resident doctors to hospitals, to matching students to schools or more generally to any two-sided market. In the classical stable marriage problem, both men and women express a strict preference order over the members of the other sex, in a qualitative way. Here we consider stable marriage problems with quantitative preferences: each man (resp., woman) provides a score for each woman (resp., man). Such problems are more expressive than the classical stable marriage problems. Moreover, in some real-life situations it is more natural to express scores (to model, for example, profits or costs) rather than a qualitative preference ordering. In this context, we define new notions of stability and optimality, and we provide algorithms to find marriages which are stable and/or optimal according to these notions. While expressivity greatly increases by adopting quantitative preferences, we show that in most cases the desired solutions can be found by adapting existing algorithms for the classical stable marriage problem. △ Less

Submitted 29 July, 2010; originally announced July 2010.

Comments: To appear in Proceedings of Third International Workshop on Computational Social Choice, Dusseldorf, Germany, September 13-16, 2010

ACM Class: I.2.4

arXiv:1007.0859 [pdf]

Local search for stable marriage problems

Authors: M. Gelain, M. S. Pini, F. Rossi, K. B. Venable, T. Walsh

Abstract: The stable marriage (SM) problem has a wide variety of practical applications, ranging from matching resident doctors to hospitals, to matching students to schools, or more generally to any two-sided market. In the classical formulation, n men and n women express their preferences (via a strict total order) over the members of the other sex. Solving a SM problem means finding a stable marriage whe… ▽ More The stable marriage (SM) problem has a wide variety of practical applications, ranging from matching resident doctors to hospitals, to matching students to schools, or more generally to any two-sided market. In the classical formulation, n men and n women express their preferences (via a strict total order) over the members of the other sex. Solving a SM problem means finding a stable marriage where stability is an envy-free notion: no man and woman who are not married to each other would both prefer each other to their partners or to being single. We consider both the classical stable marriage problem and one of its useful variations (denoted SMTI) where the men and women express their preferences in the form of an incomplete preference list with ties over a subset of the members of the other sex. Matchings are permitted only with people who appear in these lists, an we try to find a stable matching that marries as many people as possible. Whilst the SM problem is polynomial to solve, the SMTI problem is NP-hard. We propose to tackle both problems via a local search approach, which exploits properties of the problems to reduce the size of the neighborhood and to make local moves efficiently. We evaluate empirically our algorithm for SM problems by measuring its runtime behaviour and its ability to sample the lattice of all possible stable marriages. We evaluate our algorithm for SMTI problems in terms of both its runtime behaviour and its ability to find a maximum cardinality stable marriage.For SM problems, the number of steps of our algorithm grows only as O(nlog(n)), and that it samples very well the set of all stable marriages. It is thus a fair and efficient approach to generate stable marriages.Furthermore, our approach for SMTI problems is able to solve large problems, quickly returning stable matchings of large and often optimal size despite the NP-hardness of this problem. △ Less

Submitted 6 July, 2010; originally announced July 2010.

Comments: 12 pages, Proc. COMSOC 2010 (Third International Workshop on Computational Social Choice)

arXiv:1007.0637 [pdf]

Local search for stable marriage problems with ties and incomplete lists

Authors: Mirco Gelain, Maria Silvia Pini, Francesca RossI, Kristen Brent Venable, Toby Walsh

Abstract: The stable marriage problem has a wide variety of practical applications, ranging from matching resident doctors to hospitals, to matching students to schools, or more generally to any two-sided market. We consider a useful variation of the stable marriage problem, where the men and women express their preferences using a preference list with ties over a subset of the members of the other sex. Mat… ▽ More The stable marriage problem has a wide variety of practical applications, ranging from matching resident doctors to hospitals, to matching students to schools, or more generally to any two-sided market. We consider a useful variation of the stable marriage problem, where the men and women express their preferences using a preference list with ties over a subset of the members of the other sex. Matchings are permitted only with people who appear in these preference lists. In this setting, we study the problem of finding a stable matching that marries as many people as possible. Stability is an envy-free notion: no man and woman who are not married to each other would both prefer each other to their partners or to being single. This problem is NP-hard. We tackle this problem using local search, exploiting properties of the problem to reduce the size of the neighborhood and to make local moves efficiently. Experimental results show that this approach is able to solve large problems, quickly returning stable matchings of large and often optimal size. △ Less

Submitted 5 July, 2010; originally announced July 2010.

Comments: 12 pages, Proc. PRICAI 2010 (11th Pacific Rim International Conference on Artificial Intelligence), Byoung-Tak Zhang and Mehmet A. Orgun eds., Springer LNAI

Showing 1–13 of 13 results for author: Pini, S