Semantic Scene Segmentation for Robotics

Authors: Juana Valeria Hurtado, Abhinav Valada

Abstract: Comprehensive scene understanding is a critical enabler of robot autonomy. Semantic segmentation is one of the key scene understanding tasks which is pivotal for several robotics applications including autonomous driving, domestic service robotics, last mile delivery, amongst many others. Semantic segmentation is a dense prediction task that aims to provide a scene representation in which each pixel of an image is assigned a semantic class label. Therefore, semantic segmentation considers the full scene context, incorporating the object category, location, and shape of all the scene elements, including the background. Numerous algorithms have been proposed for semantic segmentation over the years. However, the recent advances in deep learning combined with the boost in the computational capacity and the availability of large-scale labeled datasets have led to significant advances in semantic segmentation. In this chapter, we introduce the task of semantic segmentation and present the deep learning techniques that have been proposed to address this task over the years. We first define the task of semantic segmentation and contrast it with other closely related scene understanding problems. We detail different algorithms and architectures for semantic segmentation and the commonly employed loss functions. Furthermore, we present an overview of datasets, benchmarks, and metrics that are used in semantic segmentation. We conclude the chapter with a discussion of challenges and opportunities for further research in this area.

Submitted 15 January, 2024; originally announced January 2024.

Journal ref: Deep Learning for Robot Perception and Cognition, chapter 12, pp. 279-311, Elsevier, 2022

arXiv:2310.11797

Panoptic Out-of-Distribution Segmentation

Authors: Rohit Mohan, Kiran Kumaraswamy, Juana Valeria Hurtado, Kürsat Petek, Abhinav Valada

Abstract: Deep learning has led to remarkable strides in scene understanding with panoptic segmentation emerging as a key holistic scene interpretation task. However, the performance of panoptic segmentation is severely impacted in the presence of out-of-distribution (OOD) objects, i.e., categories of objects that deviate from the training distribution. To overcome this limitation, we propose Panoptic Out-of-Distribution Segmentation for joint pixel-level semantic in-distribution and out-of-distribution classification with instance prediction. We extend two established panoptic segmentation benchmarks, Cityscapes and BDD100K, with out-of-distribution instance segmentation annotations, propose suitable evaluation metrics, and present multiple strong baselines. Importantly, we propose the novel PoDS architecture with a shared backbone, an OOD contextual module for learning global and local OOD object cues, and dual symmetrical decoders with task-specific heads that employ our alignment-mismatch strategy for better OOD generalization. Combined with our data augmentation strategy, this approach facilitates progressive learning of out-of-distribution objects while maintaining in-distribution performance. We perform extensive evaluations that demonstrate that our proposed PoDS network effectively addresses the main challenges and substantially outperforms the baselines. We make the dataset, code, and trained models publicly available at http://pods.cs.uni-freiburg.de.

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2207.03444

Fairness and Bias in Robot Learning

Authors: Laura Londoño, Juana Valeria Hurtado, Nora Hertz, Philipp Kellmeyer, Silja Voeneky, Abhinav Valada

Abstract: Machine learning has significantly enhanced the abilities of robots, enabling them to perform a wide range of tasks in human environments and adapt to our uncertain real world. Recent works in various machine learning domains have highlighted the importance of accounting for fairness to ensure that these algorithms do not reproduce human biases and consequently lead to discriminatory outcomes. With robot learning systems increasingly performing more and more tasks in our everyday lives, it is crucial to understand the influence of such biases to prevent unintended behavior toward certain groups of people. In this work, we present the first survey on fairness in robot learning from an interdisciplinary perspective spanning technical, ethical, and legal challenges. We propose a taxonomy for sources of bias and the resulting types of discrimination due to them. Using examples from different robot learning domains, we examine scenarios of unfair outcomes and strategies to mitigate them. We present early advances in the field by covering different fairness definitions, ethical and legal considerations, and methods for fair robot learning. With this work, we aim to pave the road for groundbreaking developments in fair robot learning.

Submitted 29 October, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

arXiv:2201.10853

Feminist Perspective on Robot Learning Processes

Authors: Juana Valeria Hurtado, Valentina Mejia

Abstract: As different research works report and daily life experiences confirm, learning models can result in biased outcomes. The biased learned models usually replicate historical discrimination in society and typically negatively affect the less represented identities. Robots are equipped with these models that allow them to operate, performing more complex tasks every day. The learning process consists of different stages depending on human judgments. Moreover, the resulting learned models for robot decisions rely on recorded labeled data or demonstrations. Therefore, the robot learning process is susceptible to bias linked to human behavior in society. This poses a potential danger, especially when robots operate around humans and the learning process can reflect the social unfairness present today. Different feminist proposals study social inequality and provide essential perspectives towards removing bias in various fields. What is more, feminism has allowed, and still allows, the reconfiguration of numerous social dynamics and stereotypes, advocating for equality across people through their diversity. Consequently, we provide a feminist perspective on the robot learning process in this work. We base our discussion on intersectional feminism, community feminism, decolonial feminism, and pedagogy perspectives, and we frame our work in a feminist robotics approach. In this paper, we present an initial discussion to emphasize the relevance of feminist perspectives to explore, foresee, and eventually correct the biased robot decisions.

Submitted 26 January, 2022; originally announced January 2022.

arXiv:2109.03805

Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking

Authors: Whye Kit Fong, Rohit Mohan, Juana Valeria Hurtado, Lubing Zhou, Holger Caesar, Oscar Beijbom, Abhinav Valada

Abstract: Panoptic scene understanding and tracking of dynamic agents are essential for robots and automated vehicles to navigate in urban environments. As LiDARs provide accurate illumination-independent geometric depictions of the scene, performing these tasks using LiDAR point clouds provides reliable predictions. However, existing datasets lack diversity in the type of urban scenes and have a limited number of dynamic object instances which hinders both learning of these tasks as well as credible benchmarking of the developed methods. In this paper, we introduce the large-scale Panoptic nuScenes benchmark dataset that extends our popular nuScenes dataset with point-wise groundtruth annotations for semantic segmentation, panoptic segmentation, and panoptic tracking tasks. To facilitate comparison, we provide several strong baselines for each of these tasks on our proposed dataset. Moreover, we analyze the drawbacks of the existing metrics for panoptic tracking and propose the novel instance-centric PAT metric that addresses the concerns. We present exhaustive experiments that demonstrate the utility of Panoptic nuScenes compared to existing datasets and make the online evaluation server available at nuScenes.org. We believe that this extension will accelerate the research of novel methods for scene understanding of dynamic urban environments.

Submitted 23 December, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: The benchmark is available at https://www.nuscenes.org

arXiv:2103.01353

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Authors: Francisco Rivera Valverde, Juana Valeria Hurtado, Abhinav Valada

Abstract: Attributes of sound inherent to objects can provide valuable cues to learn rich representations for object detection and tracking. Furthermore, the co-occurrence of audiovisual events in videos can be exploited to localize objects over the image field by solely monitoring the sound in the environment. Thus far, this has only been feasible in scenarios where the camera is static and for single object detection. Moreover, the robustness of these methods has been limited as they primarily rely on RGB images which are highly susceptible to illumination and weather changes. In this work, we present the novel self-supervised MM-DistillNet framework consisting of multiple teachers that leverage diverse modalities including RGB, depth and thermal images, to simultaneously exploit complementary cues and distill knowledge into a single audio student network. We propose the new MTA loss function that facilitates the distillation of information from multimodal teachers in a self-supervised manner. Additionally, we propose a novel self-supervised pretext task for the audio student that enables us to not rely on labor-intensive manual annotations. We introduce a large-scale multimodal dataset with over 113,000 time-synchronized frames of RGB, depth, thermal, and audio modalities. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods while being able to detect multiple objects using only sound during inference and even while moving.

Submitted 1 March, 2021; originally announced March 2021.

Comments: Accepted at CVPR 2021. Dataset, code and models are available at http://rl.uni-freiburg.de/research/multimodal-distill

Journal ref: IEEE/CVF International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11612-11621, 2021

arXiv:2101.02647

doi: 10.3389/frobt.2021.650325

From Learning to Relearning: A Framework for Diminishing Bias in Social Robot Navigation

Authors: Juana Valeria Hurtado, Laura Londoño, Abhinav Valada

Abstract: The exponentially increasing advances in robotics and machine learning are facilitating the transition of robots from being confined to controlled industrial spaces to performing novel everyday tasks in domestic and urban environments. In order to make the presence of robots safe as well as comfortable for humans, and to facilitate their acceptance in public environments, they are often equipped with social abilities for navigation and interaction. Socially compliant robot navigation is increasingly being learned from human observations or demonstrations. We argue that these techniques that typically aim to mimic human behavior do not guarantee fair behavior. As a consequence, social navigation models can replicate, promote, and amplify societal unfairness such as discrimination and segregation. In this work, we investigate a framework for diminishing bias in social robot navigation models so that robots are equipped with the capability to plan as well as adapt their paths based on both physical and social demands. Our proposed framework consists of two components: learning, which incorporates social context into the learning process to account for safety and comfort, and relearning, to detect and correct potentially harmful outcomes before the onset. We provide both technological and societal analysis using three diverse case studies in different social scenarios of interaction. Moreover, we present ethical implications of deploying robots in social environments and propose potential solutions. Through this study, we highlight the importance of, and advocate for, fairness in human-robot interactions in order to promote more equitable social relationships, roles, and dynamics and consequently positively influence our society.

Submitted 3 March, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

Journal ref: Frontiers in Robotics and AI, 2021

arXiv:2004.08189

MOPT: Multi-Object Panoptic Tracking

Authors: Juana Valeria Hurtado, Rohit Mohan, Wolfram Burgard, Abhinav Valada

Abstract: Comprehensive understanding of dynamic scenes is a critical prerequisite for intelligent robots to autonomously operate in their environment. Research in this domain, which encompasses diverse perception problems, has primarily been focused on addressing specific tasks individually rather than modeling the ability to understand dynamic scenes holistically. In this paper, we introduce a novel perception task denoted as multi-object panoptic tracking (MOPT), which unifies the conventionally disjoint tasks of semantic segmentation, instance segmentation, and multi-object tracking. MOPT allows for exploiting pixel-level semantic information of 'thing' and 'stuff' classes, temporal coherence, and pixel-level associations over time, for the mutual benefit of each of the individual sub-problems. To facilitate quantitative evaluations of MOPT in a unified manner, we propose the soft panoptic tracking quality (sPTQ) metric. As a first step towards addressing this task, we propose the novel PanopticTrackNet architecture that builds upon the state-of-the-art top-down panoptic segmentation network EfficientPS by adding a new tracking head to simultaneously learn all sub-tasks in an end-to-end manner. Additionally, we present several strong baselines that combine predictions from state-of-the-art panoptic segmentation and multi-object tracking models for comparison. We present extensive quantitative and qualitative evaluations of both vision-based and LiDAR-based MOPT that demonstrate encouraging results.

Submitted 27 May, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

Comments: Code & models are available at http://rl.uni-freiburg.de/research/panoptictracking

Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Scalability in Autonomous Driving, 2020

Showing 1–8 of 8 results for author: Hurtado, J V