-
Segmentation Re-thinking Uncertainty Estimation Metrics for Semantic Segmentation
Authors:
Qitian Ma,
Shyam Nandan Rai,
Carlo Masone,
Tatiana Tommasi
Abstract:
In the domain of computer vision, semantic segmentation emerges as a fundamental application within machine learning, wherein individual pixels of an image are classified into distinct semantic categories. This task transcends traditional accuracy metrics by incorporating uncertainty quantification, a critical measure for assessing the reliability of each segmentation prediction. Such quantification is instrumental in facilitating informed decision-making, particularly in applications where precision is paramount. Within this nuanced framework, the metric known as PAvPU (Patch Accuracy versus Patch Uncertainty) has been developed as a specialized tool for evaluating entropy-based uncertainty in image segmentation tasks. However, our investigation identifies three core deficiencies within the PAvPU framework and proposes robust solutions aimed at refining the metric. By addressing these issues, we aim to enhance the reliability and applicability of uncertainty quantification, especially in scenarios that demand high levels of safety and accuracy, thus contributing to the advancement of semantic segmentation methodologies in critical applications.
Submitted 8 April, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
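The abstract above names PAvPU without spelling it out. As a rough illustration only, the sketch below follows the commonly cited definition of the metric: each patch is labeled accurate or inaccurate and certain or uncertain, and the score is the share of patches that are either accurate and certain or inaccurate and uncertain. The function name `pavpu`, the patch size, and both thresholds are assumptions made for this example; none of them come from the listing, and the sketch does not include the refinements the paper proposes.

```python
from typing import Sequence


def pavpu(correct: Sequence[Sequence[bool]],
          uncertainty: Sequence[Sequence[float]],
          patch: int = 2,
          acc_threshold: float = 0.5,
          unc_threshold: float = 0.4) -> float:
    """Patch Accuracy vs. Patch Uncertainty over non-overlapping square patches.

    correct[r][c]     -- True where the predicted label matches the ground truth
    uncertainty[r][c] -- per-pixel uncertainty (e.g. normalized entropy)
    All default values are illustrative assumptions.
    """
    height, width = len(correct), len(correct[0])
    n_ac = n_au = n_ic = n_iu = 0
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            cells = [(r, c)
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            patch_acc = sum(correct[r][c] for r, c in cells) / len(cells)
            patch_unc = sum(uncertainty[r][c] for r, c in cells) / len(cells)
            accurate = patch_acc > acc_threshold
            certain = patch_unc <= unc_threshold
            if accurate and certain:
                n_ac += 1
            elif accurate:
                n_au += 1
            elif certain:
                n_ic += 1
            else:
                n_iu += 1
    total = n_ac + n_au + n_ic + n_iu
    if total == 0:
        raise ValueError("image is smaller than one patch")
    # Desirable outcomes: accurate-and-certain, inaccurate-and-uncertain.
    return (n_ac + n_iu) / total


# Toy 2x4 image: the left patch is correct and confident,
# the right patch is wrong and flagged as uncertain.
correct = [[True, True, False, False],
           [True, True, False, False]]
uncertain_where_wrong = [[0.1, 0.1, 0.9, 0.9],
                         [0.1, 0.1, 0.9, 0.9]]
confident_everywhere = [[0.1, 0.1, 0.1, 0.1],
                        [0.1, 0.1, 0.1, 0.1]]

score_good = pavpu(correct, uncertain_where_wrong)
score_bad = pavpu(correct, confident_everywhere)
```

In the toy data, the first uncertainty map flags exactly the wrong patch and scores higher than the second map, which is confident everywhere.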
-
The Robust Semantic Segmentation UNCV2023 Challenge Results
Authors:
Xuanlong Yu,
Yi Zuo,
Zitao Wang,
Xiaowen Zhang,
Jiaxuan Zhao,
Yuting Yang,
Licheng Jiao,
Rui Peng,
Xinyi Wang,
Junpei Zhang,
Kexin Zhang,
Fang Liu,
Roberto Alcover-Couso,
Juan C. SanMiguel,
Marcos Escudero-Viñolo,
Hanlin Tian,
Kenta Matsui,
Tianhao Wang,
Fahmy Adan,
Zhitong Gao,
Xuming He,
Quentin Bouniot,
Hossein Moghaddam,
Shyam Nandan Rai,
Fabio Cermelli
, et al. (12 additional authors not shown)
Abstract:
This paper outlines the winning solutions employed in addressing the MUAD uncertainty quantification challenge held at ICCV 2023. The challenge was centered around semantic segmentation in urban environments, with a particular focus on natural adversarial scenarios. The report presents the results of 19 submitted entries, with numerous techniques drawing inspiration from cutting-edge uncertainty quantification methodologies presented at prominent conferences in the fields of computer vision and machine learning and journals over the past few years. Within this document, the challenge is introduced, shedding light on its purpose and objectives, which primarily revolved around enhancing the robustness of semantic segmentation in urban scenes under varying natural adversarial conditions. The report then delves into the top-performing solutions. Moreover, the document aims to provide a comprehensive overview of the diverse solutions deployed by all participants. By doing so, it seeks to offer readers a deeper insight into the array of strategies that can be leveraged to effectively handle the inherent uncertainties associated with autonomous driving and semantic segmentation, especially within urban environments.
Submitted 27 September, 2023;
originally announced September 2023.
-
Mask2Anomaly: Mask Transformer for Universal Open-set Segmentation
Authors:
Shyam Nandan Rai,
Fabio Cermelli,
Barbara Caputo,
Carlo Masone
Abstract:
Segmenting unknown or anomalous object instances is a critical task in autonomous driving applications, and it is approached traditionally as a per-pixel classification problem. However, reasoning individually about each pixel without considering their contextual semantics results in high uncertainty around the objects' boundaries and numerous false positives. We propose a paradigm change by shifting from a per-pixel classification to a mask classification. Our mask-based method, Mask2Anomaly, demonstrates the feasibility of integrating a mask-classification architecture to jointly address anomaly segmentation, open-set semantic segmentation, and open-set panoptic segmentation. Mask2Anomaly includes several technical novelties that are designed to improve the detection of anomalies/unknown objects: i) a global masked attention module to focus individually on the foreground and background regions; ii) a mask contrastive learning that maximizes the margin between an anomaly and known classes; iii) a mask refinement solution to reduce false positives; and iv) a novel approach to mine unknown instances based on the mask-architecture properties. By comprehensive quantitative and qualitative evaluation, we show Mask2Anomaly achieves new state-of-the-art results across the benchmarks of anomaly segmentation, open-set semantic segmentation, and open-set panoptic segmentation.
Submitted 12 September, 2023; v1 submitted 8 September, 2023;
originally announced September 2023.
-
Unmasking Anomalies in Road-Scene Segmentation
Authors:
Shyam Nandan Rai,
Fabio Cermelli,
Dario Fontanel,
Carlo Masone,
Barbara Caputo
Abstract:
Anomaly segmentation is a critical task for driving applications, and it is approached traditionally as a per-pixel classification problem. However, reasoning individually about each pixel without considering their contextual semantics results in high uncertainty around the objects' boundaries and numerous false positives. We propose a paradigm change by shifting from a per-pixel classification to a mask classification. Our mask-based method, Mask2Anomaly, demonstrates the feasibility of integrating an anomaly detection method in a mask-classification architecture. Mask2Anomaly includes several technical novelties that are designed to improve the detection of anomalies in masks: i) a global masked attention module to focus individually on the foreground and background regions; ii) a mask contrastive learning that maximizes the margin between an anomaly and known classes; and iii) a mask refinement solution to reduce false positives. Mask2Anomaly achieves new state-of-the-art results across a range of benchmarks, both in the per-pixel and component-level evaluations. In particular, Mask2Anomaly reduces the average false positive rate by 60% with respect to the previous state of the art. GitHub page: https://github.com/shyam671/Mask2Anomaly-Unmasking-Anomalies-in-Road-Scene-Segmentation.
Submitted 25 July, 2023;
originally announced July 2023.
-
Top-k Community Similarity Search Over Large-Scale Road Networks (Technical Report)
Authors:
Niranjan Rai,
Xiang Lian
Abstract:
With the urbanization and development of infrastructure, community search over road networks has become increasingly important in many real applications such as urban/city planning, social study on local communities, and community recommendations by real estate agencies. In this paper, we propose a novel problem, namely top-k community similarity search (Top-kCS2) over road networks, which efficiently and effectively obtains k spatial communities that are the most similar to a given query community in road-network graphs. In order to efficiently and effectively tackle the Top-kCS2 problem, we design an effective similarity measure between spatial communities, and propose a framework for retrieving Top-kCS2 query answers, which integrates offline pre-processing and online computation phases. Moreover, we also consider a variant, namely continuous top-k community similarity search (CTop-kCS2), where the query community continuously moves along a query line segment. We develop an efficient algorithm to split query line segments into intervals, incrementally obtain similar candidate communities for each interval, and define actual CTop-kCS2 query answers. Extensive experiments have been conducted on real and synthetic data sets to confirm the efficiency and effectiveness of our proposed Top-kCS2 and CTop-kCS2 approaches under various parameter settings.
Submitted 27 April, 2022;
originally announced April 2022.
-
Home Action Genome: Cooperative Compositional Action Understanding
Authors:
Nishant Rai,
Haofeng Chen,
Jingwei Ji,
Rishi Desai,
Kazuki Kozuka,
Shun Ishizaka,
Ehsan Adeli,
Juan Carlos Niebles
Abstract:
Existing research on action recognition treats activities as monolithic events occurring in videos. Recently, the benefits of formulating actions as a combination of atomic-actions have shown promise in improving action understanding with the emergence of datasets containing such annotations, allowing us to learn representations capturing this information. However, there remains a lack of studies that extend action composition and leverage multiple viewpoints and multiple modalities of data for representation learning. To promote research in this direction, we introduce Home Action Genome (HOMAGE): a multi-view action dataset with multiple modalities and view-points supplemented with hierarchical activity and atomic action labels together with dense scene composition labels. Leveraging rich multi-modal and multi-view settings, we propose Cooperative Compositional Action Understanding (CCAU), a cooperative learning framework for hierarchical action recognition that is aware of compositional action elements. CCAU shows consistent performance improvements across all modalities. Furthermore, we demonstrate the utility of co-learning compositions in few-shot action recognition by achieving 28.6% mAP with just a single sample.
Submitted 11 May, 2021;
originally announced May 2021.
-
Probabilistic Top-k Dominating Queries in Distributed Uncertain Databases (Technical Report)
Authors:
Niranjan Rai,
Xiang Lian
Abstract:
In many real-world applications such as business planning and sensor data monitoring, one important, yet challenging, task is to rank objects (e.g., products, documents, or spatial objects) based on their ranking scores and efficiently return those objects with the highest scores. In practice, due to the unreliability of data sources, many real-world objects often contain noise and are thus imprecise and uncertain. In this paper, we study the problem of the probabilistic top-k dominating (PTD) query on such large-scale uncertain data in a distributed environment, which retrieves k uncertain objects from distributed uncertain databases (on multiple distributed servers), having the largest ranking scores with high confidence. In order to efficiently tackle the distributed PTD problem, we propose a MapReduce framework for processing distributed PTD queries over distributed uncertain databases. In this MapReduce framework, we design effective pruning strategies to filter out false alarms in the distributed setting, propose cost-model-based index distribution mechanisms over servers, and develop efficient distributed PTD query processing algorithms. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed distributed PTD approach on both real and synthetic data sets through various experimental settings.
Submitted 12 May, 2021; v1 submitted 10 May, 2021;
originally announced May 2021.
-
Weak Multi-View Supervision for Surface Mapping Estimation
Authors:
Nishant Rai,
Aidas Liaudanskas,
Srinivas Rao,
Rodrigo Ortiz Cayon,
Matteo Munaro,
Stefan Holzer
Abstract:
We propose a weakly-supervised multi-view learning approach to learn category-specific surface mapping without dense annotations. We learn the underlying surface geometry of common categories, such as human faces, cars, and airplanes, given instances from those categories. While traditional approaches solve this problem using extensive supervision in the form of pixel-level annotations, we take advantage of the fact that pixel-level UV and mesh predictions can be combined with 3D reprojections to form consistency cycles. As a result of exploiting these cycles, we can establish a dense correspondence mapping between image pixels and the mesh acting as a self-supervisory signal, which in turn helps improve our overall estimates. Our approach leverages information from multiple views of the object to establish additional consistency cycles, thus improving surface mapping understanding without the need for explicit annotations. We also propose the use of deformation fields for predictions of an instance specific mesh. Given the lack of datasets providing multiple images of similar object instances from different viewpoints, we generate and release a multi-view ShapeNet Cars and Airplanes dataset created by rendering ShapeNet meshes using a 360 degree camera trajectory around the mesh. For the human faces category, we process and adapt an existing dataset to a multi-view setup. Through experimental evaluations, we show that, at test time, our method can generate accurate variations away from the mean shape, is multi-view consistent, and performs comparably to fully supervised approaches.
Submitted 4 May, 2021;
originally announced May 2021.
-
CoCon: Cooperative-Contrastive Learning
Authors:
Nishant Rai,
Ehsan Adeli,
Kuan-Hui Lee,
Adrien Gaidon,
Juan Carlos Niebles
Abstract:
Labeling videos at scale is impractical. Consequently, self-supervised visual representation learning is key for efficient video analysis. Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge. However, when applied to real-world videos, contrastive learning may unknowingly lead to the separation of instances that contain semantically similar events. In our work, we introduce a cooperative variant of contrastive learning to utilize complementary information across views and address this issue. We use data-driven sampling to leverage implicit relationships between multiple input video views, whether observed (e.g. RGB) or inferred (e.g. flow, segmentation masks, poses). We are among the first to explore exploiting inter-instance relationships to drive learning. We experimentally evaluate our representations on the downstream task of action recognition. Our method achieves competitive performance on standard benchmarks (UCF101, HMDB51, Kinetics400). Furthermore, qualitative experiments illustrate that our models can capture higher-order class relationships.
Submitted 30 April, 2021;
originally announced April 2021.
-
AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos
Authors:
Amlan Kar,
Nishant Rai,
Karan Sikka,
Gaurav Sharma
Abstract:
We propose a novel method for temporally pooling frames in a video for the task of human action recognition. The method is motivated by the observation that there are only a small number of frames which, together, contain sufficient information to discriminate an action class present in a video, from the rest. The proposed method learns to pool such discriminative and informative frames, while discarding a majority of the non-informative frames in a single temporal scan of the video. Our algorithm does so by continuously predicting the discriminative importance of each video frame and subsequently pooling them in a deep learning framework. We show the effectiveness of our proposed pooling method on standard benchmarks where it consistently improves on baseline pooling methods, with both RGB and optical flow based Convolutional networks. Further, in combination with complementary video representations, we show results that are competitive with respect to the state-of-the-art results on two challenging and publicly available benchmark datasets.
Submitted 25 June, 2017; v1 submitted 24 November, 2016;
originally announced November 2016.
-
Gradient Based Seeded Region Grow method for CT Angiographic Image Segmentation
Authors:
G. N. Harikrishna Rai,
T. R. Gopalakrishnan Nair
Abstract:
Segmentation of medical images using the seeded region growing technique is becoming increasingly popular because of its ability to involve high-level knowledge of anatomical structures in the seed selection process. Region-based segmentation of medical images is widely used in varied clinical applications like visualization, bone detection, tumor detection, and unsupervised image retrieval in clinical databases. As medical images are mostly fuzzy in nature, segmenting regions based on intensity is the most challenging task. In this paper, we discuss the popular seeded region grow methodology used for segmenting anatomical structures in CT Angiography images. We propose a gradient-based homogeneity criterion to control the region grow process while segmenting CTA images.
Submitted 21 January, 2010;
originally announced January 2010.
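The abstract above describes seeded region growing controlled by a gradient-based homogeneity criterion but does not state the criterion itself. The sketch below is a generic illustration under an assumed rule: a 4-connected neighbor joins the region when its intensity differs from the pixel that reached it by no more than `max_gradient`. The function name, the parameter, and the toy image are all hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Sequence, Set, Tuple

Pixel = Tuple[int, int]


def region_grow(image: Sequence[Sequence[float]],
                seed: Pixel,
                max_gradient: float) -> Set[Pixel]:
    """Grow a region from `seed` using breadth-first search.

    Assumed homogeneity rule (not from the paper): a neighbor is accepted
    when the absolute intensity step from the current pixel is at most
    `max_gradient`.
    """
    rows, cols = len(image), len(image[0])
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if not inside or (nr, nc) in region:
                continue
            # Local intensity step stands in for the gradient magnitude.
            if abs(image[nr][nc] - image[r][c]) <= max_gradient:
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region


# Toy image: a dark structure in the two left columns, bright tissue on the right.
toy_image = [[10, 11, 50],
             [10, 12, 52],
             [11, 12, 51]]
dark_region = region_grow(toy_image, seed=(0, 0), max_gradient=3)
```

With the seed in the top-left corner, growth stops at the sharp intensity step between the second and third columns, so only the dark structure is returned.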