-
Auto-Vocabulary Segmentation for LiDAR Points
Authors:
Weijie Wei,
Osman Ülger,
Fatemeh Karimi Najadasl,
Theo Gevers,
Martin R. Oswald
Abstract:
Existing perception methods for autonomous driving fall short of recognizing unknown entities not covered in the training data. Open-vocabulary methods offer promising capabilities in detecting any object but are limited by user-specified queries representing target classes. We propose AutoVoc3D, a framework for automatic object class recognition and open-ended segmentation. Evaluation on nuScenes showcases AutoVoc3D's ability to generate precise semantic classes and accurate point-wise segmentation. Moreover, we introduce Text-Point Semantic Similarity, a new metric to assess the semantic similarity between text and point cloud without eliminating novel classes.
Submitted 13 June, 2024;
originally announced June 2024.
-
One-Shot Wyner-Ziv Compression of a Uniform Source
Authors:
Oğuzhan Kubilay Ülger,
Elza Erkip
Abstract:
In this paper, we consider the one-shot version of the classical Wyner-Ziv problem, where a source is compressed in a lossy fashion when only the decoder has access to correlated side information. Following the entropy-constrained quantization framework, we assume a scalar quantizer followed by variable-length entropy coding. We consider compression of a uniform source, motivated by its role in the compression of processes with low-dimensional features embedded within a high-dimensional ambient space. We find upper and lower bounds on the entropy-distortion functions of the uniform source for quantized and noisy side information, and illustrate the tightness of the bounds at high compression rates.
Submitted 2 May, 2024;
originally announced May 2024.
-
Robust Distributed Compression with Learned Heegard-Berger Scheme
Authors:
Eyyup Tasci,
Ezgi Ozyilkan,
Oguzhan Kubilay Ulger,
Elza Erkip
Abstract:
We consider lossy compression of an information source when decoder-only side information may be absent. This setup, also referred to as the Heegard-Berger or Kaspi problem, is a special case of robust distributed source coding. Building upon previous works on neural network-based distributed compressors developed for the decoder-only side information (Wyner-Ziv) case, we propose learning-based schemes that are amenable to the availability of side information. We find that our learned compressors mimic the achievability part of the Heegard-Berger theorem and yield interpretable results operating close to information-theoretic bounds. Depending on the availability of the side information, our neural compressors recover characteristics of the point-to-point (i.e., with no side information) and the Wyner-Ziv coding strategies that include binning in the source space, although no structure exploiting knowledge of the source and side information was imposed into the design.
Submitted 6 May, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Auto-Vocabulary Semantic Segmentation
Authors:
Osman Ülger,
Maksymilian Kulicki,
Yuki Asano,
Martin R. Oswald
Abstract:
Open-ended image understanding tasks have gained significant attention from the research community, particularly with the emergence of Vision-Language Models. Open-Vocabulary Segmentation (OVS) methods are capable of performing semantic segmentation without relying on a fixed vocabulary, and in some cases, they operate without the need for training or fine-tuning. However, OVS methods typically require users to specify the vocabulary based on the task or dataset at hand. In this paper, we introduce Auto-Vocabulary Semantic Segmentation (AVS), advancing open-ended image understanding by eliminating the necessity to predefine object categories for segmentation. Our approach presents a framework that autonomously identifies relevant class names using enhanced BLIP embeddings, which are then used for segmentation. Given that open-ended object category predictions cannot be directly compared with a fixed ground truth, we develop a Large Language Model-based Auto-Vocabulary Evaluator (LAVE) to efficiently evaluate the automatically generated class names and their corresponding segments. Our method sets new benchmarks on datasets such as PASCAL VOC and Context, ADE20K, and Cityscapes for AVS and showcases competitive performance to OVS methods that require specified class names.
Submitted 20 March, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Relational Prior Knowledge Graphs for Detection and Instance Segmentation
Authors:
Osman Ülger,
Yu Wang,
Ysbrand Galama,
Sezer Karaoglu,
Theo Gevers,
Martin R. Oswald
Abstract:
Humans have a remarkable ability to perceive and reason about the world around them by understanding the relationships between objects. In this paper, we investigate the effectiveness of using such relationships for object detection and instance segmentation. To this end, we propose a Relational Prior-based Feature Enhancement Model (RP-FEM), a graph transformer that enhances object proposal features using relational priors. The proposed architecture operates on top of scene graphs obtained from initial proposals and aims to concurrently learn relational context modeling for object detection and instance segmentation. Experimental evaluations on COCO show that the utilization of scene graphs, augmented with relational priors, offers benefits for object detection and instance segmentation. RP-FEM demonstrates its capacity to suppress improbable class predictions within the image while also preventing the model from generating duplicate predictions, leading to improvements over the baseline model on which it is built.
Submitted 11 October, 2023;
originally announced October 2023.
-
Single-Shot Lossy Compression for Joint Inference and Reconstruction
Authors:
Oğuzhan Kubilay Ülger,
Elza Erkip
Abstract:
In the classical source coding problem, the compressed source is reconstructed at the decoder with respect to some distortion metric. Motivated by settings in which we are interested in more than simply reconstructing the compressed source, we investigate a single-shot compression problem where the decoder is tasked with reconstructing the original data as well as making inferences from it. Quality of inference and reconstruction is determined by a distortion criterion for each task. Given allowable distortion levels, we are interested in characterizing the probability of excess distortion. Modeling the joint inference and reconstruction problem as a direct-indirect source coding problem, we obtain lower and upper bounds on the excess distortion probability. We specialize the converse bound and present a new, easily computable achievability bound for the case where the distortion metric for reconstruction is logarithmic loss.
Submitted 30 September, 2023; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Multi-Task Edge Prediction in Temporally-Dynamic Video Graphs
Authors:
Osman Ülger,
Julian Wiederer,
Mohsen Ghafoorian,
Vasileios Belagiannis,
Pascal Mettes
Abstract:
Graph neural networks have been shown to learn effective node representations, enabling node-, link-, and graph-level inference. Conventional graph networks assume static relations between nodes, while relations between entities in a video often evolve over time, with nodes entering and exiting dynamically. In such temporally-dynamic graphs, a core problem is inferring the future state of spatio-temporal edges, which can constitute multiple types of relations. To address this problem, we propose MTD-GNN, a graph network for predicting temporally-dynamic edges for multiple types of relations. We propose a factorized spatio-temporal graph attention layer to learn dynamic node representations and present a multi-task edge prediction loss that models multiple relations simultaneously. The proposed architecture operates on top of scene graphs that we obtain from videos through object detection and spatio-temporal linking. Experimental evaluations on ActionGenome and CLEVRER show that modeling multiple relations in our temporally-dynamic graph network can be mutually beneficial, outperforming existing static and spatio-temporal graph neural networks, as well as state-of-the-art predicate classification methods.
Submitted 6 December, 2022;
originally announced December 2022.