
doi 10.1007/s11263-024-01986-z

Weakly supervised training of universal visual concepts for multi-domain semantic segmentation

Authors: Petra Bevandić, Marin Oršić, Ivan Grubišić, Josip Šarić, Siniša Šegvić

Abstract: Deep supervised models have an unprecedented capacity to absorb large quantities of training data. Hence, training on multiple datasets becomes a method of choice towards strong generalization in usual scenes and graceful performance degradation in edge cases. Unfortunately, different datasets often have incompatible labels. For instance, the Cityscapes road class subsumes all driving surfaces, while Vistas defines separate classes for road markings, manholes etc. Furthermore, many datasets have overlapping labels. For instance, pickups are labeled as trucks in VIPER, cars in Vistas, and vans in ADE20k. We address this challenge by considering labels as unions of universal visual concepts. This allows seamless and principled learning on multi-domain dataset collections without requiring any relabeling effort. Our method achieves competitive within-dataset and cross-dataset generalization, as well as ability to learn visual concepts which are not separately labeled in any of the training datasets. Experiments reveal competitive or state-of-the-art performance on two multi-domain dataset collections and on the WildDash 2 benchmark.

Submitted 12 March, 2024; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: 27 pages, 16 figures, 10 tables, accepted to International Journal of Computer Vision

Journal ref: International Journal of Computer Vision, 2024, 1-23
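
The idea of treating a dataset label as a union of universal visual concepts can be illustrated with a small sketch. This is not the authors' implementation: the four-concept taxonomy, the label-to-concept mappings, and the function names `log_softmax` and `union_nll` are all hypothetical, chosen only to show how a loss could sum the probabilities of every universal concept that a dataset-specific label may cover.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of per-concept logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def union_nll(logits, concept_ids):
    """Negative log of the summed probability of the listed universal concepts.

    `concept_ids` holds the indices of the universal concepts that the
    dataset-specific label may correspond to (an assumed mapping).
    """
    logp = log_softmax(logits)
    selected = [logp[i] for i in concept_ids]
    m = max(selected)
    # log-sum-exp over the selected log-probabilities, then negate
    return -(m + math.log(sum(math.exp(v - m) for v in selected)))

# Hypothetical universal taxonomy: 0=road surface, 1=road marking, 2=manhole, 3=car
# Hypothetical mappings: a coarse "road" label covers concepts 0, 1 and 2,
# while a fine-grained "road" label covers only concept 0.
coarse_road = [0, 1, 2]
fine_road = [0]

pixel_logits = [2.0, 0.5, -1.0, 0.0]
loss_coarse = union_nll(pixel_logits, coarse_road)
loss_fine = union_nll(pixel_logits, fine_road)
```

With a single-concept label the loss reduces to ordinary cross-entropy, and a label that covers more concepts can never yield a larger loss for the same logits, since the summed probability only grows.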

arXiv:2205.06840 [pdf, other]

IRB-NLP at SemEval-2022 Task 1: Exploring the Relationship Between Words and Their Semantic Representations

Authors: Damir Korenčić, Ivan Grubišić

Abstract: What is the relation between a word and its description, or a word and its embedding? Both descriptions and embeddings are semantic representations of words. But, what information from the original word remains in these representations? Or more importantly, which information about a word do these two representations share? Definition Modeling and Reverse Dictionary are two opposite learning tasks that address these questions. The goal of the Definition Modeling task is to investigate the power of information lying inside a word embedding to express the meaning of the word in a humanly understandable way -- as a dictionary definition. Conversely, the Reverse Dictionary task explores the ability to predict word embeddings directly from its definition. In this paper, by tackling these two tasks, we are exploring the relationship between words and their semantic representations. We present our findings based on the descriptive, exploratory, and predictive data analysis conducted on the CODWOE dataset. We give a detailed overview of the systems that we designed for Definition Modeling and Reverse Dictionary tasks, and that achieved top scores on SemEval-2022 CODWOE challenge in several subtasks. We hope that our experimental results concerning the predictive models and the data analyses we provide will prove useful in future explorations of word representations and their relationships.

Submitted 13 May, 2022; originally announced May 2022.

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2108.11224 [pdf, other]

Multi-domain semantic segmentation with overlapping labels

Authors: Petra Bevandić, Marin Oršić, Ivan Grubišić, Josip Šarić, Siniša Šegvić

Abstract: Deep supervised models have an unprecedented capacity to absorb large quantities of training data. Hence, training on many datasets becomes a method of choice towards graceful degradation in unusual scenes. Unfortunately, different datasets often use incompatible labels. For instance, the Cityscapes road class subsumes all driving surfaces, while Vistas defines separate classes for road markings, manholes etc. We address this challenge by proposing a principled method for seamless learning on datasets with overlapping classes based on partial labels and probabilistic loss. Our method achieves competitive within-dataset and cross-dataset generalization, as well as ability to learn visual concepts which are not separately labeled in any of the training datasets. Experiments reveal competitive or state-of-the-art performance on two multi-domain dataset collections and on the WildDash 2 benchmark.

Submitted 2 November, 2021; v1 submitted 25 August, 2021; originally announced August 2021.

Comments: 18 pages, 8 figures, 11 tables

arXiv:2106.07075 [pdf, other]

doi 10.3390/s23020940

Revisiting consistency for semi-supervised semantic segmentation

Authors: Ivan Grubišić, Marin Oršić, Siniša Šegvić

Abstract: Semi-supervised learning is an attractive technique in practical deployments of deep models since it relaxes the dependence on labeled data. It is especially important in the scope of dense prediction because pixel-level annotation requires significant effort. This paper considers semi-supervised algorithms that enforce consistent predictions over perturbed unlabeled inputs. We study the advantages of perturbing only one of the two model instances and preventing the backward pass through the unperturbed instance. We also propose a competitive perturbation model as a composition of geometric warp and photometric jittering. We experiment with efficient models due to their importance for real-time and low-power applications. Our experiments show clear advantages of (1) one-way consistency, (2) perturbing only the student branch, and (3) strong photometric and geometric perturbations. Our perturbation model outperforms recent work and most of the contribution comes from the photometric component. Experiments with additional data from the large coarsely annotated subset of Cityscapes suggest that semi-supervised training can outperform supervised training with the coarse labels.

Submitted 20 January, 2023; v1 submitted 13 June, 2021; originally announced June 2021.

Comments: The source code is available at https://github.com/Ivan1248/semisup-seg-efficient

Journal ref: Sensors. 2023; 23(2):940

arXiv:2106.04627 [pdf, other]

Densely connected normalizing flows

Authors: Matej Grcić, Ivan Grubišić, Siniša Šegvić

Abstract: Normalizing flows are bijective mappings between inputs and latent representations with a fully factorized distribution. They are very attractive due to exact likelihood evaluation and efficient sampling. However, their effective capacity is often insufficient since the bijectivity constraint limits the model width. We address this issue by incrementally padding intermediate representations with noise. We precondition the noise in accordance with previous invertible units, which we describe as cross-unit coupling. Our invertible glow-like modules increase the model expressivity by fusing a densely connected block with Nyström self-attention. We refer to our architecture as DenseFlow since both cross-unit and intra-module couplings rely on dense connectivity. Experiments show significant improvements due to the proposed contributions and reveal state-of-the-art density estimation under moderate computing budgets.

Submitted 2 November, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: Accepted at NeurIPS2021

arXiv:2009.01636 [pdf, ps, other]

Multi-domain semantic segmentation with pyramidal fusion

Authors: Petra Bevandić, Marin Oršić, Ivan Grubišić, Josip Šarić, Siniša Šegvić

Abstract: We present our submission to the semantic segmentation contest of the Robust Vision Challenge held at ECCV 2020. The contest requires submitting the same model to seven benchmarks from three different domains. Our approach is based on the SwiftNet architecture with pyramidal fusion. We address inconsistent taxonomies with a single-level 193-dimensional softmax output. We strive to train with large batches in order to stabilize optimization of a hard recognition problem, and to favour smooth evolution of batchnorm statistics. We achieve this by implementing a custom backward step through log-sum-prob loss, and by using small crops before freezing the population statistics. Our model ranks first on the RVC semantic segmentation challenge as well as on the WildDash 2 leaderboard. This suggests that pyramidal fusion is competitive not only for efficient inference with lightweight backbones, but also in large-scale setups for multi-domain application.

Submitted 7 October, 2021; v1 submitted 2 September, 2020; originally announced September 2020.

Comments: 2 pages, 2 tables, no figures

Showing 1–6 of 6 results for author: Grubišić, I