Search | arXiv e-print repository

Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors

Authors: Mesut Erhan Unal, Adriana Kovashka

Abstract: Human-object interaction (HOI) detection aims to extract interacting human-object pairs and their interaction categories from a given natural image. Even though the labeling effort required for building HOI detection datasets is inherently more extensive than for many other computer vision tasks, weakly-supervised directions in this area have not been sufficiently explored due to the difficulty of learning human-object interactions with weak supervision, rooted in the combinatorial nature of interactions over the object and predicate space. In this paper, we tackle HOI detection with the weakest supervision setting in the literature, using only image-level interaction labels, with the help of a pretrained vision-language model (VLM) and a large language model (LLM). We first propose an approach to prune non-interacting human and object proposals to increase the quality of positive pairs within the bag, exploiting the grounding capability of the vision-language model. Second, we use a large language model to query which interactions are possible between a human and a given object category, in order to force the model not to put emphasis on unlikely interactions. Lastly, we use an auxiliary weakly-supervised preposition prediction task to make our model explicitly reason about space. Extensive experiments and ablations show that all of our contributions increase HOI detection performance.

Submitted 9 March, 2023; originally announced March 2023.

Comments: 8 pages, 3 figures and 5 tables

arXiv:2207.05727

Enhancing Fairness of Visual Attribute Predictors

Authors: Tobias Hänel, Nishant Kumar, Dmitrij Schlesinger, Mengze Li, Erdem Ünal, Abouzar Eslami, Stefan Gumhold

Abstract: The performance of deep neural networks for image recognition tasks such as predicting a smiling face is known to degrade with under-represented classes of sensitive attributes. We address this problem by introducing fairness-aware regularization losses based on batch estimates of Demographic Parity, Equalized Odds, and a novel Intersection-over-Union measure. The experiments performed on facial and medical images from CelebA, UTKFace, and the SIIM-ISIC melanoma classification challenge show the effectiveness of our proposed fairness losses for bias mitigation as they improve model fairness while maintaining high classification performance. To the best of our knowledge, our work is the first attempt to incorporate these types of losses in an end-to-end training scheme for mitigating biases of visual attribute predictors. Our code is available at https://github.com/nish03/FVAP.

Submitted 1 October, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

Comments: Camera Ready, ACCV 2022

arXiv:2112.13910

doi: 10.1145/3487553.3524647

Visual Persuasion in COVID-19 Social Media Content: A Multi-Modal Characterization

Authors: Mesut Erhan Unal, Adriana Kovashka, Wen-Ting Chung, Yu-Ru Lin

Abstract: Social media content routinely incorporates multi-modal design to convey information, shape meanings, and sway interpretations toward desirable implications, but the choices and outcomes of using both texts and visual images have not been sufficiently studied. This work proposes a computational approach to analyze the outcome of persuasive information in multi-modal content, focusing on two aspects, popularity and reliability, in COVID-19-related news articles shared on Twitter. The two aspects are intertwined in the spread of misinformation: for example, an unreliable article that aims to misinform has to attain some popularity. This work makes three contributions. First, we propose a multi-modal (image and text) approach to effectively identify the popularity and reliability of information sources simultaneously. Second, we identify textual and visual elements that are predictive of information popularity and reliability. Third, by modeling cross-modal relations and similarity, we are able to uncover how unreliable articles construct multi-modal meaning in a distorted, biased fashion. Our work demonstrates how to use multi-modal analysis for understanding influential content and has implications for social media literacy and engagement.

Submitted 4 December, 2021; originally announced December 2021.

Comments: 10 pages

arXiv:1510.05533

Image-based Modelling of Organogenesis

Authors: Dagmar Iber, Zahra Karimaddini, Erkan Ünal

Abstract: One of the major challenges in biology concerns the integration of data across length and time scales into a consistent framework: how do macroscopic properties and functionalities arise from the molecular regulatory networks, and how can they change as a result of mutations? Morphogenesis provides an excellent model system to study how simple molecular networks robustly control complex processes on the macroscopic scale in spite of molecular noise, and how important functional variants can emerge from small genetic changes. Recent advancements in 3D imaging technologies, computer algorithms, and computer power now allow us to develop and analyse increasingly realistic models of biological control. Here we present our pipeline for image-based modelling that includes the segmentation of images, the determination of displacement fields, and the solution of systems of partial differential equations (PDEs) on the growing, embryonic domains. The development of suitable mathematical models, the data-based inference of parameter sets, and the evaluation of competing models are still challenging, and current approaches are discussed.

Submitted 19 October, 2015; originally announced October 2015.

arXiv:1408.1589

Simulating Organogenesis in COMSOL: Image-based Modeling

Authors: Zahra Karimaddini, Erkan Unal, Denis Menshykau, Dagmar Iber

Abstract: Mathematical modelling has a long history in developmental biology. Advances in experimental techniques and computational algorithms now permit the development of increasingly realistic models of organogenesis. In particular, 3D geometries of developing organs have recently become available. In this paper, we show how to use image-based data for simulations of organogenesis in COMSOL Multiphysics. As an example, we use limb bud development, a classical model system in mouse developmental biology. We discuss how embryonic geometries with several subdomains can be read into COMSOL using LiveLink for MATLAB, and how these can be used to simulate models on growing embryonic domains. The ALE (Arbitrary Lagrangian-Eulerian) method is used to solve signaling models even on strongly deforming domains.

Submitted 7 August, 2014; originally announced August 2014.

Showing 1–5 of 5 results for author: Ünal, E