Search | arXiv e-print repository

Versatile User Identification in Extended Reality using Pretrained Similarity-Learning

Authors: Christian Rack, Konstantin Kobs, Tamara Fernando, Andreas Hotho, Marc Erich Latoschik

Abstract: Various machine learning approaches have proven to be useful for user verification and identification based on motion data in eXtended Reality (XR). However, their real-world application still faces significant challenges concerning versatility, i.e., in terms of extensibility and generalization capability. This article presents a solution that is both extensible to new users without expensive retraining, and that generalizes well across different sessions, devices, and user tasks. To this end, we developed a similarity-learning model and pretrained it on the "Who Is Alyx?" dataset. This dataset features a wide array of tasks and hence motions from users playing the VR game "Half-Life: Alyx". In contrast to previous works, we used a dedicated set of users for model validation and final evaluation. Furthermore, we extended this evaluation using an independent dataset that features completely different users, tasks, and three different XR devices. In comparison with a traditional classification-learning baseline, our model shows superior performance, especially in scenarios with limited enrollment data. The pretraining process allows immediate deployment in a diverse range of XR applications while maintaining high versatility. Looking ahead, our approach paves the way for easy integration of pretrained motion-based identification models in production XR systems.

Submitted 15 April, 2024; v1 submitted 15 February, 2023; originally announced February 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2211.12760 [pdf, other]

InDiReCT: Language-Guided Zero-Shot Deep Metric Learning for Images

Authors: Konstantin Kobs, Michael Steininger, Andreas Hotho

Abstract: Common Deep Metric Learning (DML) datasets specify only one notion of similarity, e.g., two images in the Cars196 dataset are deemed similar if they show the same car model. We argue that depending on the application, users of image retrieval systems have different and changing similarity notions that should be incorporated as easily as possible. Therefore, we present Language-Guided Zero-Shot Deep Metric Learning (LanZ-DML) as a new DML setting in which users control the properties that should be important for image representations without training data by only using natural language. To this end, we propose InDiReCT (Image representations using Dimensionality Reduction on CLIP embedded Texts), a model for LanZ-DML on images that exclusively uses a few text prompts for training. InDiReCT utilizes CLIP as a fixed feature extractor for images and texts and transfers the variation in text prompt embeddings to the image embedding space. Extensive experiments on five datasets and overall thirteen similarity notions show that, despite not seeing any images during training, InDiReCT performs better than strong baselines and approaches the performance of fully-supervised models. An analysis reveals that InDiReCT learns to focus on regions of the image that correlate with the desired similarity notion, which makes it a fast to train and easy to use method to create custom embedding spaces only using natural language.

Submitted 23 November, 2022; originally announced November 2022.

Comments: Accepted to WACV 2023

arXiv:2210.01615 [pdf, other]

On Background Bias in Deep Metric Learning

Authors: Konstantin Kobs, Andreas Hotho

Abstract: Deep Metric Learning trains a neural network to map input images to a lower-dimensional embedding space such that similar images are closer together than dissimilar images. When used for item retrieval, a query image is embedded using the trained model and the closest items from a database storing their respective embeddings are returned as the most similar items for the query. Especially in product retrieval, where a user searches for a certain product by taking a photo of it, the image background is usually not important and thus should not influence the embedding process. Ideally, the retrieval process always returns fitting items for the photographed object, regardless of the environment the photo was taken in. In this paper, we analyze the influence of the image background on Deep Metric Learning models by utilizing five common loss functions and three common datasets. We find that Deep Metric Learning networks are prone to so-called background bias, which can lead to a severe decrease in retrieval performance when changing the image background during inference. We also show that replacing the background of images during training with random background images alleviates this issue. Since we use an automatic background removal method to do this background replacement, no additional manual labeling work and model changes are required while inference time stays the same. Qualitative and quantitative analyses, for which we introduce a new evaluation metric, confirm that models trained with replaced backgrounds attend more to the main object in the image, benefitting item retrieval systems.

Submitted 4 October, 2022; originally announced October 2022.

Comments: To be published at ICMV 2022

arXiv:2205.02698 [pdf]

Do Different Deep Metric Learning Losses Lead to Similar Learned Features?

Authors: Konstantin Kobs, Michael Steininger, Andrzej Dulny, Andreas Hotho

Abstract: Recent studies have shown that many deep metric learning loss functions perform very similarly under the same experimental conditions. One potential reason for this unexpected result is that all losses let the network focus on similar image regions or properties. In this paper, we investigate this by conducting a two-step analysis to extract and compare the learned visual features of the same model architecture trained with different loss functions: First, we compare the learned features on the pixel level by correlating saliency maps of the same input images. Second, we compare the clustering of embeddings for several image properties, e.g. object color or illumination. To provide independent control over these properties, photo-realistic 3D car renders similar to images in the Cars196 dataset are generated. In our analysis, we compare 14 pretrained models from a recent study and find that, even though all models perform similarly, different loss functions can guide the model to learn different features. We especially find differences between classification and ranking based losses. Our analysis also shows that some seemingly irrelevant properties can have significant influence on the resulting embedding. We encourage researchers from the deep metric learning community to use our methods to get insights into the features learned by their proposed methods.

Submitted 5 May, 2022; originally announced May 2022.

Comments: Published at ICCV 2021

arXiv:2012.01778 [pdf, other]

NICER: Aesthetic Image Enhancement with Humans in the Loop

Authors: Michael Fischer, Konstantin Kobs, Andreas Hotho

Abstract: Fully- or semi-automatic image enhancement software helps users to increase the visual appeal of photos and does not require in-depth knowledge of manual image editing. However, fully-automatic approaches usually enhance the image in a black-box manner that does not give the user any control over the optimization process, possibly leading to edited images that do not subjectively appeal to the user. Semi-automatic methods mostly allow for controlling which pre-defined editing step is taken, which restricts the users in their creativity and ability to make detailed adjustments, such as brightness or contrast. We argue that incorporating user preferences by guiding an automated enhancement method simplifies image editing and increases the enhancement's focus on the user. This work thus proposes the Neural Image Correction & Enhancement Routine (NICER), a neural network based approach to no-reference image enhancement in a fully-, semi-automatic or fully manual process that is interactive and user-centered. NICER iteratively adjusts image editing parameters in order to maximize an aesthetic score based on image style and content. Users can modify these parameters at any time and guide the optimization process towards a desired direction. This interactive workflow is a novelty in the field of human-computer interaction for image enhancement tasks. In a user study, we show that NICER can improve image aesthetics without user interaction and that allowing user interaction leads to diverse enhancement outcomes that are strongly preferred over the unedited image. We make our code publicly available to facilitate further research in this direction.

Submitted 3 December, 2020; originally announced December 2020.

Comments: The code can be found at https://github.com/mr-Mojo/NICER

ACM Class: I.2.10; I.4.3; H.1.2

Journal ref: ACHI 2020, The Thirteenth International Conference on Advances in Computer-Human Interactions; 2020; pages 357-362

arXiv:2003.04576 [pdf, ps, other]

Anomaly Detection in Beehives using Deep Recurrent Autoencoders

Authors: Padraig Davidson, Michael Steininger, Florian Lautenschlager, Konstantin Kobs, Anna Krause, Andreas Hotho

Abstract: Precision beekeeping allows to monitor bees' living conditions by equipping beehives with sensors. The data recorded by these hives can be analyzed by machine learning models to learn behavioral patterns of or search for unusual events in bee colonies. One typical target is the early detection of bee swarming as apiarists want to avoid this due to economical reasons. Advanced methods should be able to detect any other unusual or abnormal behavior arising from illness of bees or from technical reasons, e.g. sensor failure. In this position paper we present an autoencoder, a deep learning model, which detects any type of anomaly in data independent of its origin. Our model is able to reveal the same swarms as a simple rule-based swarm detection algorithm but is also triggered by any other anomaly. We evaluated our model on real world data sets that were collected on different hives and with different sensor setups.

Submitted 10 March, 2020; originally announced March 2020.

Journal ref: Proceedings of the 9th International Conference on Sensor Networks (SENSORNETS 2020), 2020, 142-149

arXiv:2003.03182 [pdf, other]

SimLoss: Class Similarities in Cross Entropy

Authors: Konstantin Kobs, Michael Steininger, Albin Zehe, Florian Lautenschlager, Andreas Hotho

Abstract: One common loss function in neural network classification tasks is Categorical Cross Entropy (CCE), which punishes all misclassifications equally. However, classes often have an inherent structure. For instance, classifying an image of a rose as "violet" is better than as "truck". We introduce SimLoss, a drop-in replacement for CCE that incorporates class similarities along with two techniques to construct such matrices from task-specific knowledge. We test SimLoss on Age Estimation and Image Classification and find that it brings significant improvements over CCE on several metrics. SimLoss therefore allows for explicit modeling of background knowledge by simply exchanging the loss function, while keeping the neural network architecture the same. Code and additional resources can be found at https://github.com/konstantinkobs/SimLoss.

Submitted 6 March, 2020; originally announced March 2020.

Comments: This paper is going to be published in the proceedings of the 25th International Symposium on Methodologies for Intelligent Systems (ISMIS)

ACM Class: I.2.6

arXiv:2002.07493 [pdf, other]

doi 10.1145/3380973

MapLUR: Exploring a new Paradigm for Estimating Air Pollution using Deep Learning on Map Images

Authors: Michael Steininger, Konstantin Kobs, Albin Zehe, Florian Lautenschlager, Martin Becker, Andreas Hotho

Abstract: Land-use regression (LUR) models are important for the assessment of air pollution concentrations in areas without measurement stations. While many such models exist, they often use manually constructed features based on restricted, locally available data. Thus, they are typically hard to reproduce and challenging to adapt to areas beyond those they have been developed for. In this paper, we advocate a paradigm shift for LUR models: We propose the Data-driven, Open, Global (DOG) paradigm that entails models based on purely data-driven approaches using only openly and globally available data. Progress within this paradigm will alleviate the need for experts to adapt models to the local characteristics of the available data sources and thus facilitate the generalizability of air pollution models to new areas on a global scale. In order to illustrate the feasibility of the DOG paradigm for LUR, we introduce a deep learning model called MapLUR. It is based on a convolutional neural network architecture and is trained exclusively on globally and openly available map data without requiring manual feature engineering. We compare our model to state-of-the-art baselines like linear regression, random forests and multi-layer perceptrons using a large data set of modeled $\text{NO}_2$ concentrations in Central London. Our results show that MapLUR significantly outperforms these approaches even though they are provided with manually tailored features. Furthermore, we illustrate that the automatic feature extraction inherent to models based on the DOG paradigm can learn features that are readily interpretable and closely resemble those commonly used in traditional LUR approaches.

Submitted 18 February, 2020; originally announced February 2020.

Comments: Accepted for publication in ACM TSAS - Special Issue on Deep Learning

Showing 1–8 of 8 results for author: Kobs, K