Search | arXiv e-print repository

arXiv:2406.09825 [pdf, other]

Unraveling Anomalies in Time: Unsupervised Discovery and Isolation of Anomalous Behavior in Bio-regenerative Life Support System Telemetry

Authors: Ferdinand Rewicki, Jakob Gawlikowski, Julia Niebling, Joachim Denzler

Abstract: The detection of abnormal or critical system states is essential in condition monitoring. While much attention is given to promptly identifying anomalies, a retrospective analysis of these anomalies can significantly enhance our comprehension of the underlying causes of observed undesired behavior. This aspect becomes particularly critical when the monitored system is deployed in a vital environme… ▽ More The detection of abnormal or critical system states is essential in condition monitoring. While much attention is given to promptly identifying anomalies, a retrospective analysis of these anomalies can significantly enhance our comprehension of the underlying causes of observed undesired behavior. This aspect becomes particularly critical when the monitored system is deployed in a vital environment. In this study, we delve into anomalies within the domain of Bio-Regenerative Life Support Systems (BLSS) for space exploration and analyze anomalies found in telemetry data stemming from the EDEN ISS space greenhouse in Antarctica. We employ time series clustering on anomaly detection results to categorize various types of anomalies in both uni- and multivariate settings. We then assess the effectiveness of these methods in identifying systematic anomalous behavior. Additionally, we illustrate that the anomaly detection methods MDI and DAMP produce complementary results, as previously indicated by research. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 12 pages, + Supplemental Materials, Accepted at ECML PKDD 2024 (European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases)

arXiv:2404.12295 [pdf, other]

doi 10.5220/0012382600003660

When Medical Imaging Met Self-Attention: A Love Story That Didn't Quite Work Out

Authors: Tristan Piater, Niklas Penzel, Gideon Stein, Joachim Denzler

Abstract: A substantial body of research has focused on develo** systems that assist medical professionals during labor-intensive early screening processes, many based on convolutional deep-learning architectures. Recently, multiple studies explored the application of so-called self-attention mechanisms in the vision domain. These studies often report empirical improvements over fully convolutional approa… ▽ More A substantial body of research has focused on develo** systems that assist medical professionals during labor-intensive early screening processes, many based on convolutional deep-learning architectures. Recently, multiple studies explored the application of so-called self-attention mechanisms in the vision domain. These studies often report empirical improvements over fully convolutional approaches on various datasets and tasks. To evaluate this trend for medical imaging, we extend two widely adopted convolutional architectures with different self-attention variants on two different medical datasets. With this, we aim to specifically evaluate the possible advantages of additional self-attention. We compare our models with similarly sized convolutional and attention-based baselines and evaluate performance gains statistically. Additionally, we investigate how including such layers changes the features learned by these models during the training. Following a hyperparameter search, and contrary to our expectations, we observe no significant improvement in balanced accuracy over fully convolutional models. We also find that important features, such as dermoscopic structures in skin lesion images, are still not learned by employing self-attention. Finally, analyzing local explanations, we confirm biased feature usage. We conclude that merely incorporating attention is insufficient to surpass the performance of existing fully convolutional methods. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 10 pages, 2 figures, 5 tables, presented at VISAPP 2024

Journal ref: Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP (2024), ISBN 978-989-758-679-8, ISSN 2184-4321, SciTePress, pages 149-158

arXiv:2404.12292 [pdf, other]

doi 10.5220/0012345800003660

Reducing Bias in Pre-trained Models by Tuning while Penalizing Change

Authors: Niklas Penzel, Gideon Stein, Joachim Denzler

Abstract: Deep models trained on large amounts of data often incorporate implicit biases present during training time. If later such a bias is discovered during inference or deployment, it is often necessary to acquire new data and retrain the model. This behavior is especially problematic in critical areas such as autonomous driving or medical decision-making. In these scenarios, new data is often expensiv… ▽ More Deep models trained on large amounts of data often incorporate implicit biases present during training time. If later such a bias is discovered during inference or deployment, it is often necessary to acquire new data and retrain the model. This behavior is especially problematic in critical areas such as autonomous driving or medical decision-making. In these scenarios, new data is often expensive and hard to come by. In this work, we present a method based on change penalization that takes a pre-trained model and adapts the weights to mitigate a previously detected bias. We achieve this by tuning a zero-initialized copy of a frozen pre-trained network. Our method needs very few, in extreme cases only a single, examples that contradict the bias to increase performance. Additionally, we propose an early stop** criterion to modify baselines and reduce overfitting. We evaluate our approach on a well-known bias in skin lesion classification and three other datasets from the domain shift literature. We find that our approach works especially well with very few images. Simple fine-tuning combined with our early stop** also leads to performance benefits for a larger number of tuning samples. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 12 pages, 12 figures, presented at VISAPP 2024

Journal ref: Proceedings of the 19th International Joint Conference on Computer Vision (2024), Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP, ISBN 978-989-758-679-8, ISSN 2184-4321, SciTePress, pages 90-101

arXiv:2404.07867 [pdf, other]

The Power of Properties: Uncovering the Influential Factors in Emotion Classification

Authors: Tim Büchner, Niklas Penzel, Orlando Guntinas-Lichius, Joachim Denzler

Abstract: Facial expression-based human emotion recognition is a critical research area in psychology and medicine. State-of-the-art classification performance is only reached by end-to-end trained neural networks. Nevertheless, such black-box models lack transparency in their decision-making processes, prompting efforts to ascertain the rules that underlie classifiers' decisions. Analyzing single inputs al… ▽ More Facial expression-based human emotion recognition is a critical research area in psychology and medicine. State-of-the-art classification performance is only reached by end-to-end trained neural networks. Nevertheless, such black-box models lack transparency in their decision-making processes, prompting efforts to ascertain the rules that underlie classifiers' decisions. Analyzing single inputs alone fails to expose systematic learned biases. These biases can be characterized as facial properties summarizing abstract information like age or medical conditions. Therefore, understanding a model's prediction behavior requires an analysis rooted in causality along such selected properties. We demonstrate that up to 91.25% of classifier output behavior changes are statistically significant concerning basic properties. Among those are age, gender, and facial symmetry. Furthermore, the medical usage of surface electromyography significantly influences emotion prediction. We introduce a workflow to evaluate explicit properties and their impact. These insights might help medical professionals select and apply classifiers regarding their specialized data and properties. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 tables, 1 figure, accepted at ICPRAI 2024

arXiv:2402.09305 [pdf, other]

Embracing the black box: Heading towards foundation models for causal discovery from time series data

Authors: Gideon Stein, Maha Shadaydeh, Joachim Denzler

Abstract: Causal discovery from time series data encompasses many existing solutions, including those based on deep learning techniques. However, these methods typically do not endorse one of the most prevalent paradigms in deep learning: End-to-end learning. To address this gap, we explore what we call Causal Pretraining. A methodology that aims to learn a direct map** from multivariate time series to th… ▽ More Causal discovery from time series data encompasses many existing solutions, including those based on deep learning techniques. However, these methods typically do not endorse one of the most prevalent paradigms in deep learning: End-to-end learning. To address this gap, we explore what we call Causal Pretraining. A methodology that aims to learn a direct map** from multivariate time series to the underlying causal graphs in a supervised manner. Our empirical findings suggest that causal discovery in a supervised manner is possible, assuming that the training and test time series samples share most of their dynamics. More importantly, we found evidence that the performance of Causal Pretraining can increase with data and model size, even if the additional data do not share the same dynamics. Further, we provide examples where causal discovery for real-world data with causally pretrained neural networks is possible within limits. We argue that this hints at the possibility of a foundation model for causal discovery. △ Less

Submitted 14 February, 2024; originally announced February 2024.

Comments: AAAI Workshop (AI4TS) 2024

MSC Class: 68T07

arXiv:2402.08439 [pdf, other]

JeFaPaTo -- A joint toolbox for blinking analysis and facial features extraction

Authors: Tim Büchner, Oliver Mothes, Orlando Guntinas-Lichius, Joachim Denzler

Abstract: Analyzing facial features and expressions is a complex task in computer vision. The human face is intricate, with significant shape, texture, and appearance variations. In medical contexts, facial structures and movements that differ from the norm are particularly important to study and require precise analysis to understand the underlying conditions. Given that solely the facial muscles, innervat… ▽ More Analyzing facial features and expressions is a complex task in computer vision. The human face is intricate, with significant shape, texture, and appearance variations. In medical contexts, facial structures and movements that differ from the norm are particularly important to study and require precise analysis to understand the underlying conditions. Given that solely the facial muscles, innervated by the facial nerve, are responsible for facial expressions, facial palsy can lead to severe impairments in facial movements. One affected area of interest is the subtle movements involved in blinking. It is an intricate spontaneous process that is not yet fully understood and needs high-resolution, time-specific analysis for detailed understanding. However, a significant challenge is that many computer vision techniques demand programming skills for automated extraction and analysis, making them less accessible to medical professionals who may not have these skills. The Jena Facial Palsy Toolbox (JeFaPaTo) has been developed to bridge this gap. It utilizes cutting-edge computer vision algorithms and offers a user-friendly interface for those without programming expertise. This toolbox makes advanced facial analysis more accessible to medical experts, simplifying integration into their workflow. △ Less

Submitted 29 April, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

Comments: A Preprint - Submitted to the Journal of Open Source Software; 7 pages, 3 figures, 3 tables

arXiv:2401.13386 [pdf, other]

Privacy-Preserving Face Recognition in Hybrid Frequency-Color Domain

Authors: Dong Han, Yong Li, Joachim Denzler

Abstract: Face recognition technology has been deployed in various real-life applications. The most sophisticated deep learning-based face recognition systems rely on training millions of face images through complex deep neural networks to achieve high accuracy. It is quite common for clients to upload face images to the service provider in order to access the model inference. However, the face image is a t… ▽ More Face recognition technology has been deployed in various real-life applications. The most sophisticated deep learning-based face recognition systems rely on training millions of face images through complex deep neural networks to achieve high accuracy. It is quite common for clients to upload face images to the service provider in order to access the model inference. However, the face image is a type of sensitive biometric attribute tied to the identity information of each user. Directly exposing the raw face image to the service provider poses a threat to the user's privacy. Current privacy-preserving approaches to face recognition focus on either concealing visual information on model input or protecting model output face embedding. The noticeable drop in recognition accuracy is a pitfall for most methods. This paper proposes a hybrid frequency-color fusion approach to reduce the input dimensionality of face recognition in the frequency domain. Moreover, sparse color information is also introduced to alleviate significant accuracy degradation after adding differential privacy noise. Besides, an identity-specific embedding map** scheme is applied to protect original face embedding by enlarging the distance among identities. Lastly, secure multiparty computation is implemented for safely computing the embedding distance during model inference. The proposed method performs well on multiple widely used verification datasets. Moreover, it has around 2.6% to 4.2% higher accuracy than the state-of-the-art in the 1:N verification scenario. △ Less

Submitted 24 January, 2024; originally announced January 2024.

Comments: This work is already accepted at the conference International Conference on Computer Vision Theory and Applications (VISAPP) 2024 as a regular paper

arXiv:2401.08386 [pdf, other]

Deep Learning-based Group Causal Inference in Multivariate Time-series

Authors: Wasim Ahmad, Maha Shadaydeh, Joachim Denzler

Abstract: Causal inference in a nonlinear system of multivariate timeseries is instrumental in disentangling the intricate web of relationships among variables, enabling us to make more accurate predictions and gain deeper insights into real-world complex systems. Causality methods typically identify the causal structure of a multivariate system by considering the cause-effect relationship of each pair of v… ▽ More Causal inference in a nonlinear system of multivariate timeseries is instrumental in disentangling the intricate web of relationships among variables, enabling us to make more accurate predictions and gain deeper insights into real-world complex systems. Causality methods typically identify the causal structure of a multivariate system by considering the cause-effect relationship of each pair of variables while ignoring the collective effect of a group of variables or interactions involving more than two-time series variables. In this work, we test model invariance by group-level interventions on the trained deep networks to infer causal direction in groups of variables, such as climate and ecosystem, brain networks, etc. Extensive testing with synthetic and real-world time series data shows a significant improvement of our method over other applied group causality methods and provides us insights into real-world time series. The code for our method can be found at:https://github.com/wasimahmadpk/gCause. △ Less

Submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted in AAAI24 (AI4TS)

arXiv:2311.05221 [pdf, other]

doi 10.5220/0011619900003417

Let's Get the FACS Straight -- Reconstructing Obstructed Facial Features

Authors: Tim Büchner, Sven Sickert, Gerd Fabian Volk, Christoph Anders, Orlando Guntinas-Lichius, Joachim Denzler

Abstract: The human face is one of the most crucial parts in interhuman communication. Even when parts of the face are hidden or obstructed the underlying facial movements can be understood. Machine learning approaches often fail in that regard due to the complexity of the facial structures. To alleviate this problem a common approach is to fine-tune a model for such a specific application. However, this is… ▽ More The human face is one of the most crucial parts in interhuman communication. Even when parts of the face are hidden or obstructed the underlying facial movements can be understood. Machine learning approaches often fail in that regard due to the complexity of the facial structures. To alleviate this problem a common approach is to fine-tune a model for such a specific application. However, this is computational intensive and might have to be repeated for each desired analysis task. In this paper, we propose to reconstruct obstructed facial parts to avoid the task of repeated fine-tuning. As a result, existing facial analysis methods can be used without further changes with respect to the data. In our approach, the restoration of facial features is interpreted as a style transfer task between different recording setups. By using the CycleGAN architecture the requirement of matched pairs, which is often hard to fullfill, can be eliminated. To proof the viability of our approach, we compare our reconstructions with real unobstructed recordings. We created a novel data set in which 36 test subjects were recorded both with and without 62 surface electromyography sensors attached to their faces. In our evaluation, we feature typical facial analysis tasks, like the computation of Facial Action Units and the detection of emotions. To further assess the quality of the restoration, we also compare perceptional distances. We can show, that scores similar to the videos without obstructing sensors can be achieved. △ Less

Submitted 10 November, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: VISAPP 2023 paper

arXiv:2307.15433 [pdf, other]

Automated Visual Monitoring of Nocturnal Insects with Light-based Camera Traps

Authors: Dimitri Korsch, Paul Bodesheim, Gunnar Brehm, Joachim Denzler

Abstract: Automatic camera-assisted monitoring of insects for abundance estimations is crucial to understand and counteract ongoing insect decline. In this paper, we present two datasets of nocturnal insects, especially moths as a subset of Lepidoptera, photographed in Central Europe. One of the datasets, the EU-Moths dataset, was captured manually by citizen scientists and contains species annotations for… ▽ More Automatic camera-assisted monitoring of insects for abundance estimations is crucial to understand and counteract ongoing insect decline. In this paper, we present two datasets of nocturnal insects, especially moths as a subset of Lepidoptera, photographed in Central Europe. One of the datasets, the EU-Moths dataset, was captured manually by citizen scientists and contains species annotations for 200 different species and bounding box annotations for those. We used this dataset to develop and evaluate a two-stage pipeline for insect detection and moth species classification in previous work. We further introduce a prototype for an automated visual monitoring system. This prototype produced the second dataset consisting of more than 27,000 images captured on 95 nights. For evaluation and bootstrap** purposes, we annotated a subset of the images with bounding boxes enframing nocturnal insects. Finally, we present first detection and classification baselines for these datasets and encourage other scientists to use this publicly available data. △ Less

Submitted 28 July, 2023; originally announced July 2023.

Comments: Presented at the FGVC workshop at the CVPR2022

arXiv:2307.15427 [pdf, other]

doi 10.18420/informatik2021-036

Deep Learning Pipeline for Automated Visual Moth Monitoring: Insect Localization and Species Classification

Authors: Dimitri Korsch, Paul Bodesheim, Joachim Denzler

Abstract: Biodiversity monitoring is crucial for tracking and counteracting adverse trends in population fluctuations. However, automatic recognition systems are rarely applied so far, and experts evaluate the generated data masses manually. Especially the support of deep learning methods for visual monitoring is not yet established in biodiversity research, compared to other areas like advertising or enter… ▽ More Biodiversity monitoring is crucial for tracking and counteracting adverse trends in population fluctuations. However, automatic recognition systems are rarely applied so far, and experts evaluate the generated data masses manually. Especially the support of deep learning methods for visual monitoring is not yet established in biodiversity research, compared to other areas like advertising or entertainment. In this paper, we present a deep learning pipeline for analyzing images captured by a moth scanner, an automated visual monitoring system of moth species developed within the AMMOD project. We first localize individuals with a moth detector and afterward determine the species of detected insects with a classifier. Our detector achieves up to 99.01% mean average precision and our classifier distinguishes 200 moth species with an accuracy of 93.13% on image cutouts depicting single insects. Combining both in our pipeline improves the accuracy for species identification in images of the moth scanner from 79.62% to 88.05%. △ Less

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.14825 [pdf, other]

Simplified Concrete Dropout -- Improving the Generation of Attribution Masks for Fine-grained Classification

Authors: Dimitri Korsch, Maha Shadaydeh, Joachim Denzler

Abstract: Fine-grained classification is a particular case of a classification problem, aiming to classify objects that share the visual appearance and can only be distinguished by subtle differences. Fine-grained classification models are often deployed to determine animal species or individuals in automated animal monitoring systems. Precise visual explanations of the model's decision are crucial to analy… ▽ More Fine-grained classification is a particular case of a classification problem, aiming to classify objects that share the visual appearance and can only be distinguished by subtle differences. Fine-grained classification models are often deployed to determine animal species or individuals in automated animal monitoring systems. Precise visual explanations of the model's decision are crucial to analyze systematic errors. Attention- or gradient-based methods are commonly used to identify regions in the image that contribute the most to the classification decision. These methods deliver either too coarse or too noisy explanations, unsuitable for identifying subtle visual differences reliably. However, perturbation-based methods can precisely identify pixels causally responsible for the classification result. Fill-in of the dropout (FIDO) algorithm is one of those methods. It utilizes the concrete dropout (CD) to sample a set of attribution masks and updates the sampling parameters based on the output of the classification model. A known problem of the algorithm is a high variance in the gradient estimates, which the authors have mitigated until now by mini-batch updates of the sampling parameters. This paper presents a solution to circumvent these computational instabilities by simplifying the CD sampling and reducing reliance on large mini-batch sizes. First, it allows estimating the parameters with smaller mini-batch sizes without losing the quality of the estimates but with a reduced computational effort. Furthermore, our solution produces finer and more coherent attribution masks. Finally, we use the resulting attribution masks to improve the classification performance of a trained model without additional fine-tuning of the model. △ Less

Submitted 27 July, 2023; originally announced July 2023.

Comments: Accepted at the German Conference on Pattern Recognition 2023 (GCPR 2023)

arXiv:2307.08559 [pdf, other]

Improving Data Efficiency for Plant Cover Prediction with Label Interpolation and Monte-Carlo Crop**

Authors: Matthias Körschens, Solveig Franziska Bucher, Christine Römermann, Joachim Denzler

Abstract: The plant community composition is an essential indicator of environmental changes and is, for this reason, usually analyzed in ecological field studies in terms of the so-called plant cover. The manual acquisition of this kind of data is time-consuming, laborious, and prone to human error. Automated camera systems can collect high-resolution images of the surveyed vegetation plots at a high frequ… ▽ More The plant community composition is an essential indicator of environmental changes and is, for this reason, usually analyzed in ecological field studies in terms of the so-called plant cover. The manual acquisition of this kind of data is time-consuming, laborious, and prone to human error. Automated camera systems can collect high-resolution images of the surveyed vegetation plots at a high frequency. In combination with subsequent algorithmic analysis, it is possible to objectively extract information on plant community composition quickly and with little human effort. An automated camera system can easily collect the large amounts of image data necessary to train a Deep Learning system for automatic analysis. However, due to the amount of work required to annotate vegetation images with plant cover data, only few labeled samples are available. As automated camera systems can collect many pictures without labels, we introduce an approach to interpolate the sparse labels in the collected vegetation plot time series down to the intermediate dense and unlabeled images to artificially increase our training dataset to seven times its original size. Moreover, we introduce a new method we call Monte-Carlo Crop**. This approach trains on a collection of cropped parts of the training images to deal with high-resolution images efficiently, implicitly augment the training images, and speed up training. We evaluate both approaches on a plant cover dataset containing images of herbaceous plant communities and find that our methods lead to improvements in the species, community, and segmentation metrics investigated. △ Less

Submitted 17 July, 2023; originally announced July 2023.

Comments: Accepted for publication at DAGM-GCPR 2023

arXiv:2212.12478 [pdf, other]

doi 10.1109/ACCESS.2022.3172939

Image Classification with Small Datasets: Overview and Benchmark

Authors: L. Brigato, B. Barz, L. Iocchi, J. Denzler

Abstract: Image classification with small datasets has been an active research area in the recent past. However, as research in this scope is still in its infancy, two key ingredients are missing for ensuring reliable and truthful progress: a systematic and extensive overview of the state of the art, and a common benchmark to allow for objective comparisons between published methods. This article addresses… ▽ More Image classification with small datasets has been an active research area in the recent past. However, as research in this scope is still in its infancy, two key ingredients are missing for ensuring reliable and truthful progress: a systematic and extensive overview of the state of the art, and a common benchmark to allow for objective comparisons between published methods. This article addresses both issues. First, we systematically organize and connect past studies to consolidate a community that is currently fragmented and scattered. Second, we propose a common benchmark that allows for an objective comparison of approaches. It consists of five datasets spanning various domains (e.g., natural images, medical imagery, satellite data) and data types (RGB, grayscale, multispectral). We use this benchmark to re-evaluate the standard cross-entropy baseline and ten existing methods published between 2017 and 2021 at renowned venues. Surprisingly, we find that thorough hyper-parameter tuning on held-out validation data results in a highly competitive baseline and highlights a stunted growth of performance over the years. Indeed, only a single specialized method dating back to 2019 clearly wins our benchmark and outperforms the baseline classifier. △ Less

Submitted 23 December, 2022; originally announced December 2022.

Comments: arXiv admin note: text overlap with arXiv:2108.13122

arXiv:2212.11080 [pdf, other]

doi 10.3390/app13031778

Is it worth it? Comparing six deep and classical methods for unsupervised anomaly detection in time series

Authors: Ferdinand Rewicki, Joachim Denzler, Julia Niebling

Abstract: Detecting anomalies in time series data is important in a variety of fields, including system monitoring, healthcare, and cybersecurity. While the abundance of available methods makes it difficult to choose the most appropriate method for a given application, each method has its strengths in detecting certain types of anomalies. In this study, we compare six unsupervised anomaly detection methods… ▽ More Detecting anomalies in time series data is important in a variety of fields, including system monitoring, healthcare, and cybersecurity. While the abundance of available methods makes it difficult to choose the most appropriate method for a given application, each method has its strengths in detecting certain types of anomalies. In this study, we compare six unsupervised anomaly detection methods of varying complexity to determine whether more complex methods generally perform better and if certain methods are better suited to certain types of anomalies. We evaluated the methods using the UCR anomaly archive, a recent benchmark dataset for anomaly detection. We analyzed the results on a dataset and anomaly type level after adjusting the necessary hyperparameters for each method. Additionally, we assessed the ability of each method to incorporate prior knowledge about anomalies and examined the differences between point-wise and sequence-wise features. Our experiments show that classical machine learning methods generally outperform deep learning methods across a range of anomaly types. △ Less

Submitted 2 February, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

Comments: 17 Pages, The repository to reproduce the results is available at https://gitlab.com/dlr-dw/is-it-worth-it-benchmark

arXiv:2209.11497 [pdf, other]

Time Series Causal Link Estimation under Hidden Confounding using Knockoff Interventions

Authors: Violeta Teodora Trifunov, Maha Shadaydeh, Joachim Denzler

Abstract: Latent variables often mask cause-effect relationships in observational data which provokes spurious links that may be misinterpreted as causal. This problem sparks great interest in the fields such as climate science and economics. We propose to estimate confounded causal links of time series using Sequential Causal Effect Variational Autoencoder (SCEVAE) while applying Knockoff interventions. Kn… ▽ More Latent variables often mask cause-effect relationships in observational data which provokes spurious links that may be misinterpreted as causal. This problem sparks great interest in the fields such as climate science and economics. We propose to estimate confounded causal links of time series using Sequential Causal Effect Variational Autoencoder (SCEVAE) while applying Knockoff interventions. Knockoff variables have the same distribution as the originals and preserve the correlation to other variables. This allows for counterfactuals that are more faithful to the observational distribution. We show the advantage of Knockoff interventions by applying SCEVAE to synthetic datasets with both linear and nonlinear causal links. Moreover, we apply SCEVAE with Knockoffs to real aerosol-cloud-climate observational time series data. We compare our results on synthetic data to those of a time series deconfounding method both with and without estimated confounders. We show that our method outperforms this benchmark by comparing both methods to the ground truth. For the real data analysis, we rely on expert knowledge of causal links and demonstrate how using suitable proxy variables improves the causal link estimation in the presence of hidden confounders. △ Less

Submitted 18 November, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: "A causal view on dynamical systems" Workshop at NeurIPS 2022

arXiv:2207.04055 [pdf, other]

Causal Discovery using Model Invariance through Knockoff Interventions

Authors: Wasim Ahmad, Maha Shadaydeh, Joachim Denzler

Abstract: Cause-effect analysis is crucial to understand the underlying mechanism of a system. We propose to exploit model invariance through interventions on the predictors to infer causality in nonlinear multivariate systems of time series. We model nonlinear interactions in time series using DeepAR and then expose the model to different environments using Knockoffs-based interventions to test model invar… ▽ More Cause-effect analysis is crucial to understand the underlying mechanism of a system. We propose to exploit model invariance through interventions on the predictors to infer causality in nonlinear multivariate systems of time series. We model nonlinear interactions in time series using DeepAR and then expose the model to different environments using Knockoffs-based interventions to test model invariance. Knockoff samples are pairwise exchangeable, in-distribution and statistically null variables generated without knowing the response. We test model invariance where we show that the distribution of the response residual does not change significantly upon interventions on non-causal predictors. We evaluate our method on real and synthetically generated time series. Overall our method outperforms other widely used causality methods, i.e, VAR Granger causality, VARLiNGAM and PCMCI+. △ Less

Submitted 8 July, 2022; originally announced July 2022.

Journal ref: ICML 2022: Workshop on Spurious Correlations, Invariance and Stability

arXiv:2206.10587 [pdf]

doi 10.3389/fnins.2022.975639

Guiding Visual Attention in Deep Convolutional Neural Networks Based on Human Eye Movements

Authors: Leonard E. van Dyck, Sebastian J. Denzler, Walter R. Gruber

Abstract: Deep Convolutional Neural Networks (DCNNs) were originally inspired by principles of biological vision, have evolved into best current computational models of object recognition, and consequently indicate strong architectural and functional parallelism with the ventral visual pathway throughout comparisons with neuroimaging and neural time series data. As recent advances in deep learning seem to d… ▽ More Deep Convolutional Neural Networks (DCNNs) were originally inspired by principles of biological vision, have evolved into best current computational models of object recognition, and consequently indicate strong architectural and functional parallelism with the ventral visual pathway throughout comparisons with neuroimaging and neural time series data. As recent advances in deep learning seem to decrease this similarity, computational neuroscience is challenged to reverse-engineer the biological plausibility to obtain useful models. While previous studies have shown that biologically inspired architectures are able to amplify the human-likeness of the models, in this study, we investigate a purely data-driven approach. We use human eye tracking data to directly modify training examples and thereby guide the models' visual attention during object recognition in natural images either towards or away from the focus of human fixations. We compare and validate different manipulation types (i.e., standard, human-like, and non-human-like attention) through GradCAM saliency maps against human participant eye tracking data. Our results demonstrate that the proposed guided focus manipulation works as intended in the negative direction and non-human-like models focus on significantly dissimilar image parts compared to humans. The observed effects were highly category-specific, enhanced by animacy and face presence, developed only after feedforward processing was completed, and indicated a strong influence on face detection. With this approach, however, no significantly increased human-likeness was found. Possible applications of overt visual attention in DCNNs and further implications for theories of face detection are discussed. △ Less

Submitted 6 September, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

Comments: 28 pages, 6 figures, 3 supplementary figures

arXiv:2110.11778 [pdf, other]

Domain Adaptation and Active Learning for Fine-Grained Recognition in the Field of Biodiversity

Authors: Bernd Gruner, Matthias Körschens, Björn Barz, Joachim Denzler

Abstract: Deep-learning methods offer unsurpassed recognition performance in a wide range of domains, including fine-grained recognition tasks. However, in most problem areas there are insufficient annotated training samples. Therefore, the topic of transfer learning respectively domain adaptation is particularly important. In this work, we investigate to what extent unsupervised domain adaptation can be us… ▽ More Deep-learning methods offer unsurpassed recognition performance in a wide range of domains, including fine-grained recognition tasks. However, in most problem areas there are insufficient annotated training samples. Therefore, the topic of transfer learning respectively domain adaptation is particularly important. In this work, we investigate to what extent unsupervised domain adaptation can be used for fine-grained recognition in a biodiversity context to learn a real-world classifier based on idealized training data, e.g. preserved butterflies and plants. Moreover, we investigate the influence of different normalization layers, such as Group Normalization in combination with Weight Standardization, on the classifier. We discovered that domain adaptation works very well for fine-grained recognition and that the normalization methods have a great influence on the results. Using domain adaptation and Transferable Normalization, the accuracy of the classifier could be increased by up to 12.35 % compared to the baseline. Furthermore, the domain adaptation system is combined with an active learning component to improve the results. We compare different active learning strategies with each other. Surprisingly, we found that more sophisticated strategies provide better results than the random selection baseline for only one of the two datasets. In this case, the distance and diversity strategy performed best. Finally, we present a problem analysis of the datasets. △ Less

Submitted 22 October, 2021; originally announced October 2021.

Comments: https://sites.google.com/view/clvision2021/call-for-papers/accepted-papers

arXiv:2109.13561 [pdf, other]

A Strong Baseline for the VIPriors Data-Efficient Image Classification Challenge

Authors: Björn Barz, Lorenzo Brigato, Luca Iocchi, Joachim Denzler

Abstract: Learning from limited amounts of data is the hallmark of intelligence, requiring strong generalization and abstraction skills. In a machine learning context, data-efficient methods are of high practical importance since data collection and annotation are prohibitively expensive in many domains. Thus, coordinated efforts to foster progress in this area emerged recently, e.g., in the form of dedicat… ▽ More Learning from limited amounts of data is the hallmark of intelligence, requiring strong generalization and abstraction skills. In a machine learning context, data-efficient methods are of high practical importance since data collection and annotation are prohibitively expensive in many domains. Thus, coordinated efforts to foster progress in this area emerged recently, e.g., in the form of dedicated workshops and competitions. Besides a common benchmark, measuring progress requires strong baselines. We present such a strong baseline for data-efficient image classification on the VIPriors challenge dataset, which is a sub-sampled version of ImageNet-1k with 100 images per class. We do not use any methods tailored to data-efficient classification but only standard models and techniques as well as common competition tricks and thorough hyper-parameter tuning. Our baseline achieves 69.7% accuracy on the VIPriors image classification dataset and outperforms 50% of submissions to the VIPriors 2021 challenge. △ Less

Submitted 28 September, 2021; originally announced September 2021.

arXiv:2109.10817 [pdf, other]

Causal Inference in Non-linear Time-series using Deep Networks and Knockoff Counterfactuals

Authors: Wasim Ahmad, Maha Shadaydeh, Joachim Denzler

Abstract: Estimating causal relations is vital in understanding the complex interactions in multivariate time series. Non-linear coupling of variables is one of the major challenges inaccurate estimation of cause-effect relations. In this paper, we propose to use deep autoregressive networks (DeepAR) in tandem with counterfactual analysis to infer nonlinear causal relations in multivariate time series. We e… ▽ More Estimating causal relations is vital in understanding the complex interactions in multivariate time series. Non-linear coupling of variables is one of the major challenges inaccurate estimation of cause-effect relations. In this paper, we propose to use deep autoregressive networks (DeepAR) in tandem with counterfactual analysis to infer nonlinear causal relations in multivariate time series. We extend the concept of Granger causality using probabilistic forecasting with DeepAR. Since deep networks can neither handle missing input nor out-of-distribution intervention, we propose to use the Knockoffs framework (Barberand Cand`es, 2015) for generating intervention variables and consequently counterfactual probabilistic forecasting. Knockoff samples are independent of their output given the observed variables and exchangeable with their counterpart variables without changing the underlying distribution of the data. We test our method on synthetic as well as real-world time series datasets. Overall our method outperforms the widely used vector autoregressive Granger causality and PCMCI in detecting nonlinear causal dependency in multivariate time series. △ Less

Submitted 18 October, 2021; v1 submitted 22 September, 2021; originally announced September 2021.

Journal ref: IEEE International Conference on Machine Learning and Applications (ICMLA) 2021

arXiv:2109.06562 [pdf, other]

Anomaly Attribution of Multivariate Time Series using Counterfactual Reasoning

Authors: Violeta Teodora Trifunov, Maha Shadaydeh, Björn Barz, Joachim Denzler

Abstract: There are numerous methods for detecting anomalies in time series, but that is only the first step to understanding them. We strive to exceed this by explaining those anomalies. Thus we develop a novel attribution scheme for multivariate time series relying on counterfactual reasoning. We aim to answer the counterfactual question of would the anomalous event have occurred if the subset of the invo… ▽ More There are numerous methods for detecting anomalies in time series, but that is only the first step to understanding them. We strive to exceed this by explaining those anomalies. Thus we develop a novel attribution scheme for multivariate time series relying on counterfactual reasoning. We aim to answer the counterfactual question of would the anomalous event have occurred if the subset of the involved variables had been more similarly distributed to the data outside of the anomalous interval. Specifically, we detect anomalous intervals using the Maximally Divergent Interval (MDI) algorithm, replace a subset of variables with their in-distribution values within the detected interval and observe if the interval has become less anomalous, by re-scoring it with MDI. We evaluate our method on multivariate temporal and spatio-temporal data and confirm the accuracy of our anomaly attribution of multiple well-understood extreme climate events such as heatwaves and hurricanes. △ Less

Submitted 14 September, 2021; originally announced September 2021.

Comments: ICMLA 2021

arXiv:2108.13122 [pdf, other]

Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification

Authors: Lorenzo Brigato, Björn Barz, Luca Iocchi, Joachim Denzler

Abstract: Data-efficient image classification using deep neural networks in settings, where only small amounts of labeled data are available, has been an active research area in the recent past. However, an objective comparison between published methods is difficult, since existing works use different datasets for evaluation and often compare against untuned baselines with default hyper-parameters. We desig… ▽ More Data-efficient image classification using deep neural networks in settings, where only small amounts of labeled data are available, has been an active research area in the recent past. However, an objective comparison between published methods is difficult, since existing works use different datasets for evaluation and often compare against untuned baselines with default hyper-parameters. We design a benchmark for data-efficient image classification consisting of six diverse datasets spanning various domains (e.g., natural images, medical imagery, satellite data) and data types (RGB, grayscale, multispectral). Using this benchmark, we re-evaluate the standard cross-entropy baseline and eight methods for data-efficient deep learning published between 2017 and 2021 at renowned venues. For a fair and realistic comparison, we carefully tune the hyper-parameters of all methods on each dataset. Surprisingly, we find that tuning learning rate, weight decay, and batch size on a separate validation split results in a highly competitive baseline, which outperforms all but one specialized method and performs competitively to the remaining one. △ Less

Submitted 30 August, 2021; originally announced August 2021.

Comments: Accepted at the 2nd Visual Inductive Priors for Data-Efficient Deep Learning Workshop (ICCV 2021)

arXiv:2108.06959 [pdf, other]

WikiChurches: A Fine-Grained Dataset of Architectural Styles with Real-World Challenges

Authors: Björn Barz, Joachim Denzler

Abstract: We introduce a novel dataset for architectural style classification, consisting of 9,485 images of church buildings. Both images and style labels were sourced from Wikipedia. The dataset can serve as a benchmark for various research fields, as it combines numerous real-world challenges: fine-grained distinctions between classes based on subtle visual features, a comparatively small sample size, a… ▽ More We introduce a novel dataset for architectural style classification, consisting of 9,485 images of church buildings. Both images and style labels were sourced from Wikipedia. The dataset can serve as a benchmark for various research fields, as it combines numerous real-world challenges: fine-grained distinctions between classes based on subtle visual features, a comparatively small sample size, a highly imbalanced class distribution, a high variance of viewpoints, and a hierarchical organization of labels, where only some images are labeled at the most precise level. In addition, we provide 631 bounding box annotations of characteristic visual features for 139 churches from four major categories. These annotations can, for example, be useful for research on fine-grained classification, where additional expert knowledge about distinctive object parts is often available. Images and annotations are available at: https://doi.org/10.5281/zenodo.5166987 △ Less

Submitted 18 October, 2021; v1 submitted 16 August, 2021; originally announced August 2021.

Comments: NeurIPS 2021 Track on Datasets and Benchmarks

Journal ref: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021

arXiv:2108.00107 [pdf]

Comparing object recognition in humans and deep convolutional neural networks -- An eye tracking study

Authors: Leonard E. van Dyck, Roland Kwitt, Sebastian J. Denzler, Walter R. Gruber

Abstract: Deep convolutional neural networks (DCNNs) and the ventral visual pathway share vast architectural and functional similarities in visual challenges such as object recognition. Recent insights have demonstrated that both hierarchical cascades can be compared in terms of both exerted behavior and underlying activation. However, these approaches ignore key differences in spatial priorities of informa… ▽ More Deep convolutional neural networks (DCNNs) and the ventral visual pathway share vast architectural and functional similarities in visual challenges such as object recognition. Recent insights have demonstrated that both hierarchical cascades can be compared in terms of both exerted behavior and underlying activation. However, these approaches ignore key differences in spatial priorities of information processing. In this proof-of-concept study, we demonstrate a comparison of human observers (N = 45) and three feedforward DCNNs through eye tracking and saliency maps. The results reveal fundamentally different resolutions in both visualization methods that need to be considered for an insightful comparison. Moreover, we provide evidence that a DCNN with biologically plausible receptive field sizes called vNet reveals higher agreement with human viewing behavior as contrasted with a standard ResNet architecture. We find that image-specific factors such as category, animacy, arousal, and valence have a direct link to the agreement of spatial object recognition priorities in humans and DCNNs, while other measures such as difficulty and general image properties do not. With this approach, we try to open up new perspectives at the intersection of biological and computer vision research. △ Less

Submitted 21 September, 2021; v1 submitted 30 July, 2021; originally announced August 2021.

Comments: 25 pages, 10 figures, 1 table

arXiv:2106.11154 [pdf, other]

Automatic Plant Cover Estimation with Convolutional Neural Networks

Authors: Matthias Körschens, Paul Bodesheim, Christine Römermann, Solveig Franziska Bucher, Mirco Migliavacca, Josephine Ulrich, Joachim Denzler

Abstract: Monitoring the responses of plants to environmental changes is essential for plant biodiversity research. This, however, is currently still being done manually by botanists in the field. This work is very laborious, and the data obtained is, though following a standardized method to estimate plant coverage, usually subjective and has a coarse temporal resolution. To remedy these caveats, we invest… ▽ More Monitoring the responses of plants to environmental changes is essential for plant biodiversity research. This, however, is currently still being done manually by botanists in the field. This work is very laborious, and the data obtained is, though following a standardized method to estimate plant coverage, usually subjective and has a coarse temporal resolution. To remedy these caveats, we investigate approaches using convolutional neural networks (CNNs) to automatically extract the relevant data from images, focusing on plant community composition and species coverages of 9 herbaceous plant species. To this end, we investigate several standard CNN architectures and different pretraining methods. We find that we outperform our previous approach at higher image resolutions using a custom CNN with a mean absolute error of 5.16%. In addition to these investigations, we also conduct an error analysis based on the temporal aspect of the plant cover images. This analysis gives insight into where problems for automatic approaches lie, like occlusion and likely misclassifications caused by temporal changes. △ Less

Submitted 28 July, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

arXiv:2104.10901 [pdf, other]

Self-Supervised Learning from Semantically Imprecise Data

Authors: Clemens-Alexander Brust, Björn Barz, Joachim Denzler

Abstract: Learning from imprecise labels such as "animal" or "bird", but making precise predictions like "snow bunting" at inference time is an important capability for any classifier when expertly labeled training data is scarce. Contributions by volunteers or results of web crawling lack precision in this manner, but are still valuable. And crucially, these weakly labeled examples are available in larger… ▽ More Learning from imprecise labels such as "animal" or "bird", but making precise predictions like "snow bunting" at inference time is an important capability for any classifier when expertly labeled training data is scarce. Contributions by volunteers or results of web crawling lack precision in this manner, but are still valuable. And crucially, these weakly labeled examples are available in larger quantities for lower cost than high-quality bespoke training data. CHILLAX, a recently proposed method to tackle this task, leverages a hierarchical classifier to learn from imprecise labels. However, it has two major limitations. First, it does not learn from examples labeled as the root of the hierarchy, e.g., "object". Second, an extrapolation of annotations to precise labels is only performed at test time, where confident extrapolations could be already used as training data. In this work, we extend CHILLAX with a self-supervised scheme using constrained semantic extrapolation to generate pseudo-labels. This addresses the second concern, which in turn solves the first problem, enabling an even weaker supervision requirement than CHILLAX. We evaluate our approach empirically, showing that our method allows for a consistent accuracy improvement of 0.84 to 1.19 percent points over CHILLAX and is suitable as a drop-in replacement without any negative consequences such as longer training times. △ Less

Submitted 27 January, 2022; v1 submitted 22 April, 2021; originally announced April 2021.

Comments: 9 pages. Accepted for publication at VISAPP 2022

arXiv:2104.10066 [pdf, other]

EarthNet2021: A large-scale dataset and challenge for Earth surface forecasting as a guided video prediction task

Authors: Christian Requena-Mesa, Vitus Benson, Markus Reichstein, Jakob Runge, Joachim Denzler

Abstract: Satellite images are snapshots of the Earth surface. We propose to forecast them. We frame Earth surface forecasting as the task of predicting satellite imagery conditioned on future weather. EarthNet2021 is a large dataset suitable for training deep neural networks on the task. It contains Sentinel 2 satellite imagery at 20m resolution, matching topography and mesoscale (1.28km) meteorological va… ▽ More Satellite images are snapshots of the Earth surface. We propose to forecast them. We frame Earth surface forecasting as the task of predicting satellite imagery conditioned on future weather. EarthNet2021 is a large dataset suitable for training deep neural networks on the task. It contains Sentinel 2 satellite imagery at 20m resolution, matching topography and mesoscale (1.28km) meteorological variables packaged into 32000 samples. Additionally we frame EarthNet2021 as a challenge allowing for model intercomparison. Resulting forecasts will greatly improve (>x50) over the spatial resolution found in numerical models. This allows localized impacts from extreme weather to be predicted, thus supporting downstream applications such as crop yield prediction, forest health assessments or biodiversity monitoring. Find data, code, and how to participate at www.earthnet.tech △ Less

Submitted 16 April, 2021; originally announced April 2021.

Comments: 8 pages, 8 figures, accepted at CVPR2021 workshop EarthVision

arXiv:2103.06179 [pdf, other]

Towards Learning an Unbiased Classifier from Biased Data via Conditional Adversarial Debiasing

Authors: Christian Reimers, Paul Bodesheim, Jakob Runge, Joachim Denzler

Abstract: Bias in classifiers is a severe issue of modern deep learning methods, especially for their application in safety- and security-critical areas. Often, the bias of a classifier is a direct consequence of a bias in the training dataset, frequently caused by the co-occurrence of relevant features and irrelevant ones. To mitigate this issue, we require learning algorithms that prevent the propagation… ▽ More Bias in classifiers is a severe issue of modern deep learning methods, especially for their application in safety- and security-critical areas. Often, the bias of a classifier is a direct consequence of a bias in the training dataset, frequently caused by the co-occurrence of relevant features and irrelevant ones. To mitigate this issue, we require learning algorithms that prevent the propagation of bias from the dataset into the classifier. We present a novel adversarial debiasing method, which addresses a feature that is spuriously connected to the labels of training images but statistically independent of the labels for test images. Thus, the automatic identification of relevant features during training is perturbed by irrelevant features. This is the case in a wide range of bias-related problems for many computer vision tasks, such as automatic skin cancer detection or driver assistance. We argue by a mathematical proof that our approach is superior to existing techniques for the abovementioned bias. Our experiments show that our approach performs better than state-of-the-art techniques on a well-known benchmark dataset with real-world images of cats and dogs. △ Less

Submitted 10 March, 2021; originally announced March 2021.

arXiv:2102.00951 [pdf, other]

Counterfactual Generation with Knockoffs

Authors: Oana-Iuliana Popescu, Maha Shadaydeh, Joachim Denzler

Abstract: Human interpretability of deep neural networks' decisions is crucial, especially in domains where these directly affect human lives. Counterfactual explanations of already trained neural networks can be generated by perturbing input features and attributing importance according to the change in the classifier's outcome after perturbation. Perturbation can be done by replacing features using heuris… ▽ More Human interpretability of deep neural networks' decisions is crucial, especially in domains where these directly affect human lives. Counterfactual explanations of already trained neural networks can be generated by perturbing input features and attributing importance according to the change in the classifier's outcome after perturbation. Perturbation can be done by replacing features using heuristic or generative in-filling methods. The choice of in-filling function significantly impacts the number of artifacts, i.e., false-positive attributions. Heuristic methods result in false-positive artifacts because the image after the perturbation is far from the original data distribution. Generative in-filling methods reduce artifacts by producing in-filling values that respect the original data distribution. However, current generative in-filling methods may also increase false-negatives due to the high correlation of in-filling values with the original data. In this paper, we propose to alleviate this by generating in-fillings with the statistically-grounded Knockoffs framework, which was developed by Barber and Candès in 2015 as a tool for variable selection with controllable false discovery rate. Knockoffs are statistically null-variables as decorrelated as possible from the original data, which can be swapped with the originals without changing the underlying data distribution. A comparison of different in-filling methods indicates that in-filling with knockoffs can reveal explanations in a more causal sense while still maintaining the compactness of the explanations. △ Less

Submitted 1 February, 2021; originally announced February 2021.

Comments: 12 pages, 10 figures

arXiv:2012.08780 [pdf, other]

Analysing the Direction of Emotional Influence in Nonverbal Dyadic Communication: A Facial-Expression Study

Authors: Maha Shadaydeh, Lea Mueller, Dana Schneider, Martin Thuemmel, Thomas Kessler, Joachim Denzler

Abstract: Identifying the direction of emotional influence in a dyadic dialogue is of increasing interest in the psychological sciences with applications in psychotherapy, analysis of political interactions, or interpersonal conflict behavior. Facial expressions are widely described as being automatic and thus hard to overtly influence. As such, they are a perfect measure for a better understanding of unint… ▽ More Identifying the direction of emotional influence in a dyadic dialogue is of increasing interest in the psychological sciences with applications in psychotherapy, analysis of political interactions, or interpersonal conflict behavior. Facial expressions are widely described as being automatic and thus hard to overtly influence. As such, they are a perfect measure for a better understanding of unintentional behavior cues about social-emotional cognitive processes. With this view, this study is concerned with the analysis of the direction of emotional influence in dyadic dialogue based on facial expressions only. We exploit computer vision capabilities along with causal inference theory for quantitative verification of hypotheses on the direction of emotional influence, i.e., causal effect relationships, in dyadic dialogues. We address two main issues. First, in a dyadic dialogue, emotional influence occurs over transient time intervals and with intensity and direction that are variant over time. To this end, we propose a relevant interval selection approach that we use prior to causal inference to identify those transient intervals where causal inference should be applied. Second, we propose to use fine-grained facial expressions that are present when strong distinct facial emotions are not visible. To specify the direction of influence, we apply the concept of Granger causality to the time series of facial expressions over selected relevant intervals. We tested our approach on newly, experimentally obtained data. Based on the quantitative verification of hypotheses on the direction of emotional influence, we were able to show that the proposed approach is most promising to reveal the causal effect pattern in various instructed interaction conditions. △ Less

Submitted 16 December, 2020; originally announced December 2020.

Comments: arXiv admin note: text overlap with arXiv:1810.12171

arXiv:2012.06246 [pdf, other]

EarthNet2021: A novel large-scale dataset and challenge for forecasting localized climate impacts

Authors: Christian Requena-Mesa, Vitus Benson, Joachim Denzler, Jakob Runge, Markus Reichstein

Abstract: Climate change is global, yet its concrete impacts can strongly vary between different locations in the same region. Seasonal weather forecasts currently operate at the mesoscale (> 1 km). For more targeted mitigation and adaptation, modelling impacts to < 100 m is needed. Yet, the relationship between driving variables and Earth's surface at such local scales remains unresolved by current physica… ▽ More Climate change is global, yet its concrete impacts can strongly vary between different locations in the same region. Seasonal weather forecasts currently operate at the mesoscale (> 1 km). For more targeted mitigation and adaptation, modelling impacts to < 100 m is needed. Yet, the relationship between driving variables and Earth's surface at such local scales remains unresolved by current physical models. Large Earth observation datasets now enable us to create machine learning models capable of translating coarse weather information into high-resolution Earth surface forecasts. Here, we define high-resolution Earth surface forecasting as video prediction of satellite imagery conditional on mesoscale weather forecasts. Video prediction has been tackled with deep learning models. Develo** such models requires analysis-ready datasets. We introduce EarthNet2021, a new, curated dataset containing target spatio-temporal Sentinel 2 satellite imagery at 20 m resolution, matched with high-resolution topography and mesoscale (1.28 km) weather variables. With over 32000 samples it is suitable for training deep neural networks. Comparing multiple Earth surface forecasts is not trivial. Hence, we define the EarthNetScore, a novel ranking criterion for models forecasting Earth surface reflectance. For model intercomparison we frame EarthNet2021 as a challenge with four tracks based on different test sets. These allow evaluation of model validity and robustness as well as model applicability to extreme events and the complete annual vegetation cycle. In addition to forecasting directly observable weather impacts through satellite-derived vegetation indices, capable Earth surface models will enable downstream applications such as crop yield prediction, forest health assessments, coastline management, or biodiversity monitoring. Find data, code, and how to participate at www.earthnet.tech . △ Less

Submitted 11 December, 2020; originally announced December 2020.

Comments: 4 pages, presented at Tackling Climate Change with Machine Learning at NeurIPS2020

arXiv:2011.06490 [pdf, other]

Content-based Image Retrieval and the Semantic Gap in the Deep Learning Era

Authors: Björn Barz, Joachim Denzler

Abstract: Content-based image retrieval has seen astonishing progress over the past decade, especially for the task of retrieving images of the same object that is depicted in the query image. This scenario is called instance or object retrieval and requires matching fine-grained visual patterns between images. Semantics, however, do not play a crucial role. This brings rise to the question: Do the recent a… ▽ More Content-based image retrieval has seen astonishing progress over the past decade, especially for the task of retrieving images of the same object that is depicted in the query image. This scenario is called instance or object retrieval and requires matching fine-grained visual patterns between images. Semantics, however, do not play a crucial role. This brings rise to the question: Do the recent advances in instance retrieval transfer to more generic image retrieval scenarios? To answer this question, we first provide a brief overview of the most relevant milestones of instance retrieval. We then apply them to a semantic image retrieval task and find that they perform inferior to much less sophisticated and more generic methods in a setting that requires image understanding. Following this, we review existing approaches to closing this so-called semantic gap by integrating prior world knowledge. We conclude that the key problem for the further advancement of semantic image retrieval lies in the lack of a standardized task definition and an appropriate benchmark dataset. △ Less

Submitted 12 November, 2020; originally announced November 2020.

Comments: CBIR workshop at ICPR 2020

arXiv:2011.05756 [pdf, other]

Finding Relevant Flood Images on Twitter using Content-based Filters

Authors: Björn Barz, Kai Schröter, Ann-Christin Kra, Joachim Denzler

Abstract: The analysis of natural disasters such as floods in a timely manner often suffers from limited data due to coarsely distributed sensors or sensor failures. At the same time, a plethora of information is buried in an abundance of images of the event posted on social media platforms such as Twitter. These images could be used to document and rapidly assess the situation and derive proxy-data not ava… ▽ More The analysis of natural disasters such as floods in a timely manner often suffers from limited data due to coarsely distributed sensors or sensor failures. At the same time, a plethora of information is buried in an abundance of images of the event posted on social media platforms such as Twitter. These images could be used to document and rapidly assess the situation and derive proxy-data not available from sensors, e.g., the degree of water pollution. However, not all images posted online are suitable or informative enough for this purpose. Therefore, we propose an automatic filtering approach using machine learning techniques for finding Twitter images that are relevant for one of the following information objectives: assessing the flooded area, the inundation depth, and the degree of water pollution. Instead of relying on textual information present in the tweet, the filter analyzes the image contents directly. We evaluate the performance of two different approaches and various features on a case-study of two major flooding events. Our image-based filter is able to enhance the quality of the results substantially compared with a keyword-based filter, improving the mean average precision from 23% to 53% on average. △ Less

Submitted 11 November, 2020; originally announced November 2020.

Comments: ICPR 2020 Workshop on Machine Learning Advances Environmental Science (MAES)

arXiv:2010.06469 [pdf, other]

Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

Authors: Clemens-Alexander Brust, Björn Barz, Joachim Denzler

Abstract: Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct label is replaced by an incorrect label from the same set. We consider an additional dimension of l… ▽ More Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct label is replaced by an incorrect label from the same set. We consider an additional dimension of label noise: imprecision. For example, a non-breeding snow bunting is labeled as a bird. This label is correct, but not as precise as the task requires. Standard softmax classifiers cannot learn from such a weak label because they consider all classes mutually exclusive, which non-breeding snow bunting and bird are not. We propose CHILLAX (Class Hierarchies for Imprecise Label Learning and Annotation eXtrapolation), a method based on hierarchical classification, to fully utilize labels of any precision. Experiments on noisy variants of NABirds and ILSVRC2012 show that our method outperforms strong baselines by as much as 16.4 percentage points, and the current state of the art by up to 3.9 percentage points. △ Less

Submitted 13 October, 2020; originally announced October 2020.

Comments: 9 pages pre-print. Accepted for publication at ICPR 2020

arXiv:2007.02080 [pdf, other]

doi 10.1007/978-3-030-92659-5_9

End-to-end Learning of a Fisher Vector Encoding for Part Features in Fine-grained Recognition

Authors: Dimitri Korsch, Paul Bodesheim, Joachim Denzler

Abstract: Part-based approaches for fine-grained recognition do not show the expected performance gain over global methods, although explicitly focusing on small details that are relevant for distinguishing highly similar classes. We assume that part-based methods suffer from a missing representation of local features, which is invariant to the order of parts and can handle a varying number of visible parts… ▽ More Part-based approaches for fine-grained recognition do not show the expected performance gain over global methods, although explicitly focusing on small details that are relevant for distinguishing highly similar classes. We assume that part-based methods suffer from a missing representation of local features, which is invariant to the order of parts and can handle a varying number of visible parts appropriately. The order of parts is artificial and often only given by ground-truth annotations, whereas viewpoint variations and occlusions result in not observable parts. Therefore, we propose integrating a Fisher vector encoding of part features into convolutional neural networks. The parameters for this encoding are estimated by an online EM algorithm jointly with those of the neural network and are more precise than the estimates of previous works. Our approach improves state-of-the-art accuracies for three bird species classification datasets. △ Less

Submitted 28 July, 2023; v1 submitted 4 July, 2020; originally announced July 2020.

Comments: Published in the proceedings of the German Conference on Pattern Recognition 2021 (GCPR21)

arXiv:2006.13084 [pdf, other]

Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometry Constrained Keypoints in Real-Time

Authors: Nils Gählert, Jun-Jun Wan, Nicolas Jourdan, Jan Finkbeiner, Uwe Franke, Joachim Denzler

Abstract: In this paper we propose a novel 3D single-shot object detection method for detecting vehicles in monocular RGB images. Our approach lifts 2D detections to 3D space by predicting additional regression and classification parameters and hence kee** the runtime close to pure 2D object detection. The additional parameters are transformed to 3D bounding box keypoints within the network under geometri… ▽ More In this paper we propose a novel 3D single-shot object detection method for detecting vehicles in monocular RGB images. Our approach lifts 2D detections to 3D space by predicting additional regression and classification parameters and hence kee** the runtime close to pure 2D object detection. The additional parameters are transformed to 3D bounding box keypoints within the network under geometric constraints. Our proposed method features a full 3D description including all three angles of rotation without supervision by any labeled ground truth data for the object's orientation, as it focuses on certain keypoints within the image plane. While our approach can be combined with any modern object detection framework with only little computational overhead, we exemplify the extension of SSD for the prediction of 3D bounding boxes. We test our approach on different datasets for autonomous driving and evaluate it using the challenging KITTI 3D Object Detection as well as the novel nuScenes Object Detection benchmarks. While we achieve competitive results on both benchmarks we outperform current state-of-the-art methods in terms of speed with more than 20 FPS for all tested datasets and image resolutions. △ Less

Submitted 23 June, 2020; originally announced June 2020.

Comments: 2020 IEEE IV Symposium

arXiv:2006.08547 [pdf, other]

Visibility Guided NMS: Efficient Boosting of Amodal Object Detection in Crowded Traffic Scenes

Authors: Nils Gählert, Niklas Hanselmann, Uwe Franke, Joachim Denzler

Abstract: Object detection is an important task in environment perception for autonomous driving. Modern 2D object detection frameworks such as Yolo, SSD or Faster R-CNN predict multiple bounding boxes per object that are refined using Non-Maximum-Suppression (NMS) to suppress all but one bounding box. While object detection itself is fully end-to-end learnable and does not require any manual parameter sele… ▽ More Object detection is an important task in environment perception for autonomous driving. Modern 2D object detection frameworks such as Yolo, SSD or Faster R-CNN predict multiple bounding boxes per object that are refined using Non-Maximum-Suppression (NMS) to suppress all but one bounding box. While object detection itself is fully end-to-end learnable and does not require any manual parameter selection, standard NMS is parametrized by an overlap threshold that has to be chosen by hand. In practice, this often leads to an inability of standard NMS strategies to distinguish different objects in crowded scenes in the presence of high mutual occlusion, e.g. for parked cars or crowds of pedestrians. Our novel Visibility Guided NMS (vg-NMS) leverages both pixel-based as well as amodal object detection paradigms and improves the detection performance especially for highly occluded objects with little computational overhead. We evaluate vg-NMS using KITTI, VIPER as well as the Synscapes dataset and show that it outperforms current state-of-the-art NMS. △ Less

Submitted 15 June, 2020; originally announced June 2020.

Comments: Machine Learning for Autonomous Driving Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:2006.08368 [pdf]

Sensor Artificial Intelligence and its Application to Space Systems -- A White Paper

Authors: Anko Börner, Heinz-Wilhelm Hübers, Odej Kao, Florian Schmidt, Sören Becker, Joachim Denzler, Daniel Matolin, David Haber, Sergio Lucia, Wojciech Samek, Rudolph Triebel, Sascha Eichstädt, Felix Biessmann, Anna Kruspe, Peter Jung, Manon Kok, Guillermo Gallego, Ralf Berger

Abstract: Information and communication technologies have accompanied our everyday life for years. A steadily increasing number of computers, cameras, mobile devices, etc. generate more and more data, but at the same time we realize that the data can only partially be analyzed with classical approaches. The research and development of methods based on artificial intelligence (AI) made enormous progress in t… ▽ More Information and communication technologies have accompanied our everyday life for years. A steadily increasing number of computers, cameras, mobile devices, etc. generate more and more data, but at the same time we realize that the data can only partially be analyzed with classical approaches. The research and development of methods based on artificial intelligence (AI) made enormous progress in the area of interpretability of data in recent years. With growing experience, both, the potential and limitations of these new technologies are increasingly better understood. Typically, AI approaches start with the data from which information and directions for action are derived. However, the circumstances under which such data are collected and how they change over time are rarely considered. A closer look at the sensors and their physical properties within AI approaches will lead to more robust and widely applicable algorithms. This holistic approach which considers entire signal chains from the origin to a data product, "Sensor AI", is a highly relevant topic with great potential. It will play a decisive role in autonomous driving as well as in areas of automated production, predictive maintenance or space research. The goal of this white paper is to establish "Sensor AI" as a dedicated research topic. We want to exchange knowledge on the current state-of-the-art on Sensor AI, to identify synergies among research groups and thus boost the collaboration in this key technology for science and industry. △ Less

Submitted 9 June, 2020; originally announced June 2020.

Comments: 4 pages. 1st Workshop on Sensor Artificial Intelligence, Apr. 2020, Berlin, Germany

arXiv:2006.07864 [pdf, other]

Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection

Authors: Nils Gählert, Nicolas Jourdan, Marius Cordts, Uwe Franke, Joachim Denzler

Abstract: Detecting vehicles and representing their position and orientation in the three dimensional space is a key technology for autonomous driving. Recently, methods for 3D vehicle detection solely based on monocular RGB images gained popularity. In order to facilitate this task as well as to compare and drive state-of-the-art methods, several new datasets and benchmarks have been published. Ground trut… ▽ More Detecting vehicles and representing their position and orientation in the three dimensional space is a key technology for autonomous driving. Recently, methods for 3D vehicle detection solely based on monocular RGB images gained popularity. In order to facilitate this task as well as to compare and drive state-of-the-art methods, several new datasets and benchmarks have been published. Ground truth annotations of vehicles are usually obtained using lidar point clouds, which often induces errors due to imperfect calibration or synchronization between both sensors. To this end, we propose Cityscapes 3D, extending the original Cityscapes dataset with 3D bounding box annotations for all types of vehicles. In contrast to existing datasets, our 3D annotations were labeled using stereo RGB images only and capture all nine degrees of freedom. This leads to a pixel-accurate reprojection in the RGB image and a higher range of annotations compared to lidar-based approaches. In order to ease multitask learning, we provide a pairing of 2D instance segments with 3D bounding boxes. In addition, we complement the Cityscapes benchmark suite with 3D vehicle detection based on the new annotations as well as metrics presented in this work. Dataset and benchmark are available online. △ Less

Submitted 14 June, 2020; originally announced June 2020.

Comments: 2020 "Scalability in Autonomous Driving" CVPR Workshop

arXiv:2001.07100 [pdf, other]

doi 10.1007/s13218-020-00631-4

Active and Incremental Learning with Weak Supervision

Authors: Clemens-Alexander Brust, Christoph Käding, Joachim Denzler

Abstract: Large amounts of labeled training data are one of the main contributors to the great success that deep models have achieved in the past. Label acquisition for tasks other than benchmarks can pose a challenge due to requirements of both funding and expertise. By selecting unlabeled examples that are promising in terms of model improvement and only asking for respective labels, active learning can i… ▽ More Large amounts of labeled training data are one of the main contributors to the great success that deep models have achieved in the past. Label acquisition for tasks other than benchmarks can pose a challenge due to requirements of both funding and expertise. By selecting unlabeled examples that are promising in terms of model improvement and only asking for respective labels, active learning can increase the efficiency of the labeling process in terms of time and cost. In this work, we describe combinations of an incremental learning scheme and methods of active learning. These allow for continuous exploration of newly observed unlabeled data. We describe selection criteria based on model uncertainty as well as expected model output change (EMOC). An object detection task is evaluated in a continuous exploration context on the PASCAL VOC dataset. We also validate a weakly supervised system based on active and incremental learning in a real-world biodiversity application where images from camera traps are analyzed. Labeling only 32 images by accepting or rejecting proposals generated by our method yields an increase in accuracy from 25.4% to 42.6%. △ Less

Submitted 20 January, 2020; originally announced January 2020.

Comments: Accepted for publication in KI - Künstliche Intelligenz

arXiv:1910.07486 [pdf, other]

LOST: A flexible framework for semi-automatic image annotation

Authors: Jonas Jäger, Gereon Reus, Joachim Denzler, Viviane Wolff, Klaus Fricke-Neuderth

Abstract: State-of-the-art computer vision approaches rely on huge amounts of annotated data. The collection of such data is a time consuming process since it is mainly performed by humans. The literature shows that semi-automatic annotation approaches can significantly speed up the annotation process by the automatic generation of annotation proposals to support the annotator. In this paper we present a fr… ▽ More State-of-the-art computer vision approaches rely on huge amounts of annotated data. The collection of such data is a time consuming process since it is mainly performed by humans. The literature shows that semi-automatic annotation approaches can significantly speed up the annotation process by the automatic generation of annotation proposals to support the annotator. In this paper we present a framework that allows for a quick and flexible design of semi-automatic annotation pipelines. We show that a good design of the process will speed up the collection of annotations. Our contribution is a new approach to image annotation that allows for the combination of different annotation tools and machine learning algorithms in one process. We further present potential applications of our approach. The source code of our framework called LOST (Label Objects and Save Time) is available at: https://github.com/l3p-cv/lost. △ Less

Submitted 4 November, 2019; v1 submitted 16 October, 2019; originally announced October 2019.

Comments: Under review at: Computer Vision and Image Understanding

MSC Class: 68T45

arXiv:1910.06056 [pdf, other]

Facial Behavior Analysis using 4D Curvature Statistics for Presentation Attack Detection

Authors: Martin Thümmel, Sven Sickert, Joachim Denzler

Abstract: The human face has a high potential for biometric identification due to its many individual traits. At the same time, such identification is vulnerable to biometric copies. These presentation attacks pose a great challenge in unsupervised authentication settings. As a countermeasure, we propose a method that automatically analyzes the plausibility of facial behavior based on a sequence of 3D face… ▽ More The human face has a high potential for biometric identification due to its many individual traits. At the same time, such identification is vulnerable to biometric copies. These presentation attacks pose a great challenge in unsupervised authentication settings. As a countermeasure, we propose a method that automatically analyzes the plausibility of facial behavior based on a sequence of 3D face scans. A compact feature representation measures facial behavior using the temporal curvature change. Finally, we train our method only on genuine faces in an anomaly detection scenario. Our method can detect presentation attacks using elastic 3D masks, bent photographs with eye holes, and monitor replay-attacks. For evaluation, we recorded a challenging database containing such cases using a high-quality 3D sensor. It features 109 4D face scans including eleven different types of presentation attacks. We achieve error rates of 11% and 6% for APCER and BPCER, respectively. △ Less

Submitted 10 August, 2021; v1 submitted 14 October, 2019; originally announced October 2019.

Comments: 6 pages, 6 figures, IEEE International Workshop on Biometrics and Forensics 2021

arXiv:1909.10296 [pdf, other]

Predicting Landscapes from Environmental Conditions Using Generative Networks

Authors: Christian Requena-Mesa, Markus Reichstein, Miguel Mahecha, Basil Kraft, Joachim Denzler

Abstract: Landscapes are meaningful ecological units that strongly depend on the environmental conditions. Such dependencies between landscapes and the environment have been noted since the beginning of Earth sciences and cast into conceptual models describing the interdependencies of climate, geology, vegetation and geomorphology. Here, we ask whether landscapes, as seen from space, can be statistically pr… ▽ More Landscapes are meaningful ecological units that strongly depend on the environmental conditions. Such dependencies between landscapes and the environment have been noted since the beginning of Earth sciences and cast into conceptual models describing the interdependencies of climate, geology, vegetation and geomorphology. Here, we ask whether landscapes, as seen from space, can be statistically predicted from pertinent environmental conditions. To this end we adapted a deep learning generative model in order to establish the relationship between the environmental conditions and the view of landscapes from the Sentinel-2 satellite. We trained a conditional generative adversarial network to generate multispectral imagery given a set of climatic, terrain and anthropogenic predictors. The generated imagery of the landscapes share many characteristics with the real one. Results based on landscape patch metrics, indicative of landscape composition and structure, show that the proposed generative model creates landscapes that are more similar to the targets than the baseline models while overall reflectance and vegetation cover are predicted better. We demonstrate that for many purposes the generated landscapes behave as real with immediate application for global change studies. We envision the application of machine learning as a tool to forecast the effects of climate change on the spatial features of landscapes, while we assess its limitations and breaking points. △ Less

Submitted 23 September, 2019; originally announced September 2019.

Comments: Accepted conference paper at GCPR2019

arXiv:1909.07075 [pdf, other]

doi 10.1007/978-3-030-33676-9_5

Classification-Specific Parts for Improving Fine-Grained Visual Categorization

Authors: Dimitri Korsch, Paul Bodesheim, Joachim Denzler

Abstract: Fine-grained visual categorization is a classification task for distinguishing categories with high intra-class and small inter-class variance. While global approaches aim at using the whole image for performing the classification, part-based solutions gather additional local information in terms of attentions or parts. We propose a novel classification-specific part estimation that uses an initia… ▽ More Fine-grained visual categorization is a classification task for distinguishing categories with high intra-class and small inter-class variance. While global approaches aim at using the whole image for performing the classification, part-based solutions gather additional local information in terms of attentions or parts. We propose a novel classification-specific part estimation that uses an initial prediction as well as back-propagation of feature importance via gradient computations in order to estimate relevant image regions. The subsequently detected parts are then not only selected by a-posteriori classification knowledge, but also have an intrinsic spatial extent that is determined automatically. This is in contrast to most part-based approaches and even to available ground-truth part annotations, which only provide point coordinates and no additional scale information. We show in our experiments on various widely-used fine-grained datasets the effectiveness of the mentioned part selection method in conjunction with the extracted part features. △ Less

Submitted 16 September, 2019; originally announced September 2019.

Comments: Presented at the GCPR2019

arXiv:1908.04027 [pdf, other]

Self-supervised Data Bootstrap** for Deep Optical Character Recognition of Identity Documents

Authors: Oliver Mothes, Joachim Denzler

Abstract: The essential task of verifying person identities at airports and national borders is very time consuming. To accelerate it, optical character recognition for identity documents (IDs) using dictionaries is not appropriate due to high variability of the text content in IDs, e.g., individual street names or surnames. Additionally, no properties of the used fonts in IDs are known. Therefore, we propo… ▽ More The essential task of verifying person identities at airports and national borders is very time consuming. To accelerate it, optical character recognition for identity documents (IDs) using dictionaries is not appropriate due to high variability of the text content in IDs, e.g., individual street names or surnames. Additionally, no properties of the used fonts in IDs are known. Therefore, we propose an iterative self-supervised bootstrap** approach using a smart strategy to mine real character data from IDs. In combination with synthetically generated character data, the real data is used to train efficient convolutional neural networks for character classification serving a practical runtime as well as a high accuracy. On a dataset with 74 character classes, we achieve an average class-wise accuracy of 99.4 %. In contrast, if we would apply a classifier trained only using synthetic data, the accuracy is reduced to 58.1 %. Finally, we show that our whole proposed pipeline outperforms an established open-source framework △ Less

Submitted 12 August, 2019; originally announced August 2019.

Comments: 4 pages, 2 figures

arXiv:1908.03361 [pdf, other]

doi 10.5445/KSP/1000087327/06

Enhancing Flood Impact Analysis using Interactive Retrieval of Social Media Images

Authors: Björn Barz, Kai Schröter, Moritz Münch, Bin Yang, Andrea Unger, Doris Dransch, Joachim Denzler

Abstract: The analysis of natural disasters such as floods in a timely manner often suffers from limited data due to a coarse distribution of sensors or sensor failures. This limitation could be alleviated by leveraging information contained in images of the event posted on social media platforms, so-called "Volunteered Geographic Information (VGI)". To save the analyst from the need to inspect all images p… ▽ More The analysis of natural disasters such as floods in a timely manner often suffers from limited data due to a coarse distribution of sensors or sensor failures. This limitation could be alleviated by leveraging information contained in images of the event posted on social media platforms, so-called "Volunteered Geographic Information (VGI)". To save the analyst from the need to inspect all images posted online manually, we propose to use content-based image retrieval with the possibility of relevance feedback for retrieving only relevant images of the event to be analyzed. To evaluate this approach, we introduce a new dataset of 3,710 flood images, annotated by domain experts regarding their relevance with respect to three tasks (determining the flooded area, inundation depth, water pollution). We compare several image features and relevance feedback methods on that dataset, mixed with 97,085 distractor images, and are able to improve the precision among the top 100 retrieval results from 55% with the baseline retrieval to 87% after 5 rounds of feedback. △ Less

Submitted 9 August, 2019; originally announced August 2019.

Journal ref: Archives of Data Science, Series A, 5.1, 2018

arXiv:1902.00423 [pdf, other]

doi 10.3390/jimaging6060041

Do We Train on Test Data? Purging CIFAR of Near-Duplicates

Authors: Björn Barz, Joachim Denzler

Abstract: The CIFAR-10 and CIFAR-100 datasets are two of the most heavily benchmarked datasets in computer vision and are often used to evaluate novel methods and model architectures in the field of deep learning. However, we find that 3.3% and 10% of the images from the test sets of these datasets have duplicates in the training set. These duplicates are easily recognizable by memorization and may, hence,… ▽ More The CIFAR-10 and CIFAR-100 datasets are two of the most heavily benchmarked datasets in computer vision and are often used to evaluate novel methods and model architectures in the field of deep learning. However, we find that 3.3% and 10% of the images from the test sets of these datasets have duplicates in the training set. These duplicates are easily recognizable by memorization and may, hence, bias the comparison of image recognition techniques regarding their generalization capability. To eliminate this bias, we provide the "fair CIFAR" (ciFAIR) dataset, where we replaced all duplicates in the test sets with new images sampled from the same domain. We then re-evaluate the classification performance of various popular state-of-the-art CNN architectures on these new test sets to investigate whether recent research has overfitted to memorizing data instead of learning abstract concepts. We find a significant drop in classification accuracy of between 9% and 14% relative to the original performance on the duplicate-free test set. The ciFAIR dataset and pre-trained models are available at https://cvjena.github.io/cifair/, where we also maintain a leaderboard. △ Less

Submitted 2 June, 2020; v1 submitted 1 February, 2019; originally announced February 2019.

Comments: Journal of Imaging

Journal ref: Journal of Imaging. 2020; 6(6):41

arXiv:1901.09054 [pdf, other]

Deep Learning on Small Datasets without Pre-Training using Cosine Loss

Authors: Björn Barz, Joachim Denzler

Abstract: Two things seem to be indisputable in the contemporary deep learning discourse: 1. The categorical cross-entropy loss after softmax activation is the method of choice for classification. 2. Training a CNN classifier from scratch on small datasets does not work well. In contrast to this, we show that the cosine loss function provides significantly better performance than cross-entropy on datasets w… ▽ More Two things seem to be indisputable in the contemporary deep learning discourse: 1. The categorical cross-entropy loss after softmax activation is the method of choice for classification. 2. Training a CNN classifier from scratch on small datasets does not work well. In contrast to this, we show that the cosine loss function provides significantly better performance than cross-entropy on datasets with only a handful of samples per class. For example, the accuracy achieved on the CUB-200-2011 dataset without pre-training is by 30% higher than with the cross-entropy loss. Further experiments on other popular datasets confirm our findings. Moreover, we demonstrate that integrating prior knowledge in the form of class hierarchies is straightforward with the cosine loss and improves classification performance further. △ Less

Submitted 11 December, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

Comments: Presented at WACV 2020

Journal ref: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 2020

arXiv:1812.04418 [pdf, other]

Towards Automatic Identification of Elephants in the Wild

Authors: Matthias Körschens, Björn Barz, Joachim Denzler

Abstract: Identifying animals from a large group of possible individuals is very important for biodiversity monitoring and especially for collecting data on a small number of particularly interesting individuals, as these have to be identified first before this can be done. Identifying them can be a very time-consuming task. This is especially true, if the animals look very similar and have only a small num… ▽ More Identifying animals from a large group of possible individuals is very important for biodiversity monitoring and especially for collecting data on a small number of particularly interesting individuals, as these have to be identified first before this can be done. Identifying them can be a very time-consuming task. This is especially true, if the animals look very similar and have only a small number of distinctive features, like elephants do. In most cases the animals stay at one place only for a short period of time during which the animal needs to be identified for knowing whether it is important to collect new data on it. For this reason, a system supporting the researchers in identifying elephants to speed up this process would be of great benefit. In this paper, we present such a system for identifying elephants in the face of a large number of individuals with only few training images per individual. For that purpose, we combine object part localization, off-the-shelf CNN features, and support vector machine classification to provide field researches with proposals of possible individuals given new images of an elephant. The performance of our system is demonstrated on a dataset comprising a total of 2078 images of 276 individual elephants, where we achieve 56% top-1 test accuracy and 80% top-10 accuracy. To deal with occlusion, varying viewpoints, and different poses present in the dataset, we furthermore enable the analysts to provide the system with multiple images of the same elephant to be identified and aggregate confidence values generated by the classifier. With that, our system achieves a top-1 accuracy of 74% and a top-10 accuracy of 88% on the held-out test dataset. △ Less

Submitted 11 December, 2018; originally announced December 2018.

Comments: Presented at the AI for Wildlife Conservation (AIWC) 2018 workshop in Stockholm (https://sites.google.com/a/usc.edu/aiwc/home)

Showing 1–50 of 75 results for author: Denzler, J