Search | arXiv e-print repository

3D Multimodal Image Registration for Plant Phenotyping

Authors: Eric Stumpe, Gernot Bodner, Francesco Flagiello, Matthias Zeppelzauer

Abstract: The use of multiple camera technologies in a combined multimodal monitoring system for plant phenotyping offers promising benefits. Compared to configurations that only utilize a single camera technology, cross-modal patterns can be recorded that allow a more comprehensive assessment of plant phenotypes. However, the effective utilization of cross-modal patterns is dependent on precise image registration to achieve pixel-accurate alignment, a challenge often complicated by parallax and occlusion effects inherent in plant canopy imaging. In this study, we propose a novel multimodal 3D image registration method that addresses these challenges by integrating depth information from a time-of-flight camera into the registration process. By leveraging depth data, our method mitigates parallax effects and thus facilitates more accurate pixel alignment across camera modalities. Additionally, we introduce an automated mechanism to identify and differentiate different types of occlusions, thereby minimizing the introduction of registration errors. To evaluate the efficacy of our approach, we conduct experiments on a diverse image dataset comprising six distinct plant species with varying leaf geometries. Our results demonstrate the robustness of the proposed registration algorithm, showcasing its ability to achieve accurate alignment across different plant types and camera compositions. Compared to previous methods, it is not reliant on detecting plant-specific image features and can thereby be utilized for a wide variety of applications in plant sciences.
The registration approach principally scales to arbitrary numbers of cameras with different resolutions and wavelengths. Overall, our study contributes to advancing the field of plant phenotyping by offering a robust and reliable solution for multimodal image registration.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 53 pages, 13 Figures, preprint submitted to Computers and Electronics in Agriculture
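
The depth-based idea described in this abstract can be illustrated with a minimal pinhole-camera sketch. It is not the authors' implementation: the function name `reproject_pixel`, the intrinsics tuples, and the rigid transform (`rotation`, `translation`) are assumptions made for illustration only. A pixel with known depth is back-projected into 3D, moved into the second camera's frame, and projected again, which is how per-pixel depth can compensate for parallax between two cameras.

```python
def reproject_pixel(u, v, depth, intr_src, intr_dst, rotation, translation):
    """Map pixel (u, v) with metric depth from a source camera to a target camera.

    intr_src / intr_dst are (fx, fy, cx, cy) pinhole intrinsics (assumed names).
    rotation is a 3x3 nested list and translation a 3-vector that take
    source-camera coordinates to target-camera coordinates.
    """
    fx, fy, cx, cy = intr_src
    # Back-project the pixel into a 3D point in the source camera frame.
    point = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # Apply the rigid transform into the target camera frame.
    moved = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if moved[2] <= 0:
        return None  # The point lies behind the target camera.
    fx2, fy2, cx2, cy2 = intr_dst
    # Project the 3D point onto the target image plane.
    return (fx2 * moved[0] / moved[2] + cx2, fy2 * moved[1] / moved[2] + cy2)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
INTRINSICS = (500.0, 500.0, 320.0, 240.0)  # hypothetical camera

# With a 0.1 m horizontal offset, a point 2 m away shifts by fx * 0.1 / 2 pixels.
shifted = reproject_pixel(320.0, 240.0, 2.0, INTRINSICS, INTRINSICS,
                          IDENTITY, [0.1, 0.0, 0.0])
```

The shift depends on depth, so a single 2D transform cannot align near and far leaves at the same time; this is the parallax effect the abstract refers to.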

arXiv:2405.08150 [pdf, other]

cVIL: Class-Centric Visual Interactive Labeling

Authors: Matthias Matt, Matthias Zeppelzauer, Manuela Waldner

Abstract: We present cVIL, a class-centric approach to visual interactive labeling, which facilitates human annotation of large and complex image data sets. cVIL uses different property measures to support instance labeling for labeling difficult instances and batch labeling to quickly label easy instances. Simulated experiments reveal that cVIL with batch labeling can outperform traditional labeling approaches based on active learning. In a user study, cVIL led to better accuracy and higher user preference compared to a traditional instance-based visual interactive labeling approach based on 2D scatterplots.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2309.15097 [pdf, other]

Case Study: Ensemble Decision-Based Annotation of Unconstrained Real Estate Images

Authors: Miroslav Despotovic, Zedong Zhang, Eric Stumpe, Matthias Zeppelzauer

Abstract: We describe a proof-of-concept for annotating real estate images using simple iterative rule-based semi-supervised learning. In this study, we have gained important insights into the content characteristics and uniqueness of individual image classes as well as essential requirements for a practical implementation.

Submitted 26 September, 2023; originally announced September 2023.

Comments: 2 pages, 3 figures

MSC Class: 68 ACM Class: I.4.8

arXiv:2211.17016 [pdf, other]

doi 10.1016/j.gaitpost.2022.07.153

Explaining machine learning models for age classification in human gait analysis

Authors: Djordje Slijepcevic, Fabian Horst, Marvin Simak, Sebastian Lapuschkin, Anna-Maria Raberger, Wojciech Samek, Christian Breiteneder, Wolfgang I. Schöllhorn, Matthias Zeppelzauer, Brian Horsak

Abstract: Machine learning (ML) models have proven effective in classifying gait analysis data, e.g., binary classification of young vs. older adults. ML models, however, lack in providing human understandable explanations for their predictions. This "black-box" behavior impedes the understanding of which input features the model predictions are based on. We investigated an Explainable Artificial Intelligence method, i.e., Layer-wise Relevance Propagation (LRP), for gait analysis data. The research question was: Which input features are used by ML models to classify age-related differences in walking patterns? We utilized a subset of the AIST Gait Database 2019 containing five bilateral ground reaction force (GRF) recordings per person during barefoot walking of healthy participants. Each input signal was min-max normalized before concatenation and fed into a Convolutional Neural Network (CNN). Participants were divided into three age groups: young (20-39 years), middle-aged (40-64 years), and older (65-79 years) adults. The classification accuracy and relevance scores (derived using LRP) were averaged over a stratified ten-fold cross-validation. The mean classification accuracy of 60.1% was clearly higher than the zero-rule baseline of 37.3%. The confusion matrix shows that the CNN distinguished younger and older adults well, but had difficulty modeling the middle-aged adults.

Submitted 16 October, 2022; originally announced November 2022.

Comments: 3 pages, 1 figure

Journal ref: Gait & Posture 97 (Supplement 1) (2022) 252-253
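
The preprocessing step mentioned in this abstract (min-max normalization of each input signal before concatenation) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and normalizing each signal by its own minimum and maximum is an assumption, since the abstract does not state the exact scope of the normalization.

```python
def min_max_normalize(signal):
    """Rescale a sequence of numbers linearly to the range [0, 1]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        # A constant signal carries no range; map it to zeros (assumed convention).
        return [0.0 for _ in signal]
    return [(x - lo) / (hi - lo) for x in signal]


def build_input(signals):
    """Normalize each signal separately, then concatenate into one input vector."""
    vector = []
    for signal in signals:
        vector.extend(min_max_normalize(signal))
    return vector


# Two toy signals standing in for, e.g., left and right force components.
example = build_input([[0, 5, 10], [2, 4]])
```

Normalizing before concatenation keeps signal components with different value ranges on a comparable scale in the network input.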

arXiv:2211.17015 [pdf, other]

doi 10.1016/j.gaitpost.2020.07.114

Explaining automated gender classification of human gait

Authors: Fabian Horst, Djordje Slijepcevic, Matthias Zeppelzauer, Anna-Maria Raberger, Sebastian Lapuschkin, Wojciech Samek, Wolfgang I. Schöllhorn, Christian Breiteneder, Brian Horsak

Abstract: State-of-the-art machine learning (ML) models are highly effective in classifying gait analysis data; however, they lack in providing explanations for their predictions. This "black-box" characteristic makes it impossible to understand on which input patterns ML models base their predictions. The present study investigates whether Explainable Artificial Intelligence methods, i.e., Layer-wise Relevance Propagation (LRP), can be useful to enhance the explainability of ML predictions in gait classification. The research question was: Which input patterns are most relevant for an automated gender classification model and do they correspond to characteristics identified in the literature? We utilized a subset of the GAITREC dataset containing five bilateral ground reaction force (GRF) recordings per person during barefoot walking of 62 healthy participants: 34 females and 28 males. Each input signal (right and left side) was min-max normalized before concatenation and fed into a multi-layer Convolutional Neural Network (CNN). The classification accuracy was obtained over a stratified ten-fold cross-validation. To identify gender-specific patterns, the input relevance scores were derived using LRP. The mean classification accuracy of the CNN with 83.3% showed a clear superiority over the zero-rule baseline of 54.8%.

Submitted 16 October, 2022; originally announced November 2022.

Comments: 3 pages, 1 figure

Journal ref: Gait & Posture 81 (Supplement 1) (2020) 159-160
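
The zero-rule baseline quoted in this abstract is the accuracy of always predicting the most frequent class. A minimal sketch follows; the function name is hypothetical, and it assumes the baseline is computed directly from the class counts given in the abstract (34 females, 28 males).

```python
from collections import Counter


def zero_rule_accuracy(labels):
    """Accuracy obtained by always predicting the majority class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Class distribution stated in the abstract: 34 females and 28 males.
labels = ["female"] * 34 + ["male"] * 28
print(f"{zero_rule_accuracy(labels):.1%}")  # prints 54.8%
```

The result, 34/62, matches the 54.8% baseline reported in the abstract.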

arXiv:2211.12108 [pdf, other]

doi 10.3217/978-3-85125-869-1-13

Explaining YOLO: Leveraging Grad-CAM to Explain Object Detections

Authors: Armin Kirchknopf, Djordje Slijepcevic, Ilkay Wunderlich, Michael Breiter, Johannes Traxler, Matthias Zeppelzauer

Abstract: We investigate the problem of explainability for visual object detectors. Specifically, we demonstrate on the example of the YOLO object detector how to integrate Grad-CAM into the model architecture and analyze the results. We show how to compute attribution-based explanations for individual detections and find that the normalization of the results has a great impact on their interpretation.

Submitted 22 November, 2022; originally announced November 2022.

Journal ref: Proceedings of the Workshop of the Austrian Association for Pattern Recognition 2021

arXiv:2211.09018 [pdf, other]

doi 10.3217/978-3-85125-869-1-06

Real Estate Attribute Prediction from Multiple Visual Modalities with Missing Data

Authors: Eric Stumpe, Miroslav Despotovic, Zedong Zhang, Matthias Zeppelzauer

Abstract: The assessment and valuation of real estate requires large datasets with real estate information. Unfortunately, real estate databases are usually sparse in practice, i.e., not for each property every important attribute is available. In this paper, we study the potential of predicting high-level real estate attributes from visual data, specifically from two visual modalities, namely indoor (interior) and outdoor (facade) photos. We design three models using different multimodal fusion strategies and evaluate them for three different use cases. Thereby, a particular challenge is to handle missing modalities. We evaluate different fusion strategies, present baselines for the different prediction tasks, and find that enriching the training data with additional incomplete samples can lead to an improvement in prediction accuracy. Furthermore, the fusion of information from indoor and outdoor photos results in a performance boost of up to 5% in Macro F1-score.

Submitted 16 November, 2022; originally announced November 2022.

Comments: included in the Proceedings of the OAGM Workshop 2021

Journal ref: OAGM Workshop 2021 (2021) 31-37

arXiv:2208.05232 [pdf, other]

doi 10.1109/TREX57753.2022.00006

Trustworthy Visual Analytics in Clinical Gait Analysis: A Case Study for Patients with Cerebral Palsy

Authors: Alexander Rind, Djordje Slijepčević, Matthias Zeppelzauer, Fabian Unglaube, Andreas Kranzl, Brian Horsak

Abstract: Three-dimensional clinical gait analysis is essential for selecting optimal treatment interventions for patients with cerebral palsy (CP), but generates a large amount of time series data. For the automated analysis of these data, machine learning approaches yield promising results. However, due to their black-box nature, such approaches are often mistrusted by clinicians. We propose gaitXplorer, a visual analytics approach for the classification of CP-related gait patterns that integrates Grad-CAM, a well-established explainable artificial intelligence algorithm, for explanations of machine learning classifications. Regions of high relevance for classification are highlighted in the interactive visual interface. The approach is evaluated in a case study with two clinical gait experts. They inspected the explanations for a sample of eight patients using the visual interface and expressed which relevance scores they found trustworthy and which they found suspicious. Overall, the clinicians gave positive feedback on the approach as it allowed them a better understanding of which regions in the data were relevant for the classification.

Submitted 19 December, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: 7 pages, 4 figures; supplemental material 9 pages, 8 figures

ACM Class: J.3; H.5.m; I.3.6

Journal ref: Proceedings of the 2022 IEEE Workshop on TRust and EXpertise in Visual Analytics, TREX (2022) 8-15

arXiv:2106.04908 [pdf, other]

Automatic Sexism Detection with Multilingual Transformer Models

Authors: Mina Schütz, Jaqueline Boeck, Daria Liakhovets, Djordje Slijepčević, Armin Kirchknopf, Manuel Hecht, Johannes Bogensperger, Sven Schlarb, Alexander Schindler, Matthias Zeppelzauer

Abstract: Sexism has become an increasingly major problem on social networks during the last years. The first shared task on sEXism Identification in Social neTworks (EXIST) at IberLEF 2021 is an international competition in the field of Natural Language Processing (NLP) with the aim to automatically identify sexism in social media content by applying machine learning methods. Thereby sexism detection is fo[…], and objectification). This paper presents the contribution of the AIT_FHSTP team at the EXIST2021 benchmark for both tasks. To solve the tasks we applied two multilingual transformer models, one based on multilingual BERT and one based on XLM-R. Our approach uses two different strategies to adapt the transformers to the detection of sexist content: first, unsupervised pre-training with additional data and second, supervised fine-tuning with additional and augmented data. For both tasks our best model is XLM-R with unsupervised pre-training on the EXIST data and additional datasets and fine-tuning on the provided dataset. The best run for the binary classification (task 1) achieves a macro F1-score of 0.7752 and scores 5th rank in the benchmark; for the multiclass classification (task 2) our best submission scores 6th rank with a macro F1-score of 0.5589.

Submitted 8 February, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: Technical Report to the AIT_FHSTP EXIST 2021 Challenge contribution (under review) http://nlp.uned.es/exist2021/
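
The macro F1-score used to rank submissions in this abstract is the unweighted mean of the per-class F1-scores. The sketch below is a plain-Python illustration with a hypothetical function name and toy labels; it is not the benchmark's official scorer.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined here as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with illustrative labels (not data from the benchmark).
truth = ["sexist", "sexist", "non-sexist", "non-sexist"]
preds = ["sexist", "non-sexist", "non-sexist", "non-sexist"]
score = macro_f1(truth, preds)
```

Because every class contributes equally, macro F1 penalizes a model that does well only on the majority class, which matters for imbalanced multiclass tasks.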

arXiv:2105.15165 [pdf, other]

Multimodal Detection of Information Disorder from Social Media

Authors: Armin Kirchknopf, Djordje Slijepcevic, Matthias Zeppelzauer

Abstract: Social media is accompanied by an increasing proportion of content that provides fake information or misleading content, known as information disorder. In this paper, we study the problem of multimodal fake news detection on a large-scale multimodal dataset. We propose a multimodal network architecture that enables different levels and types of information fusion. In addition to the textual and visual content of a posting, we further leverage secondary information, i.e. user comments and metadata. We fuse information at multiple levels to account for the specific intrinsic structure of the modalities. Our results show that multimodal analysis is highly effective for the task and all modalities contribute positively when fused properly.

Submitted 31 May, 2021; originally announced May 2021.

Comments: 4 pages, 2 figures, 2 tables, PrePrint CBMI 2021

arXiv:2105.14824 [pdf, other]

Bounded logit attention: Learning to explain image classifiers

Authors: Thomas Baumhauer, Djordje Slijepcevic, Matthias Zeppelzauer

Abstract: Explainable artificial intelligence is the attempt to elucidate the workings of systems too complex to be directly accessible to human cognition through suitable side-information referred to as "explanations". We present a trainable explanation module for convolutional image classifiers we call bounded logit attention (BLA). The BLA module learns to select a subset of the convolutional feature map for each input instance, which then serves as an explanation for the classifier's prediction. BLA overcomes several limitations of the instancewise feature selection method "learning to explain" (L2X) introduced by Chen et al. (2018): 1) BLA scales to real-world sized image classification problems, and 2) BLA offers a canonical way to learn explanations of variable size. Due to its modularity BLA lends itself to transfer learning setups and can also be employed as a post-hoc add-on to trained classifiers. Beyond explainability, BLA may serve as a general purpose method for differentiable approximation of subset selection. In a user study we find that BLA explanations are preferred over explanations generated by the popular (Grad-)CAM method.

Submitted 31 May, 2021; originally announced May 2021.

arXiv:2102.04763 [pdf, other]

doi 10.1016/j.cose.2021.102488

$k$-Anonymity in Practice: How Generalisation and Suppression Affect Machine Learning Classifiers

Authors: Djordje Slijepčević, Maximilian Henzl, Lukas Daniel Klausner, Tobias Dam, Peter Kieseberg, Matthias Zeppelzauer

Abstract: The protection of private information is a crucial issue in data-driven research and business contexts. Typically, techniques like anonymisation or (selective) deletion are introduced in order to allow data sharing, e.g. in the case of collaborative research endeavours. For use with anonymisation techniques, the $k$-anonymity criterion is one of the most popular, with numerous scientific publications on different algorithms and metrics. Anonymisation techniques often require changing the data and thus necessarily affect the results of machine learning models trained on the underlying data. In this work, we conduct a systematic comparison and detailed investigation into the effects of different $k$-anonymisation algorithms on the results of machine learning models. We investigate a set of popular $k$-anonymisation algorithms with different classifiers and evaluate them on different real-world datasets. Our systematic evaluation shows that with an increasingly strong $k$-anonymity constraint, the classification performance generally degrades, but to varying degrees and strongly depending on the dataset and anonymisation method. Furthermore, Mondrian can be considered as the method with the most appealing properties for subsequent classification.

Submitted 22 June, 2022; v1 submitted 9 February, 2021; originally announced February 2021.

Comments: 48 pages, 38 figures

Journal ref: Comput. Secur. 111, 2021
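
The $k$-anonymity criterion named in this abstract requires every combination of quasi-identifier values to occur in at least $k$ records. The check below is a minimal sketch with a hypothetical function name and toy records; it only tests the criterion and does not implement any of the anonymisation algorithms (such as Mondrian) compared in the paper.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if each quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Toy table after generalisation: ages binned into ranges, ZIP codes truncated.
records = [
    {"age": "20-29", "zip": "31**", "diagnosis": "A"},
    {"age": "20-29", "zip": "31**", "diagnosis": "B"},
    {"age": "30-39", "zip": "32**", "diagnosis": "A"},
    {"age": "30-39", "zip": "32**", "diagnosis": "C"},
]
two_anonymous = is_k_anonymous(records, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(records, ["age", "zip"], 3)
```

Raising $k$ forces coarser generalisation or more suppression, which is the source of the classification performance loss the abstract describes.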

arXiv:2002.02730 [pdf, other]

Machine Unlearning: Linear Filtration for Logit-based Classifiers

Authors: Thomas Baumhauer, Pascal Schöttle, Matthias Zeppelzauer

Abstract: Recently enacted legislation grants individuals certain rights to decide in what fashion their personal data may be used, and in particular a "right to be forgotten". This poses a challenge to machine learning: how to proceed when an individual retracts permission to use data which has been part of the training process of a model? From this question emerges the field of machine unlearning, which could be broadly described as the investigation of how to "delete training data from models". Our work complements this direction of research for the specific setting of class-wide deletion requests for classification models (e.g. deep neural networks). As a first step, we propose linear filtration as an intuitive, computationally efficient sanitization method. Our experiments demonstrate benefits in an adversarial setting over naive deletion schemes.

Submitted 8 July, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

arXiv:1912.07737 [pdf, other]

doi 10.1145/3474121

On the Explanation of Machine Learning Predictions in Clinical Gait Analysis

Authors: Djordje Slijepcevic, Fabian Horst, Sebastian Lapuschkin, Anna-Maria Raberger, Matthias Zeppelzauer, Wojciech Samek, Christian Breiteneder, Wolfgang I. Schöllhorn, Brian Horsak

Abstract: Machine learning (ML) is increasingly used to support decision-making in the healthcare sector. While ML approaches provide promising results with regard to their classification performance, most share a central limitation, namely their black-box character. Motivated by the interest to understand the functioning of ML models, methods from the field of Explainable Artificial Intelligence (XAI) have recently become important. This article investigates the usefulness of XAI methods in clinical gait classification. For this purpose, predictions of state-of-the-art classification methods are explained with an established XAI method, i.e., Layer-wise Relevance Propagation (LRP). We propose to evaluate the obtained explanations with two complementary approaches: a statistical analysis of the underlying data using Statistical Parametric Mapping and a qualitative evaluation by a clinical expert. A gait dataset comprising ground reaction force measurements from 132 patients with different lower-body gait disorders and 62 healthy controls is utilized. We investigate several gait classification tasks, employ multiple classification methods, and analyze the impact of data normalization and different signal components for classification performance and explanation quality. Our experiments show that explanations obtained by LRP exhibit promising statistical properties concerning inter-class discriminativity and are also in line with clinically relevant biomechanical gait characteristics.

Submitted 19 August, 2020; v1 submitted 16 December, 2019; originally announced December 2019.

Comments: 37 pages, 7 figures, 2 tables, 24 supplementary figures, 1 supplementary table

arXiv:1812.09245 [pdf, other]

Persistence Bag-of-Words for Topological Data Analysis

Authors: Bartosz Zieliński, Michał Lipiński, Mateusz Juda, Matthias Zeppelzauer, Paweł Dłotko

Abstract: Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). PDs exhibit, however, complex structure and are difficult to integrate in today's machine learning workflows. This paper introduces persistence bag-of-words: a novel and stable vectorized representation of PDs that enables the seamless integration with machine learning. Comprehensive experiments show that the new representation achieves state-of-the-art performance and beyond in much less time than alternative approaches.

Submitted 4 June, 2019; v1 submitted 21 December, 2018; originally announced December 2018.

Comments: Accepted for the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). arXiv admin note: substantial text overlap with arXiv:1802.04852

arXiv:1807.07617 [pdf, other]

doi 10.1145/3240508.3241393

SoniControl - A Mobile Ultrasonic Firewall

Authors: Matthias Zeppelzauer, Alexis Ringot, Florian Taurer

Abstract: The exchange of data between mobile devices in the near-ultrasonic frequency band is a new promising technology for near field communication (NFC) but also raises a number of privacy concerns. We present the first ultrasonic firewall that reliably detects ultrasonic communication and provides the user with effective means to prevent hidden data exchange. This demonstration showcases a new media-based communication technology ("data over audio") together with its related privacy concerns. It (i) enables users to interactively test out and experience ultrasonic information exchange and (ii) shows how to protect oneself against unwanted tracking.

Submitted 19 July, 2018; originally announced July 2018.

Comments: To appear in proceedings of 2018 ACM Multimedia Conference October 22--26, 2018, Seoul, Republic of Korea

arXiv:1804.10113 [pdf, other]

doi 10.1145/3210499.3210526

Visual Estimation of Building Condition with Patch-level ConvNets

Authors: David Koch, Miroslav Despotovic, Muntaha Sakeena, Mario Döller, Matthias Zeppelzauer

Abstract: The condition of a building is an important factor for real estate valuation. Currently, the estimation of condition is determined by real estate appraisers which makes it subjective to a certain degree. We propose a novel vision-based approach for the assessment of the building condition from exterior views of the building. To this end, we develop a multi-scale patch-based pattern extraction approach and combine it with convolutional neural networks to estimate building condition from visual clues. Our evaluation shows that visually estimated building condition can serve as a proxy for condition estimates by appraisers.

Submitted 26 April, 2018; originally announced April 2018.

Comments: To appear in: Workshop on Multimedia for Real Estate Tech, ICMR 2018, Yokohama, Japan

arXiv:1804.02205 [pdf, other]

doi 10.1145/3206025.3206060

Automatic Prediction of Building Age from Photographs

Authors: Matthias Zeppelzauer, Miroslav Despotovic, Muntaha Sakeena, David Koch, Mario Döller

Abstract: We present a first method for the automated age estimation of buildings from unconstrained photographs. To this end, we propose a two-stage approach that firstly learns characteristic visual patterns for different building epochs at patch-level and then globally aggregates patch-level age estimates over the building. We compile evaluation datasets from different sources and perform a detailed evaluation of our approach, its sensitivity to parameters, and the capabilities of the employed deep networks to learn characteristic visual age-related patterns. Results show that our approach is able to estimate building age at a surprisingly high level that even outperforms human evaluators and thereby sets a new performance baseline. This work represents a first step towards the automated assessment of building parameters for automated price prediction.

Submitted 19 April, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

Comments: Preprint of paper to appear in ACM International Conference on Multimedia Retrieval (ICMR) 2018 Conference

arXiv:1802.04852 [pdf, other]

Persistence Codebooks for Topological Data Analysis

Authors: Bartosz Zielinski, Michal Lipinski, Mateusz Juda, Matthias Zeppelzauer, Pawel Dlotko

Abstract: Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs), which are 2D multisets of points. Their variable size makes them, however, difficult to combine with typical machine learning workflows. In this paper we introduce persistence codebooks, a novel expressive and discriminative fixed-size vectorized representation of PDs. To this end, we adapt bag-of-words (BoW), vectors of locally aggregated descriptors (VLAD) and Fisher vectors (FV) for the quantization of PDs. Persistence codebooks represent PDs in a convenient way for machine learning and statistical analysis and have a number of favorable practical and theoretical properties including 1-Wasserstein stability. We evaluate the presented representations on several heterogeneous datasets and show their (high) discriminative power. Our approach achieves state-of-the-art performance and beyond in much less time than alternative approaches.

Submitted 13 June, 2019; v1 submitted 13 February, 2018; originally announced February 2018.

Comments: minor update, remove heading
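
As a rough illustration of the bag-of-words variant mentioned in the abstract above, the following sketch quantizes a persistence diagram against a fixed codebook and returns a fixed-size histogram regardless of how many points the diagram contains. The function name `pd_bow`, the (birth, persistence) coordinates, the hard assignment, and the hand-picked codebook are assumptions made for illustration and are not taken from the paper; in practice the codebook would presumably be learned from data, for example by clustering.

```python
import math

def pd_bow(diagram, codebook):
    """Hard-assignment bag-of-words vector for a persistence diagram.

    diagram  -- list of (birth, death) pairs
    codebook -- list of (birth, persistence) codeword centers (assumed given)
    """
    hist = [0.0] * len(codebook)
    for birth, death in diagram:
        # Work in (birth, persistence) coordinates; this choice is an assumption.
        point = (birth, death - birth)
        nearest = min(range(len(codebook)),
                      key=lambda k: math.dist(point, codebook[k]))
        hist[nearest] += 1.0
    total = sum(hist)
    # Normalize so diagrams of different sizes become comparable.
    return [h / total for h in hist] if total else hist

# Hypothetical codebook with three codewords.
codebook = [(0.0, 0.1), (0.0, 1.0), (1.0, 0.5)]
small = pd_bow([(0.0, 0.1), (0.9, 1.5)], codebook)
large = pd_bow([(0.0, 0.2), (0.1, 1.0), (1.0, 1.4), (0.05, 0.1)], codebook)
```

Both `small` and `large` have length 3 although the input diagrams contain two and four points, which is the property that makes such vectors usable with standard classifiers.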

arXiv:1712.06405 [pdf, other]

doi 10.1109/JBHI.2017.2785682

Automatic Classification of Functional Gait Disorders

Authors: Djordje Slijepcevic, Matthias Zeppelzauer, Anna-Maria Gorgas, Caterine Schwab, Michael Schüller, Arnold Baca, Christian Breiteneder, Brian Horsak

Abstract: This article proposes a comprehensive investigation of the automatic classification of functional gait disorders based solely on ground reaction force (GRF) measurements. The aim of the study is twofold: (1) to investigate the suitability of state-of-the-art GRF parameterization techniques (representations) for the discrimination of functional gait disorders; and (2) to provide a first performance baseline for the automated classification of functional gait disorders for a large-scale dataset. The utilized database comprises GRF measurements from 279 patients with gait disorders (GDs) and data from 161 healthy controls (N). Patients were manually classified into four classes with different functional impairments associated with the "hip", "knee", "ankle", and "calcaneus". Different parameterizations are investigated: GRF parameters, global principal component analysis (PCA)-based representations and a combined representation applying PCA on GRF parameters. The discriminative power of each parameterization for different classes is investigated by linear discriminant analysis (LDA). Based on this analysis, two classification experiments are pursued: (1) distinction between healthy and impaired gait (N vs. GD) and (2) multi-class classification between healthy gait and all four GD classes. Experiments show promising results and reveal, among other things, that several factors, such as imbalanced class cardinalities and varying numbers of measurement sessions per patient, have a strong impact on the classification accuracy and therefore need to be taken into account. The results represent a promising first step towards the automated classification of gait disorders and a first performance baseline for future developments in this direction.

Submitted 24 December, 2017; v1 submitted 18 December, 2017; originally announced December 2017.

Comments: 9 pages, 3 figures, IEEE Journal of Biomedical and Health Informatics

arXiv:1710.10662 [pdf, other]

doi 10.1016/j.cviu.2017.10.012

A Study on Topological Descriptors for the Analysis of 3D Surface Texture

Authors: Matthias Zeppelzauer, Bartosz Zielinski, Mateusz Juda, Markus Seidl

Abstract: Methods from computational topology are becoming more and more popular in computer vision and have been shown to improve the state-of-the-art in several tasks. In this paper, we investigate the applicability of topological descriptors in the context of 3D surface analysis for the classification of different surface textures. We present a comprehensive study on topological descriptors, investigate their robustness and expressiveness and compare them with state-of-the-art methods including Convolutional Neural Networks (CNNs). Results show that class-specific information is reflected well in topological descriptors. The investigated descriptors can directly compete with non-topological descriptors and capture complementary information. As a consequence they improve the state-of-the-art when combined with non-topological descriptors.

Submitted 29 October, 2017; originally announced October 2017.

Comments: Preprint of Article "A Study on Topological Descriptors for the Analysis of 3D Surface Texture" in Elsevier Journal on Computer Vision and Image Understanding (CVIU): https://doi.org/10.1016/j.cviu.2017.10.012, 17 Pages, 19 Figures, 4 Tables

arXiv:1707.06105 [pdf, other]

doi 10.1109/TVCG.2017.2785271

KAVAGait: Knowledge-Assisted Visual Analytics for Clinical Gait Analysis

Authors: Markus Wagner, Djordje Slijepcevic, Brian Horsak, Alexander Rind, Matthias Zeppelzauer, Wolfgang Aigner

Abstract: In 2014, more than 10 million people in the US were affected by an ambulatory disability. Thus, gait rehabilitation is a crucial part of health care systems. The quantification of human locomotion enables clinicians to describe and analyze a patient's gait performance in detail and allows them to base clinical decisions on objective data. These assessments generate a vast amount of complex data which need to be interpreted in a short time period. We conducted a design study in cooperation with gait analysis experts to develop a novel Knowledge-Assisted Visual Analytics solution for clinical Gait analysis (KAVAGait). KAVAGait allows the clinician to store and inspect complex data derived during clinical gait analysis. The system incorporates innovative and interactive visual interface concepts, which were developed based on the needs of clinicians. Additionally, an explicit knowledge store (EKS) allows externalization and storage of implicit knowledge from clinicians. It makes this information available for others, supporting the process of data inspection and clinical decision making. We validated our system by conducting expert reviews, a user study, and a case study. Results suggest that KAVAGait is able to support a clinician during clinical practice by visualizing complex gait data and providing knowledge of other clinicians.

Submitted 14 December, 2017; v1 submitted 19 July, 2017; originally announced July 2017.

Comments: 16 pages, 8 figures, minor revisions during the peer review, to appear in IEEE Transactions on Visualization and Computer Graphics

Journal ref: IEEE Trans. Visualization and Computer Graphics 25.3 (2018), pp. 1528-1542

arXiv:1703.03385 [pdf, other]

Visual-Interactive Similarity Search for Complex Objects by Example of Soccer Player Analysis

Authors: Jürgen Bernard, Christian Ritter, David Sessler, Matthias Zeppelzauer, Jörn Kohlhammer, Dieter Fellner

Abstract: The definition of similarity is a key prerequisite when analyzing complex data types in data mining, information retrieval, or machine learning. However, a meaningful definition is often hampered by the complexity of data objects and particularly by different notions of subjective similarity latent in targeted user groups. Taking the example of soccer players, we present a visual-interactive system that learns users' mental models of similarity. In a visual-interactive interface, users are able to label pairs of soccer players with respect to their subjective notion of similarity. Our proposed similarity model automatically learns the respective concept of similarity using an active learning strategy. A visual-interactive retrieval technique is provided to validate the model and to execute downstream retrieval tasks for soccer player analysis. The applicability of the approach is demonstrated in different evaluation strategies, including usage scenarios and cross-validation tests.

Submitted 9 March, 2017; originally announced March 2017.

arXiv:1610.01944 [pdf, other]

doi 10.1145/3095713.3095719

PetroSurf3D - A Dataset for high-resolution 3D Surface Segmentation

Authors: Georg Poier, Markus Seidl, Matthias Zeppelzauer, Christian Reinbacher, Martin Schaich, Giovanna Bellandi, Alberto Marretta, Horst Bischof

Abstract: The development of powerful 3D scanning hardware and reconstruction algorithms has strongly promoted the generation of 3D surface reconstructions in different domains. An area of special interest for such 3D reconstructions is the cultural heritage domain, where surface reconstructions are generated to digitally preserve historical artifacts. While reconstruction quality nowadays is sufficient in many cases, the robust analysis (e.g. segmentation, matching, and classification) of reconstructed 3D data is still an open topic. In this paper, we target the automatic and interactive segmentation of high-resolution 3D surface reconstructions from the archaeological domain. To foster research in this field, we introduce a fully annotated and publicly available large-scale 3D surface dataset including high-resolution meshes, depth maps and point clouds as a novel benchmark dataset to the community. We provide baseline results for our existing random forest-based approach and for the first time investigate segmentation with convolutional neural networks (CNNs) on the data. Results show that both approaches have complementary strengths and weaknesses and that the provided dataset represents a challenge for future research.

Submitted 1 March, 2017; v1 submitted 6 October, 2016; originally announced October 2016.

Comments: CBMI Submission; Dataset and more information can be found at http://lrs.icg.tugraz.at/research/petroglyphsegmentation/

arXiv:1601.06057 [pdf, other]

Topological descriptors for 3D surface analysis

Authors: Matthias Zeppelzauer, Bartosz Zieliński, Mateusz Juda, Markus Seidl

Abstract: We investigate topological descriptors for 3D surface analysis, i.e. the classification of surfaces according to their geometric fine structure. On a dataset of high-resolution 3D surface reconstructions we compute persistence diagrams for a 2D cubical filtration. In the next step we investigate different topological descriptors and measure their ability to discriminate structurally different 3D surface patches. We evaluate their sensitivity to different parameters and compare the performance of the resulting topological descriptors to alternative (non-topological) descriptors. We present a comprehensive evaluation that shows that topological descriptors are (i) robust, (ii) yield state-of-the-art performance for the task of 3D surface analysis and (iii) improve classification performance when combined with non-topological descriptors.

Submitted 22 January, 2016; originally announced January 2016.

Comments: 12 pages, 3 figures, CTIC 2016

arXiv:1601.00599 [pdf, other]

Multimodal Classification of Events in Social Media

Authors: Matthias Zeppelzauer, Daniel Schopfhauser

Abstract: A large amount of social media hosted on platforms like Flickr and Instagram is related to social events. The task of social event classification refers to the distinction of event and non-event-related content as well as the classification of event types (e.g. sports events, concerts, etc.). In this paper, we provide an extensive study of textual, visual, as well as multimodal representations for social event classification. We investigate strengths and weaknesses of the modalities and study synergy effects between the modalities. Experimental results obtained with our multimodal representation outperform state-of-the-art methods and provide a new baseline for future research.

Submitted 4 January, 2016; originally announced January 2016.

Comments: Preprint of accepted manuscript for the Elsevier Image and Vision Computing Journal (IMAVIS). The paper will be published by IMAVIS under DOI 10.1016/j.imavis.2015.12.004

arXiv:1504.08308 [pdf, ps, other]

Efficient Image-Space Extraction and Representation of 3D Surface Topography

Authors: Matthias Zeppelzauer, Markus Seidl

Abstract: Surface topography refers to the geometric micro-structure of a surface and defines its tactile characteristics (typically in the sub-millimeter range). High-resolution 3D scanning techniques developed recently enable the 3D reconstruction of surfaces including their surface topography. In this paper, we present an efficient image-space technique for the extraction of surface topography from high-resolution 3D reconstructions. Additionally, we filter noise and enhance topographic attributes to obtain an improved representation for subsequent topography classification. Comprehensive experiments show that our representation captures topographic attributes well and significantly improves classification performance compared to alternative 2D and 3D representations.

Submitted 6 May, 2015; v1 submitted 30 April, 2015; originally announced April 2015.

Comments: Initial version of the paper accepted at the IEEE ICIP Conference 2015

ACM Class: I.4; I.4.3; I.4.7; I.5

arXiv:1504.06567 [pdf, other]

Cultural Event Recognition with Visual ConvNets and Temporal Models

Authors: Amaia Salvador, Matthias Zeppelzauer, Daniel Manchon-Vizuete, Andrea Calafell, Xavier Giro-i-Nieto

Abstract: This paper presents our contribution to the ChaLearn Challenge 2015 on Cultural Event Classification. The challenge in this task is to automatically classify images from 50 different cultural events. Our solution is based on the combination of visual features extracted from convolutional neural networks with temporal information using a hierarchical classifier scheme. We extract visual features from the last three fully connected layers of both CaffeNet (pretrained with ImageNet) and our fine-tuned version for the ChaLearn challenge. We propose a late fusion strategy that trains a separate low-level SVM on each of the extracted neural codes. The class predictions of the low-level SVMs form the input to a higher-level SVM, which gives the final event scores. We achieve our best result by adding a temporal refinement step into our classification scheme, which is applied directly to the output of each low-level SVM. Our approach penalizes high classification scores based on visual features when their time stamp does not match an event-specific temporal distribution learned from the training and validation data well. Our system achieved the second best result in the ChaLearn Challenge 2015 on Cultural Event Classification with a mean average precision of 0.767 on the test set.

Submitted 24 April, 2015; originally announced April 2015.

Comments: Initial version of the paper accepted at the CVPR Workshop ChaLearn Looking at People 2015
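
The temporal refinement idea described in the abstract above can be sketched as follows. The abstract does not state the form of the penalty, so the multiplicative weighting, the per-month histograms, the `floor` parameter, the function name `temporal_refine`, and the event names are all assumptions made for illustration, not the authors' method.

```python
def temporal_refine(scores, month, month_priors, floor=0.1):
    """Down-weight visual scores for events that are unlikely in a given month.

    scores       -- {event: visual classifier score}
    month        -- capture month of the image, 1..12
    month_priors -- {event: list of 12 counts or relative frequencies},
                    assumed to be estimated from training data
    floor        -- hypothetical minimum weight so no score is zeroed out
    """
    refined = {}
    for event, score in scores.items():
        prior = month_priors[event][month - 1]
        peak = max(month_priors[event])
        # Weight is 1.0 in the event's most frequent month, smaller elsewhere.
        weight = max(floor, prior / peak) if peak > 0 else 1.0
        refined[event] = score * weight
    return refined

# Hypothetical events: event_a occurs in autumn, event_b in late winter.
priors = {
    "event_a": [0, 0, 0, 0, 0, 0, 0, 0, 6, 4, 0, 0],
    "event_b": [1, 8, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
visual = {"event_a": 0.5, "event_b": 0.75}
refined = temporal_refine(visual, month=9, month_priors=priors)
best = max(refined, key=refined.get)
```

In this toy case the visually stronger `event_b` is penalized because a September time stamp does not fit its assumed temporal distribution, so `event_a` ranks first after refinement.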

Showing 1–28 of 28 results for author: Zeppelzauer, M