
Towards Training Music Taggers on Synthetic Data

Authors: Nadine Kroher, Steven Manangu, Aggelos Pikrakis

Abstract: Most contemporary music tagging systems rely on large volumes of annotated data. As an alternative, we investigate the extent to which synthetically generated music excerpts can improve tagging systems when only small annotated collections are available. To this end, we release GTZAN-synth, a synthetic dataset that follows the taxonomy of the well-known GTZAN dataset while being ten times larger in data volume. We first observe that simply adding this synthetic dataset to the training split of GTZAN does not result in performance improvements. We then proceed to investigate domain adaptation, transfer learning and fine-tuning strategies for the task at hand and draw the conclusion that the last two options yield an increase in accuracy. Overall, the proposed approach can be considered as a first guide in a promising field for future research.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures, accepted to 21st International Conference on Content-based Multimedia Indexing (CBMI) 2024, code available https://github.com/NadineKroher/music-tagging-synthetic-data-cbmi-2024

ACM Class: I.2

arXiv:2406.19081 [pdf, other]

Unsupervised Latent Stain Adaptation for Computational Pathology

Authors: Daniel Reisenbüchler, Lucas Luttner, Nadine S. Schaadt, Friedrich Feuerhake, Dorit Merhof

Abstract: In computational pathology, deep learning (DL) models for tasks such as segmentation or tissue classification are known to suffer from domain shifts due to different staining techniques. Stain adaptation aims to reduce the generalization error between different stains by training a model on source stains that generalizes to target stains. Despite the abundance of target stain data, a key challenge is the lack of annotations. To address this, we propose a joint training between artificially labeled and unlabeled data including all available stained images called Unsupervised Latent Stain Adaptation (ULSA). Our method uses stain translation to enrich labeled source images with synthetic target images in order to increase the supervised signals. Moreover, we leverage unlabeled target stain images using stain-invariant feature consistency learning. With ULSA we present a semi-supervised strategy for efficient stain adaptation without access to annotated target stain data. Remarkably, ULSA is task agnostic in patch-level analysis for whole slide images (WSIs). Through extensive evaluation on external datasets, we demonstrate that ULSA achieves state-of-the-art (SOTA) performance in kidney tissue segmentation and breast cancer classification across a spectrum of staining variations. Our findings suggest that ULSA is an important framework for stain adaptation in computational pathology.

Submitted 3 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: Accepted MICCAI2024

arXiv:2406.16659 [pdf, other]

Data-driven Modeling in Metrology -- A Short Introduction, Current Developments and Future Perspectives

Authors: Linda-Sophie Schneider, Patrick Krauss, Nadine Schiering, Christopher Syben, Richard Schielein, Andreas Maier

Abstract: Mathematical models are vital to the field of metrology, playing a key role in the derivation of measurement results and the calculation of uncertainties from measurement data, informed by an understanding of the measurement process. These models generally represent the correlation between the quantity being measured and all other pertinent quantities. Such relationships are used to construct measurement systems that can interpret measurement data to generate conclusions and predictions about the measurement system itself. Classic models are typically analytical, built on fundamental physical principles. However, the rise of digital technology, expansive sensor networks, and high-performance computing hardware has led to a growing shift towards data-driven methodologies. This trend is especially prominent when dealing with large, intricate networked sensor systems in situations where there is limited expert understanding of the frequently changing real-world contexts. Here, we demonstrate the variety of opportunities that data-driven modeling presents, and how they have already been implemented in various real-world applications.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 31 pages, Preprint

arXiv:2406.04871 [pdf, other]

doi 10.1145/3643834.3661557

Mind Mansion: Exploring Metaphorical Interactions to Engage with Negative Thoughts in Virtual Reality

Authors: Julian Rasch, Michelle Johanna Zender, Sophia Sakel, Nadine Wagener

Abstract: Recurrent negative thoughts can significantly disrupt daily life and contribute to negative emotional states. Facing, confronting, and noticing such thoughts without support can be challenging. To provide a playful setting and leverage the technical maturation of Virtual Reality (VR), our VR experience, Mind Mansion, places the user in an initially cluttered virtual apartment. Here we utilize established concepts from traditional therapy and metaphors identified in prior works to let users engage metaphorically with representations of thoughts, gradually sorting the space, fostering awareness of thoughts, and supporting mental self-care. The results of our user study (n = 30) reveal that Mind Mansion encourages the exploration of alternative perspectives, fosters acceptance, and potentially offers new coping mechanisms. Our findings suggest that this VR intervention can reduce negative affect and improve overall emotional awareness.

Submitted 7 June, 2024; originally announced June 2024.

Comments: To appear in Proceedings of the Designing Interactive Systems Conference (DIS '24), July 1-5, 2024, IT University of Copenhagen, Denmark

arXiv:2405.01533 [pdf, other]

OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

Authors: Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez

Abstract: The advances in multimodal large language models (MLLMs) have led to growing interest in LLM-based autonomous driving agents to leverage their strong reasoning capabilities. However, capitalizing on MLLMs' strong reasoning capabilities for improved planning behavior is challenging since planning requires full 3D situational awareness beyond 2D reasoning. To address this challenge, our work proposes a holistic framework for strong alignment between agent models and 3D driving tasks. Our framework starts with a novel 3D MLLM architecture that uses sparse queries to lift and compress visual representations into 3D before feeding them into an LLM. This query-based representation allows us to jointly encode dynamic objects and static map elements (e.g., traffic lanes), providing a condensed world model for perception-action alignment in 3D. We further propose OmniDrive-nuScenes, a new visual question-answering dataset challenging the true 3D situational awareness of a model with comprehensive visual question-answering (VQA) tasks, including scene description, traffic regulation, 3D grounding, counterfactual reasoning, decision making and planning. Extensive studies show the effectiveness of the proposed architecture as well as the importance of the VQA tasks for reasoning and planning in complex 3D scenes.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.02529 [pdf, other]

A School Student Essay Corpus for Analyzing Interactions of Argumentative Structure and Quality

Authors: Maja Stahl, Nadine Michel, Sebastian Kilsbach, Julian Schmidtke, Sara Rezat, Henning Wachsmuth

Abstract: Learning argumentative writing is challenging. Besides writing fundamentals such as syntax and grammar, learners must select and arrange argument components meaningfully to create high-quality essays. To support argumentative writing computationally, one step is to mine the argumentative structure. When combined with automatic essay scoring, interactions of the argumentative structure and quality scores can be exploited for comprehensive writing support. Although studies have shown the usefulness of using information about the argumentative structure for essay scoring, no argument mining corpus with ground-truth essay quality annotations has been published yet. Moreover, none of the existing corpora contain essays written by school students specifically. To fill this research gap, we present a German corpus of 1,320 essays from school students of two age groups. Each essay has been manually annotated for argumentative structure and quality on multiple levels of granularity. We propose baseline approaches to argument mining and essay scoring, and we analyze interactions between both tasks, thereby laying the ground for quality-oriented argumentative writing support.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024

arXiv:2403.11784 [pdf, other]

ForzaETH Race Stack -- Scaled Autonomous Head-to-Head Racing on Fully Commercial off-the-Shelf Hardware

Authors: Nicolas Baumann, Edoardo Ghignone, Jonas Kühne, Niklas Bastuck, Jonathan Becker, Nadine Imholz, Tobias Kränzlin, Tian Yi Lim, Michael Lötscher, Luca Schwarzenbach, Luca Tognoni, Christian Vogt, Andrea Carron, Michele Magno

Abstract: Autonomous racing in robotics combines high-speed dynamics with the necessity for reliability and real-time decision-making. While such racing pushes software and hardware to their limits, many existing full-system solutions necessitate complex, custom hardware and software, and usually focus on Time-Trials rather than full unrestricted Head-to-Head racing, due to financial and safety constraints. This limits their reproducibility, making advancements and replication feasible mostly for well-resourced laboratories with comprehensive expertise in mechanical, electrical, and robotics fields. Researchers interested in the autonomy domain but with only partial experience in one of these fields need to spend significant time on familiarization and integration. The ForzaETH Race Stack addresses this gap by providing an autonomous racing software platform designed for F1TENTH, a 1:10 scaled Head-to-Head autonomous racing competition, which simplifies replication by using commercial off-the-shelf hardware. This approach enhances the competitive aspect of autonomous racing and provides an accessible platform for research and development in the field. The ForzaETH Race Stack is designed with modularity and operational ease of use in mind, allowing customization and adaptability to various environmental conditions, such as track friction and layout. Capable of handling both Time-Trials and Head-to-Head racing, the stack has demonstrated its effectiveness, robustness, and adaptability in the field by winning the official F1TENTH international competition multiple times.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2402.18194 [pdf]

Formalized Identification Of Key Factors In Safety-Relevant Failure Scenarios

Authors: Tim Maurice Julitz, Nadine Schlüter, Manuel Löwer

Abstract: This research article presents a methodical data-based approach to systematically identify key factors in safety-related failure scenarios, with a focus on complex product-environmental systems in the era of Industry 4.0. The study addresses the uncertainty arising from the growing complexity of modern products. The method uses scenario analysis and focuses on failure analysis within technical product development. The approach involves a derivation of influencing factors based on information from failure databases. The failures described here are documented individually in failure sequence diagrams and then related to each other in a relationship matrix. This creates a network of possible failure scenarios from individual failure cases that can be used in product development. To illustrate the application of the methodology, a case study of 41 Rapex safety alerts for a hair dryer is presented. The failure sequence diagrams and influencing factor relationship matrices show 46 influencing factors that lead to safety-related failures. The predominant harms are burns and electric shocks, which are highlighted by the active and passive sum diagrams. The research demonstrates a robust method for identifying key factors in safety-related failure scenarios using information from failure databases. The methodology provides valuable insights into product development and emphasizes the frequency of influencing factors and their interconnectedness.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.12880 [pdf, other]

Autism Detection in Speech -- A Survey

Authors: Nadine Probol, Margot Mieskes

Abstract: There has been a range of studies of how autism is displayed in voice, speech, and language. We analyse studies from the biomedical, as well as the psychological domain, but also from the NLP domain in order to find linguistic, prosodic and acoustic cues that could indicate autism. Our survey looks at all three domains. We define autism and which comorbidities might influence the correct detection of the disorder. We especially look at observations such as verbal and semantic fluency, prosodic features, but also disfluencies and speaking rate. We also show word-based approaches and describe machine learning and transformer-based approaches both on the audio data as well as the transcripts. Lastly, we conclude that, while there is already a lot of research, female patients seem to be severely under-researched. Also, most NLP research focuses on traditional machine learning methods instead of transformers, which could be beneficial in this context. Additionally, we were unable to find research combining features from both audio and transcripts.

Submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted to EACL 2024 Findings

arXiv:2402.11820 [pdf]

A critical analysis of cognitive load measurement methods for evaluating the usability of different types of interfaces: guidelines and framework for Human-Computer Interaction

Authors: Ali Darejeh, Nadine Marcusa, Gelareh Mohammadi, John Sweller

Abstract: Usability testing is an essential part of product design, particularly for user interfaces. To enhance the reliability of usability evaluations, employing cognitive load measurement methods can be highly effective in assessing the mental effort required to complete tasks during user testing. This review aims to provide an overview of the most suitable cognitive load measurement methods for evaluating various types of user interfaces, serving as a valuable resource for guiding usability assessments. To bridge the existing gap in the literature, a systematic review was conducted, analyzing 76 articles with experimental study designs that met the eligibility criteria. The review encompasses different methods of measuring cognitive load applicable to assessing the usability of diverse user interfaces, including computer software, information systems, video games, web and mobile applications, robotics, and virtual reality applications. The results highlight the most widely utilized cognitive load measurement methods in software usability, their respective usage percentages, and their application in evaluating the usability of each user interface type. Additionally, the advantages and disadvantages of each method are discussed. Furthermore, the review proposes a framework to assist usability testers in selecting an appropriate cognitive load measurement method for conducting accurate usability evaluations.

Submitted 18 February, 2024; originally announced February 2024.

arXiv:2401.15022 [pdf]

doi 10.1038/s44303-024-00020-8

Applications of artificial intelligence in the analysis of histopathology images of gliomas: a review

Authors: Jan-Philipp Redlich, Friedrich Feuerhake, Joachim Weis, Nadine S. Schaadt, Sarah Teuber-Hanselmann, Christoph Buck, Sabine Luttmann, Andrea Eberle, Stefan Nikolin, Arno Appenzeller, Andreas Portmann, André Homeyer

Abstract: In recent years, the diagnosis of gliomas has become increasingly complex. Analysis of glioma histopathology images using artificial intelligence (AI) offers new opportunities to support diagnosis and outcome prediction. To give an overview of the current state of research, this review examines 70 publicly available research studies that have proposed AI-based methods for whole-slide histopathology images of human gliomas, covering the diagnostic tasks of subtyping (16/70), grading (23/70), molecular marker prediction (13/70), and survival prediction (27/70). All studies were reviewed with regard to methodological aspects as well as clinical applicability. It was found that the focus of current research is the assessment of hematoxylin and eosin-stained tissue sections of adult-type diffuse gliomas. The majority of studies (49/70) are based on the publicly available glioblastoma and low-grade glioma datasets from The Cancer Genome Atlas (TCGA) and only a few studies employed other datasets in isolation (10/70) or in addition to the TCGA datasets (11/70). Current approaches mostly rely on convolutional neural networks (53/70) for analyzing tissue at 20x magnification (30/70). A new field of research is the integration of clinical data, omics data, or magnetic resonance imaging (27/70). So far, AI-based methods have achieved promising results, but are not yet used in real clinical settings. Future work should focus on the independent validation of methods on larger, multi-site datasets with high-quality and up-to-date clinical and molecular pathology annotations to demonstrate routine applicability.

Submitted 5 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Journal ref: npj Imaging 2024

arXiv:2401.05450 [pdf]

Reorienting Learning Game Design in Design-Based Research: a Case Study

Authors: Nadine Mandran, Estelle Prior, Eric Sanchez, Mathieu Vermeulen

Abstract: One of the main difficulties remains the collaboration between the various experts involved in designing Learning Games (LG). Our literature review focuses on the pitfalls and principles that have been identified by various authors in learning game design. Based on this review, a prototype was designed to support the LG design process and to study more precisely the collaboration between actors (teachers, researchers, game designers, data analysts and computer scientists). Indeed, according to the state of the art, the skills and knowledge involved in design are difficult to integrate. The prototype has been tested in a real-world scenario for designing learning games to teach algorithmics. Through participant observation in thirty-three workshops involving nine experts, we were able to identify recurring pitfalls as we applied the recommendations in the literature. The analysis of these workshops led us to propose eight principles aimed at facilitating collaboration in the learning game design process and at re-evaluating research on it.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2311.13898 [pdf]

HandiMathKey-Device

Authors: Frédéric Vella, Nathalie Dubus, Eloise Grolleau, Marjorie Deleau, Cécile Malet, Christine Gallard, Véronique Ades, Nadine Vigouroux

Abstract: Typing mathematics is sometimes difficult with text editor functions for students with motor impairment and other associated impairments (visual, cognitive). Based on the HandiMathKey software keyboard, a user-centred design method involving the ecosystem of disabled students was applied to design the HMK-D physical keyboard for mathematical input. We chose the Stream Deck device for the HMK-D because of its multimedia features and its appeal to young students. Preliminary tests with 8 students (5 in secondary school and 3 in high school) show that HMK-D is highly accepted, accessible and fun for mathematical input by students with impairments. A longitudinal study of the usability and acceptability of HMK-D is planned for the 2023-2024 school year.

Submitted 23 November, 2023; originally announced November 2023.

Comments: Universal Access in Human-Computer Interaction. HCII 2023, Jul 2023, Copenhagen (Virtual), Denmark

arXiv:2311.13894 [pdf]

A first step towards an ecosystem meta-model for humancentered design in case of disabled users

Authors: Christophe Kolski, Nadine Vigouroux, Yohan Guerrier, Frédéric Vella, Marine Guffroy

Abstract: The involvement of the ecosystem or social environment of the disabled user is considered very useful and even essential for the human-centered design of assistive technologies. In the era of model-based approaches, the modeling of the ecosystem is therefore to be considered. A first version of an ecosystem metamodel is proposed. It is illustrated through a first case study, which concerns a project aiming at a communication aid for people with cerebral palsy. A conclusion and research perspectives end this paper.

Submitted 23 November, 2023; originally announced November 2023.

Journal ref: Disab2023 Engineering Interactive Computing Systems for People with Disabilities, Jun 2023, Swansea, United Kingdom

arXiv:2311.13223 [pdf]

doi 10.1007/978-3-031-35681-0_13

Design Recommendations Based on Speech Analysis for Disability-Friendly Interfaces for the Control of a Home Automation Environment

Authors: Nadine Vigouroux, Frédéric Vella, Gaëlle Lepage, Éric Campo

Abstract: The objective of this paper is to describe a study of the speech interaction mode for home automation control of equipment by people with impairments in inclusive housing. The study is related to the HIP HOPE project concerning a building of 19 inclusive housing units. Seven participants with different types of disabilities were invited to carry out use cases using voice and touch control. Only the results obtained on the voice interaction mode through the Amazon voice assistant are reported here. The results show, according to the type of disability, the success rates in the speech recognition of the command issued to the equipment and highlight the errors related to the formulation, the noisy environment, speech intelligibility, speech segmentation and poor synchronization of the audio channel opening.

Submitted 22 November, 2023; originally announced November 2023.

Journal ref: Universal Access in Human-Computer Interaction. HCII 2023, Jul 2023, Copenhagen (Virtual), Denmark. pp.197-211

arXiv:2311.09094 [pdf, other]

Can MusicGen Create Training Data for MIR Tasks?

Authors: Nadine Kroher, Helena Cuesta, Aggelos Pikrakis

Abstract: We are investigating the broader concept of using AI-based generative music systems to generate training data for Music Information Retrieval (MIR) tasks. To kick off this line of work, we ran an initial experiment in which we trained a genre classifier on a fully artificial music dataset created with MusicGen. We constructed over 50 000 genre-conditioned textual descriptions and generated a collection of music excerpts that covers five musical genres. Our preliminary results show that the proposed model can learn genre-specific characteristics from artificial music tracks that generalise well to real-world music recordings.

Submitted 15 November, 2023; originally announced November 2023.

Comments: This is an extended abstract presented at the Late-Breaking / Demo Session of the International Society for Music Information Retrieval Conference (ISMIR) 2023 (Milan, Italy)

arXiv:2311.04780 [pdf, other]

FetMRQC: an open-source machine learning framework for multi-centric fetal brain MRI quality control

Authors: Thomas Sanchez, Oscar Esteban, Yvan Gomez, Alexandre Pron, Mériam Koob, Vincent Dunet, Nadine Girard, Andras Jakab, Elisenda Eixarch, Guillaume Auzias, Meritxell Bach Cuadra

Abstract: Fetal brain MRI is becoming an increasingly relevant complement to neurosonography for perinatal diagnosis, allowing fundamental insights into fetal brain development throughout gestation. However, uncontrolled fetal motion and heterogeneity in acquisition protocols lead to data of variable quality, potentially biasing the outcome of subsequent studies. We present FetMRQC, an open-source machine-learning framework for automated image quality assessment and quality control that is robust to domain shifts induced by the heterogeneity of clinical data. FetMRQC extracts an ensemble of quality metrics from unprocessed anatomical MRI and combines them to predict experts' ratings using random forests. We validate our framework on a pioneeringly large and diverse dataset of more than 1600 manually rated fetal brain T2-weighted images from four clinical centers and 13 different scanners. Our study shows that FetMRQC's predictions generalize well to unseen data while being interpretable. FetMRQC is a step towards more robust fetal brain neuroimaging, which has the potential to shed new insights on the developing human brain.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 22 pages, 10 Figures

arXiv:2310.12956 [pdf, other]

Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

Authors: David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox

Abstract: In this work, we study rapid improvements of the training loss in transformers when being confronted with multi-step decision tasks. We found that transformers struggle to learn the intermediate task and both training and validation loss saturate for hundreds of epochs. When transformers finally learn the intermediate task, they do this rapidly and unexpectedly. We call these abrupt improvements Eureka-moments, since the transformer appears to suddenly learn a previously incomprehensible concept. We designed synthetic tasks to study the problem in detail, but the leaps in performance can be observed also for language modeling and in-context learning (ICL). We suspect that these abrupt transitions are caused by the multi-step nature of these tasks. Indeed, we find connections and show that ways to improve on the synthetic multi-step tasks can be used to improve the training of language modeling and ICL. Using the synthetic data we trace the problem back to the Softmax function in the self-attention block of transformers and show ways to alleviate the problem. These fixes reduce the required number of training steps, lead to higher likelihood to learn the intermediate task, to higher final accuracy and training becomes more robust to hyper-parameters.

Submitted 6 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: Accepted at ICML 2024

arXiv:2310.02932 [pdf, other]

Assessing Large Language Models on Climate Information

Authors: Jannis Bulian, Mike S. Schäfer, Afra Amini, Heidi Lam, Massimiliano Ciaramita, Ben Gaiarin, Michelle Chen Hübscher, Christian Buck, Niels G. Mede, Markus Leippold, Nadine Strauß

Abstract: As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capability in critically relevant domains. We present a comprehensive evaluation framework, grounded in science communication research, to assess LLM responses to questions about climate change. Our framework emphasizes both presentational and epistemological adequacy, offering a fine-grained analysis of LLM generations spanning 8 dimensions and 30 issues. Our evaluation task is a real-world example of a growing number of challenging problems where AI can complement and lift human performance. We introduce a novel protocol for scalable oversight that relies on AI Assistance and raters with relevant education. We evaluate several recent LLMs on a set of diverse climate questions. Our results point to a significant gap between surface and epistemological qualities of LLMs in the realm of climate communication.

Submitted 28 May, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

arXiv:2308.06970 [pdf, other]

doi 10.4204/EPTCS.382.1

ProofBuddy: A Proof Assistant for Learning and Monitoring

Authors: Nadine Karsten, Frederik Krogsdal Jacobsen, Kim Jana Eiken, Uwe Nestmann, Jørgen Villadsen

Abstract: Proof competence, i.e. the ability to write and check (mathematical) proofs, is an important skill in Computer Science, but for many students it represents a difficult challenge. The main issues are the correct use of formal language and the ascertainment of whether proofs, especially the students' own, are complete and correct. Many authors have suggested using proof assistants to assist in teaching proof competence, but the efficacy of the approach is unclear. To improve the state of affairs, we introduce ProofBuddy: a web-based tool using the Isabelle proof assistant which enables researchers to conduct studies of the efficacy of approaches to using proof assistants in education by collecting fine-grained data about the way students interact with proof assistants. We have performed a preliminary usability study of ProofBuddy at the Technical University of Denmark.

Submitted 14 August, 2023; originally announced August 2023.

Comments: In Proceedings TFPIE 2023, arXiv:2308.06110

ACM Class: K.3.2; D.1.1; F.3.1; D.2.4; D.2.6; G.4; H.5.2

Journal ref: EPTCS 382, 2023, pp. 1-21

arXiv:2308.00420 [pdf, other]

The complexity of the Timetable-Based Railway Network Design Problem

Authors: Nadine Friesen, Tim Sander, Karl Nachtigall, Nils Nießen

Abstract: Because of the long planning periods and their long life cycle, railway infrastructure has to be outlined long ahead. At present, the infrastructure is designed while only little about the intended operation is known. Hence, the timetable and the operation are adjusted to the infrastructure. Since space, time and money for extension measures of railway infrastructure are limited, each modification has to be done carefully and long lasting and should be appropriate for the future unknown demand. To take this into account, we present the robust network design problem for railway infrastructure under capacity constraints and uncertain timetables. Here, we plan the required expansion measures for an uncertain long-term timetable. We show that this problem is NP-hard even when restricted to bipartite graphs and very simple timetables and present easier solvable special cases. This problem corresponds to the fixed-charge network design problem where the expansion costs are minimized such that the timetable is conductible. We model this problem by an integer linear program using time expanded networks. To incorporate the uncertainty of the future timetable, we use a scenario-based approach. We define scenarios with individual departure and arrival times and optional trains. The network is then optimized such that a given percentage of the scenarios can be operated while minimizing the expansion costs and potential penalty costs for not scheduled optional trains.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.02916 [pdf, other]

The impact of an employee's psychological contract breach on compliance with information security policies: intrinsic and extrinsic motivation

Authors: Daeun Lee, Harjinder Singh Lallie, Nadine Michaelides

Abstract: Despite the rapid rise in social engineering attacks, not all employees are as compliant with information security policies (ISPs) as organisations expect them to be. ISP non-compliance is caused by a variety of psychological motivations. This study investigates the effect of psychological contract breach (PCB) of employees on ISP compliance intention (ICI) by dividing them into intrinsic and extrinsic motivation using the theory of planned behaviour (TPB) and the general deterrence theory (GDT). Data analysis from UK employees (n=206) showed that the higher the PCB, the lower the ICI. The study also found that PCBs significantly reduced intrinsic motivation (attitude and perceived fairness) for ICI, whereas PCBs did not moderate the relationship between extrinsic motivation (sanction severity and sanctions certainty) and ICI. As a result, this study successfully addresses the risks of PCBs in the field of IS security and proposes effective solutions for employees with high PCBs.

Submitted 6 July, 2023; originally announced July 2023.

Comments: 27 pages, 3 figures

Journal ref: Cognition, Technology & Work, pp.1-17 (2023)

arXiv:2306.15694 [pdf, other]

Scenario-based Failure Analysis of Product Systems and their Environment

Authors: Tim Maurice Julitz, Nadine Schlüter, Manuel Löwer

Abstract: During the usage phase, a technical product system is in permanent interaction with its environment. This interaction can lead to failures that significantly endanger the safety of the user and negatively affect the quality and reliability of the product. Conventional methods of failure analysis focus on the technical product system. The interaction of the product with its environment in the usage phase is not sufficiently considered, resulting in undetected potential failures of the product that lead to complaints. For this purpose, a methodology for failure identification is developed, which is continuously improved through product usage scenarios. The use cases are modelled according to a systems engineering approach with four views. The linking of the product system, physical effects, events and environmental factors enable the analysis of fault chains. These four parameters are subject to great complexity and must be systematically analysed using databases and expert knowledge. The scenarios are continuously updated by field data and complaints. The new approach can identify potential failures in a more systematic and holistic way. Complaints provide direct input on the scenarios. Unknown, previously unrecognized events can be systematically identified through continuous improvement. The complexity of the relationship between the product system and its environmental factors can thus be adequately taken into account in product development.
Keywords: failure analysis, methodology, product development, systems engineering, scenario analysis, scenario improvement, environmental factors, product environment, continuous improvement.

Submitted 25 June, 2023; originally announced June 2023.

arXiv:2306.14035 [pdf, other]

Thinking Like an Annotator: Generation of Dataset Labeling Instructions

Authors: Nadine Chang, Francesco Ferroni, Michael J. Tarr, Martial Hebert, Deva Ramanan

Abstract: Large-scale datasets are essential to modern day deep learning. Advocates argue that understanding these methods requires dataset transparency (e.g. "dataset curation, motivation, composition, collection process, etc..."). However, almost no one has suggested the release of the detailed definitions and visual category examples provided to annotators - information critical to understanding the structure of the annotations present in each dataset. These labels are at the heart of public datasets, yet few datasets include the instructions that were used to generate them. We introduce a new task, Labeling Instruction Generation, to address missing publicly available labeling instructions. In Labeling Instruction Generation, we take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; 2) provide a text label that corresponds to each of the examples. We introduce a framework that requires no model training to solve this task and includes a newly created rapid retrieval system that leverages a large, pre-trained vision and language model. This framework acts as a proxy to human annotators that can help to both generate a final labeling instruction set and evaluate its quality. Our framework generates multiple diverse visual and text representations of dataset categories. The optimized instruction set outperforms our strongest baseline across 5 folds by 7.06 mAP for NuImages and 12.9 mAP for COCO.

Submitted 24 June, 2023; originally announced June 2023.

arXiv:2305.11960 [pdf, other]

doi 10.1109/SMC42975.2020.9282914

Toward Mixed Reality Hybrid Objects with IoT Avatar Agents

Authors: Alexis Morris, Jie Guan, Nadine Lessio, Yiyi Shao

Abstract: The internet-of-things (IoT) refers to the growing field of interconnected pervasive computing devices and the networking that supports smart, embedded applications. The IoT has multiple human-computer interaction challenges due to its many formats and interlinked components, and central to these is the need to provide sensory information and situational context pertaining to users in a more human-friendly, easily understandable format. This work addresses this by applying mixed reality toward expressing the underlying behaviors and states internal to IoT devices and IoT-enabled objects. It extends the authors' previous research on IoT Avatars (mixed reality character representations of physical IoT devices), presenting a new head-mounted display framework and interconnection architecture. This contributes i) an exploration of mixed reality for smart spaces, ii) an approach toward expressive avatar behaviors using fuzzy inference, and iii) an early functional prototype of a hybrid physical and mixed reality IoT-enabled object. This approach is a step toward new information presentation, interaction, and engagement capabilities for smart devices and environments.

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.03041 [pdf, other]

Are VAEs Bad at Reconstructing Molecular Graphs?

Authors: Hagen Muenkler, Hubert Misztela, Michal Pikusa, Marwin Segler, Nadine Schneider, Krzysztof Maziarz

Abstract: Many contemporary generative models of molecules are variational auto-encoders of molecular graphs. One term in their training loss pertains to reconstructing the input, yet reconstruction capabilities of state-of-the-art models have not yet been thoroughly compared on a large and chemically diverse dataset. In this work, we show that when several state-of-the-art generative models are evaluated under the same conditions, their reconstruction accuracy is surprisingly low, worse than what was previously reported on seemingly harder datasets. However, we show that improving reconstruction does not directly lead to better sampling or optimization performance. Failed reconstructions from the MoLeR model are usually similar to the inputs, assembling the same motifs in a different way, and possess similar chemical properties such as solubility. Finally, we show that the input molecule and its failed reconstruction are usually mapped by the different encoders to statistically distinguishable posterior distributions, hinting that posterior collapse may not fully explain why VAEs are bad at reconstructing molecular graphs.

Submitted 4 May, 2023; originally announced May 2023.

Comments: Published at the ELLIS Workshop on Machine Learning for Molecules (ML4Molecules 2022)

arXiv:2304.03639 [pdf, other]

Theoretical Conditions and Empirical Failure of Bracket Counting on Long Sequences with Linear Recurrent Networks

Authors: Nadine El-Naggar, Pranava Madhyastha, Tillman Weyde

Abstract: Previous work has established that RNNs with an unbounded activation function have the capacity to count exactly. However, it has also been shown that RNNs are challenging to train effectively and generally do not learn exact counting behaviour. In this paper, we focus on this problem by studying the simplest possible RNN, a linear single-cell network. We conduct a theoretical analysis of linear RNNs and identify conditions for the models to exhibit exact counting behaviour. We provide a formal proof that these conditions are necessary and sufficient. We also conduct an empirical analysis using tasks involving a Dyck-1-like Balanced Bracket language under two different settings. We observe that linear RNNs generally do not meet the necessary and sufficient conditions for counting behaviour when trained with the standard approach. We investigate how varying the length of training sequences and utilising different target classes impacts model behaviour during training and the ability of linear RNN models to effectively approximate the indicator conditions.

Submitted 7 April, 2023; originally announced April 2023.

Comments: 17th Conference of the European Chapter of the Association for Computational Linguistics Student Research Workshop (EACL 2023 SRW)
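As a rough illustration of what "exact counting" with a linear single-cell network can look like, the sketch below runs a scalar linear recurrence over a bracket string. The update rule `h = w * h + u`, the function name `linear_cell_count`, and the hand-set weights are assumptions made for this example; they are not taken from the paper and are not its formal conditions.

```python
def linear_cell_count(sequence, w=1.0, u_open=1.0, u_close=-1.0):
    """Run a scalar linear recurrence h_t = w * h_{t-1} + u(x_t) over a bracket string.

    The weights are set by hand for illustration (an assumption of this sketch),
    not learned and not derived from the paper.
    """
    h = 0.0
    for symbol in sequence:
        if symbol == "(":
            h = w * h + u_open
        elif symbol == ")":
            h = w * h + u_close
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    return h


# With w = 1 and opposite input weights, the state equals the number of
# unmatched opening brackets.
print(linear_cell_count("(()"))   # 1.0
print(linear_cell_count("(())"))  # 0.0

# With a recurrent weight slightly below 1, the state decays and no longer
# tracks the count on long inputs.
print(linear_cell_count("(" * 100, w=0.9))
```

With the hand-set weights the state tracks the bracket depth exactly, while a small deviation in the recurrent weight already breaks this on a sequence of 100 opening brackets.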

arXiv:2301.06078 [pdf]

Training one model to detect heart and lung sound events from single point auscultations

Authors: Leander Melms, Robert R. Ilesan, Ulrich Köhler, Olaf Hildebrandt, Regina Conradt, Jens Eckstein, Cihan Atila, Sami Matrood, Bernhard Schieffer, Jürgen R. Schaefer, Tobias Müller, Julius Obergassel, Nadine Schlicker, Martin C. Hirsch

Abstract: Objective: This work proposes a semi-supervised training approach for detecting lung and heart sounds simultaneously with only one trained model and in invariance to the auscultation point. Methods: We use open-access data from the 2016 Physionet/CinC Challenge, the 2022 George Moody Challenge, and from the lung sound database HF_V1. We first train specialist single-task models using foreground ground truth (GT) labels from different auscultation databases to identify background sound events in the respective lung and heart auscultation databases. The pseudo-labels generated in this way were combined with the ground truth labels in a new training iteration, such that a new model was subsequently trained to detect foreground and background signals. Benchmark tests ensured that the newly trained model could detect both lung and heart sound events in different auscultation sites without regressing on the original task. We also established hand-validated labels for the respective background signal in heart and lung sound auscultations to evaluate the models. Results: In this work, we report for the first time results for i) a multi-class prediction for lung sound events and ii) for simultaneous detection of heart and lung sound events and achieve competitive results using only one model. The combined multi-task model regressed slightly in heart sound detection and gained significantly in lung sound detection accuracy with an overall macro F1 score of 39.2% over six classes, representing a 6.7% improvement over the single-task baseline models.
Conclusion/Significance: To the best of our knowledge, this is the first approach developed to date for measuring heart and lung sound events invariant to both the auscultation site and capturing device. Hence, our model is capable of performing lung and heart sound detection from any auscultation location.

Submitted 15 January, 2023; originally announced January 2023.

Comments: 14 pages, 8 figures

arXiv:2211.16429 [pdf, other]

Exploring the Long-Term Generalization of Counting Behavior in RNNs

Authors: Nadine El-Naggar, Pranava Madhyastha, Tillman Weyde

Abstract: In this study, we investigate the generalization of LSTM, ReLU and GRU models on counting tasks over long sequences. Previous theoretical work has established that RNNs with ReLU activation and LSTMs have the capacity for counting with suitable configuration, while GRUs have limitations that prevent correct counting over longer sequences. Despite this and some positive empirical results for LSTMs on Dyck-1 languages, our experimental results show that LSTMs fail to learn correct counting behavior for sequences that are significantly longer than in the training data. ReLUs show much larger variance in behavior and in most cases worse generalization. The long sequence generalization is empirically related to validation loss, but reliable long sequence generalization seems not practically achievable through backpropagation with current techniques. We demonstrate different failure modes for LSTMs, GRUs and ReLUs. In particular, we observe that the saturation of activation functions in LSTMs and the correct weight setting for ReLUs to generalize counting behavior are not achieved in standard training regimens. In summary, learning generalizable counting behavior is still an open problem and we discuss potential approaches for further research.

Submitted 29 November, 2022; originally announced November 2022.

Comments: Published in I Can't Believe It's Not Better: Understanding Deep Learning Through Empirical Falsification Workshop at NeurIPS 2022

arXiv:2211.13079 [pdf]

User Centred Method to Design a Platform to Design Augmentative and Alternative Communication Assistive Technologies

Authors: Frédéric Vella, Flavien Clastres-Babou, Nadine Vigouroux, Philippe Truillet, Charline Calmels, Caroline Mercadier, Karine Gigaud, Margot Issanchou, Kristina Gourinovitch, Anne Garaix

Abstract: We describe a co-design approach to design the online WebSoKeyTo used to design AAC. This co-design was carried out between a team of therapists and a team of human-computer interaction researchers. Our approach begins with the use and evaluation of an existing SoKeyTo AAC design application. This step was essential in the awareness and definition of the needs by the therapists and in the understanding of the poor usability scores of SoKeyTo by the researchers. We then describe the various phases (focus group, brainstorming, prototyping) with the co-design choices retained. An evaluation of WebSoKeyTo is in progress.

Submitted 23 November, 2022; originally announced November 2022.

Journal ref: HCI INTERNATIONAL 2022 24TH INTERNATIONAL CONFERENCE ON HUMAN-COMPUTER INTERACTION, Jun 2022, Virtual conference, France. pp.559-571, ⟨10.1007/978-3-031-17902-0_40⟩

arXiv:2211.13078 [pdf]

Participation of Stakeholder in the Design of a Conception Application of Augmentative and Alternative Communication

Authors: Frédéric Vella, Flavien Clastres-Babou, Nadine Vigouroux, Philippe Truillet, Charline Calmels, Caroline Mercadier, Karine Gigaud, Margot Issanchou, Kristina Gourinovitch, Anne Garaix

Abstract: The objective of this paper is to describe the implication of an interdisciplinary team involved during a user-centered design methodology to design the platform (WebSoKeyTo) that meets the needs of therapists to design augmentative and alternative communication (AAC) aids for disabled users. We describe the processes of the design process and the role of the various actors (therapists and human computer researchers) in the various phases of the process. Finally, we analyze a satisfaction scale of the therapists on their participation in the codesign process. This study demonstrates the interest in extending the design actors to other therapists and caregivers (professional and family) in the daily life of people with disabilities.

Submitted 23 November, 2022; originally announced November 2022.

Journal ref: ICCHP-AAATE 2022 Open Access Compendium "Assistive Technology, Accessibility and (e)Inclusion", Jul 2022, Lecco, Italy. ⟨10.35011/icchp-aaate22-p1-17⟩

arXiv:2211.13058 [pdf]

IDEALI: intuitively localising connected devices in order to support autonomy

Authors: Frédéric Vella, Réjane Dalcé, Antonio Serpa, Thierry Val, Adrien van Den Bossche, Nadine Vigouroux

Abstract: The ability to localise a smart device is very useful to visually or cognitively impaired people. Localisation-capable technologies are becoming more readily available as off-the-shelf components. In this paper, we highlight the need for such a service in the field of health and autonomy, especially for disabled people. We introduce a model for Semantic Position Description (SPD) (e.g. "The pill organiser is on the kitchen table") as well as various algorithms that transform raw distance estimations to SPD related to proximity, alignment and room identification. Two of these algorithms are evaluated using the LocURa4IoT testbed. The results are compared to the output of a pre-experiment involving ten human participants in the Maison Intelligente de Blagnac. The two studies indicate that both approaches converge up to 90% of the time.

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2211.13042 [pdf]

Usability Study of Tactile and Voice Interaction Modes by People with Disabilities for Home Automation Controls

Authors: Nadine Vigouroux, Frédéric Vella, Gaëlle Lepage, Eric Campo

Abstract: This paper presents a comparative usability study on tactile and vocal interaction modes for home automation control of equipment at home for different profiles of disabled people. The study is related to the HIP HOPE project concerning the construction of 19 inclusive housing in the Toulouse metropolitan area in France. The experimentation took place in a living lab with 7 different disabled people who realize realistic use cases. The USE and UEQ questionnaires were selected as usability tools. The first results show that both interfaces are easy to learn but that usefulness and ease of use dimensions need to be improved. This study shows that there is real need for multimodality between touch and voice interaction to control the smart home. This study also shows that there is need to adapt the interface and the environment to the person's disability.

Submitted 23 November, 2022; originally announced November 2022.

Journal ref: ICCHP-AAATE 2022 Open Access Compendium "Assistive Technology, Accessibility and (e)Inclusion", Jul 2022, Lecco, Italy. pp.139-147, ⟨10.1007/978-3-031-08645-8_17⟩

arXiv:2210.09021 [pdf, other]

doi 10.1117/12.2624609

Histopathological Image Classification based on Self-Supervised Vision Transformer and Weak Labels

Authors: Ahmet Gokberk Gul, Oezdemir Cetin, Christoph Reich, Tim Prangemeier, Nadine Flinner, Heinz Koeppl

Abstract: Whole Slide Image (WSI) analysis is a powerful method to facilitate the diagnosis of cancer in tissue samples. Automating this diagnosis poses various issues, most notably caused by the immense image resolution and limited annotations. WSIs commonly exhibit resolutions of 100Kx100K pixels. Annotating cancerous areas in WSIs on the pixel level is prohibitively labor-intensive and requires a high level of expert knowledge. Multiple instance learning (MIL) alleviates the need for expensive pixel-level annotations. In MIL, learning is performed on slide-level labels, in which a pathologist provides information about whether a slide includes cancerous tissue. Here, we propose Self-ViT-MIL, a novel approach for classifying and localizing cancerous areas based on slide-level annotations, eliminating the need for pixel-wise annotated training data. Self-ViT-MIL is pre-trained in a self-supervised setting to learn rich feature representation without relying on any labels. The recent Vision Transformer (ViT) architecture builds the feature extractor of Self-ViT-MIL. For localizing cancerous regions, a MIL aggregator with global attention is utilized. To the best of our knowledge, Self-ViT-MIL is the first approach to introduce self-supervised ViTs in MIL-based WSI analysis tasks. We showcase the effectiveness of our approach on the common Camelyon16 dataset. Self-ViT-MIL surpasses existing state-of-the-art MIL-based approaches in terms of accuracy and area under the curve (AUC).

Submitted 17 April, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

Journal ref: Proc. SPIE 12039, Medical Imaging 2022: Digital and Computational Pathology, 120391O (4 April 2022)

arXiv:2209.13598 [pdf, other]

Computing Melodic Templates in Oral Music Traditions

Authors: Sergey Bereg, José-Miguel Díaz-Báñez, Nadine Kroher, Inmaculada Ventura

Abstract: The term melodic template or skeleton refers to a basic melody which is subject to variation during a music performance. In many oral music traditions, these templates are implicitly passed down through generations without ever being formalized in a score. In this work, we introduce a new geometric optimization problem, the spanning tube problem, to approximate a melodic template for a set of labeled performance transcriptions corresponding to a specific style in oral music traditions. Given a set of $n$ piecewise linear functions, we solve the problem of finding a continuous function, $f^*$, and a minimum value, $\varepsilon^*$, such that the vertical segment of length $2\varepsilon^*$ centered at $(x,f^*(x))$ intersects at least $p$ functions ($p\leq n$). The method explored here also provides a novel tool for quantitatively assessing the amount of melodic variation which occurs across performances.

Submitted 27 September, 2022; originally announced September 2022.

arXiv:2209.10970 [pdf, other]

Maths, Computation and Flamenco: overview and challenges

Authors: José-Miguel Díaz-Báñez, Nadine Kroher

Abstract: Flamenco is a rich performance-oriented art music genre from Southern Spain which attracts a growing community of aficionados around the globe. Due to its improvisational and expressive nature, its unique musical characteristics, and the fact that the genre is largely undocumented, flamenco poses a number of interesting mathematical and computational challenges. Most existing approaches in Music Information Retrieval (MIR) were developed in the context of popular or classical music and often do not generalize well to non-Western music traditions, in particular when the underlying music theoretical assumptions do not hold for these genres. Over the last decade, a number of computational problems related to the automatic analysis of flamenco music have been defined and several methods addressing a variety of musical aspects have been proposed. This paper provides an overview of the challenges which arise in the context of computational analysis of flamenco music and outlines existing approaches.

Submitted 22 September, 2022; originally announced September 2022.

arXiv:2209.04346 [pdf, other]

doi 10.1109/ICRA48891.2023.10161472

Model- and Acceleration-based Pursuit Controller for High-Performance Autonomous Racing

Authors: Jonathan Becker, Nadine Imholz, Luca Schwarzenbach, Edoardo Ghignone, Nicolas Baumann, Michele Magno

Abstract: Autonomous racing is a research field that is rapidly gaining popularity, as it pushes autonomous driving algorithms to their limits and serves as a catalyst for general autonomous driving. For scaled autonomous racing platforms, computational constraints and complexity often limit the use of Model Predictive Control (MPC). As a consequence, geometric controllers are the most frequently deployed controllers. They prove to be performant while offering implementation and operational simplicity. Yet, they inherently lack the incorporation of model dynamics, thus limiting the race car to a velocity domain where tire slip can be neglected. This paper presents Model- and Acceleration-based Pursuit (MAP), a high-performance model-based trajectory tracking algorithm that preserves the simplicity of geometric approaches while leveraging tire dynamics. The proposed algorithm allows accurate tracking of a trajectory at unprecedented velocities compared to State-of-the-Art (SotA) geometric controllers. The MAP controller is experimentally validated and outperforms the reference geometric controller four-fold in terms of lateral tracking error, yielding a tracking error of 0.055 m at tested speeds up to 11 m/s.

Submitted 7 July, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

Comments: 6 pages, 6 figures, 1 table

Journal ref: 2023 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2209.00638 [pdf, other]

Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation

Authors: Nadine Behrmann, S. Alireza Golestaneh, Zico Kolter, Juergen Gall, Mehdi Noroozi

Abstract: This paper introduces a unified framework for video action segmentation via sequence to sequence (seq2seq) translation in a fully and timestamp supervised setup. In contrast to current state-of-the-art frame-level prediction methods, we view action segmentation as a seq2seq translation task, i.e., mapping a sequence of video frames to a sequence of action segments. Our proposed method involves a series of modifications and auxiliary loss functions on the standard Transformer seq2seq translation model to cope with long input sequences as opposed to short output sequences and relatively few videos. We incorporate an auxiliary supervision signal for the encoder via a frame-wise loss and propose a separate alignment decoder for an implicit duration prediction. Finally, we extend our framework to the timestamp supervised setting via our proposed constrained k-medoids algorithm to generate pseudo-segmentations. Our proposed framework performs consistently on both fully and timestamp supervised settings, outperforming or competing with the state of the art on several datasets. Our code is publicly available at https://github.com/boschresearch/UVAST.

Submitted 11 October, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

Comments: ECCV 2022 (Main Conference)

arXiv:2205.07575 [pdf, other]

An automatic pipeline for atlas-based fetal and neonatal brain segmentation and analysis

Authors: Andrea Urru, Ayako Nakaki, Oualid Benkarim, Francesca Crovetto, Laura Segales, Valentin Comte, Nadine Hahner, Elisenda Eixarch, Eduard Gratacós, Fàtima Crispi, Gemma Piella, Miguel A. González Ballester

Abstract: The automatic segmentation of perinatal brain structures in magnetic resonance imaging (MRI) is of utmost importance for the study of brain growth and related complications. While different methods exist for adult and pediatric MRI data, there is a lack of automatic tools for the analysis of perinatal imaging. In this work, a new pipeline for fetal and neonatal segmentation has been developed. We also report the creation of two new fetal atlases, and their use within the pipeline for atlas-based segmentation, based on novel registration methods. The pipeline is also able to extract cortical and pial surfaces and compute features, such as curvature, thickness, sulcal depth, and local gyrification index. Results show that the introduction of the new templates together with our segmentation strategy leads to accurate results when compared to expert annotations, as well as better performance when compared to a reference pipeline (developing Human Connectome Project (dHCP)), for both early and late-onset fetal brains.

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2204.11550 [pdf, other]

Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study

Authors: Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen

Abstract: Use of speech models for automatic speech processing tasks can improve efficiency in the screening, analysis, diagnosis and treatment in medicine and psychiatry. However, the performance of pre-processing speech tasks like segmentation and diarization can drop considerably on in-the-wild clinical data, specifically when the target dataset comprises atypical speech. In this paper we study the performance of a pre-trained speech model on a dataset comprising child-clinician conversations in Danish with respect to the classification threshold. Since we do not have access to sufficient labelled data, we propose few-instance threshold adaptation, wherein we employ the first minutes of the speech conversation to obtain the optimum classification threshold. Through our work in this paper, we learned that the model with the default classification threshold performs worse on children from the patient group. Furthermore, the error rates of the model are directly correlated with the severity of diagnosis in the patients. Lastly, our study on few-instance adaptation shows that three minutes of clinician-child conversation is sufficient to obtain the optimum classification threshold.

Submitted 25 April, 2022; originally announced April 2022.

Comments: 5 pages. Submitted to Interspeech 2022

arXiv:2203.15536 [pdf, other]

BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information

Authors: Nadine Rueegg, Silvia Zuffi, Konrad Schindler, Michael J. Black

Abstract: Our goal is to recover the 3D shape and pose of dogs from a single image. This is a challenging task because dogs exhibit a wide range of shapes and appearances, and are highly articulated. Recent work has proposed to directly regress the SMAL animal model, with additional limb scale parameters, from images. Our method, called BARC (Breed-Augmented Regression using Classification), goes beyond prior work in several important ways. First, we modify the SMAL shape space to be more appropriate for representing dog shape. But, even with a better shape model, the problem of regressing dog shape from an image is still challenging because we lack paired images with 3D ground truth. To compensate for the lack of paired data, we formulate novel losses that exploit information about dog breeds. In particular, we exploit the fact that dogs of the same breed have similar body shapes. We formulate a novel breed similarity loss consisting of two parts: One term encourages the shape of dogs from the same breed to be more similar than dogs of different breeds. The second one, a breed classification loss, helps to produce recognizable breed-specific shapes. Through ablation studies, we find that our breed losses significantly improve shape accuracy over a baseline without them. We also compare BARC qualitatively to WLDO with a perceptual study and find that our approach produces dogs that are significantly more realistic. This work shows that a-priori information about genetic similarity can help to compensate for the lack of 3D training data. This concept may be applicable to other animal species or groups of species. Our code is publicly available for research purposes at https://barc.is.tue.mpg.de/.

Submitted 18 June, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: accepted for publication at CVPR 2022

ACM Class: I.4; I.2

arXiv:2203.14867 [pdf, other]

Continuous Metric Learning For Transferable Speech Emotion Recognition and Embedding Across Low-resource Languages

Authors: Sneha Das, Nicklas Leander Lund, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen

Abstract: Speech emotion recognition (SER) refers to the technique of inferring the emotional state of an individual from speech signals. SER systems continue to garner interest due to their wide applicability. Although the domain is mainly founded on signal processing, machine learning, and deep learning, generalizing over languages remains a challenge. However, developing generalizable and transferable models is critical due to a lack of sufficient resources in terms of data and labels for languages beyond the most commonly spoken ones. To improve performance over languages, we propose a denoising autoencoder with semi-supervision using a continuous metric loss based on either activation or valence. The novelty of this work lies in our proposal of continuous metric learning, which is among the first proposals on the topic to the best of our knowledge. Furthermore, to address the lack of activation and valence labels in the transfer datasets, we annotate the signal samples with activation and valence levels corresponding to a dimensional model of emotions, which were then used to evaluate the quality of the embedding over the transfer datasets. We show that the proposed semi-supervised model consistently outperforms the baseline unsupervised method, which is a conventional denoising autoencoder, in terms of emotion classification accuracy as well as correlation with respect to the dimensional variables. Further evaluation of classification accuracy with respect to the reference, a BERT-based speech representation model, shows that the proposed method is comparable to the reference method in classifying specific emotion classes at a much lower complexity.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Preprint of paper accepted to be presented at the Northern Lights Deep Learning Conference (NLDL), 2022. The labels are available at: https://bit.ly/3rg6VsA

arXiv:2203.14865 [pdf, other]

Towards Transferable Speech Emotion Representation: On loss functions for cross-lingual latent representations

Authors: Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen

Abstract: In recent years, speech emotion recognition (SER) has been used in wide-ranging applications, from healthcare to the commercial sector. In addition to signal processing approaches, methods for SER now also use deep learning techniques which provide transfer learning possibilities. However, generalizing over languages, corpora and recording conditions is still an open challenge. In this work we address this gap by exploring loss functions that aid in transferability, specifically to non-tonal languages. We propose a variational autoencoder (VAE) with KL annealing and a semi-supervised VAE to obtain more consistent latent embedding distributions across data sets. To ensure transferability, the distribution of the latent embedding should be similar across non-tonal languages (data sets). We start by presenting a low-complexity SER based on a denoising autoencoder (DAE), which achieves an unweighted classification accuracy of over 52.09% for four-class emotion classification. This performance is comparable to that of similar baseline methods. Following this, we employ a VAE, the semi-supervised VAE and the VAE with KL annealing to obtain a more regularized latent space. We show that while the DAE has the highest classification accuracy among the methods, the semi-supervised VAE has a comparable classification accuracy and a more consistent latent embedding distribution over data sets.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Preprint of paper accepted to be presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022. Source code at https://bit.ly/34CgkSZ. arXiv admin note: text overlap with arXiv:2105.02055

arXiv:2203.01429

SMTNet: Hierarchical cavitation intensity recognition based on sub-main transfer network

Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

Abstract: With the rapid development of smart manufacturing, data-driven machinery health management has received growing attention. In situations where some classes are more difficult to distinguish than others and where classes might be organised in a hierarchy of categories, current DL methods do not work well. In this study, a novel hierarchical cavitation intensity recognition framework using a Sub-Main Transfer Network, termed SMTNet, is proposed to classify acoustic signals of valve cavitation. The SMTNet model outputs multiple predictions ordered from coarse to fine along a network corresponding to a hierarchy of target cavitation states. Firstly, a data augmentation method based on Sliding Window with Fast Fourier Transform (Swin-FFT) is developed to address the few-shot problem. Secondly, a 1-D double hierarchical residual block (1-D DHRB) is presented to capture sensitive features of the frequency domain valve acoustic signals. Thirdly, a hierarchical multi-label tree is proposed to assist the embedding of the semantic structure of target cavitation states into SMTNet. Fourthly, an experience filtering mechanism is proposed to fully learn the prior knowledge of the cavitation detection model. Finally, SMTNet has been evaluated on two cavitation datasets without noise (Dataset 1 and Dataset 2), and one cavitation dataset with real noise (Dataset 3) provided by SAMSON AG (Frankfurt). The prediction accuracies of SMTNet for cavitation intensity recognition are as high as 95.32%, 97.16% and 100%, respectively. At the same time, the testing accuracies of SMTNet for cavitation detection are as high as 97.02%, 97.64% and 100%. In addition, SMTNet has also been tested for different sampling frequencies and has achieved excellent results at the highest sampling frequency of mobile phones.

Submitted 12 July, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: we need update this paper

arXiv:2203.01118 [pdf, other]

doi 10.1016/j.engappai.2022.104904

A multi-task learning for cavitation detection and cavitation intensity recognition of valve acoustic signals

Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

Abstract: With the rapid development of smart manufacturing, data-driven machinery health management has received growing attention. As one of the most popular methods in machinery health management, deep learning (DL) has achieved remarkable successes. However, limited samples and poor separability of different cavitation states of acoustic signals greatly hinder the eventual performance of DL models for cavitation intensity recognition and cavitation detection. In this work, a novel multi-task learning framework for simultaneous cavitation detection and cavitation intensity recognition using 1-D double hierarchical residual networks (1-D DHRN) is proposed for analyzing valve acoustic signals. Firstly, a data augmentation method based on sliding window with fast Fourier transform (Swin-FFT) is developed to alleviate the small-sample issue confronted in this study. Secondly, a 1-D double hierarchical residual block (1-D DHRB) is constructed to capture sensitive features from the frequency domain acoustic signals of valves. Then, a new structure of 1-D DHRN is proposed. Finally, the devised 1-D DHRN is evaluated on two datasets of valve acoustic signals without noise (Dataset 1 and Dataset 2) and one dataset of valve acoustic signals with realistic surrounding noise (Dataset 3) provided by SAMSON AG (Frankfurt). Our method has achieved state-of-the-art results. The prediction accuracies of 1-D DHRN for cavitation intensity recognition are as high as 93.75%, 94.31% and 100%, which indicates that 1-D DHRN outperforms other DL models and conventional methods. At the same time, the testing accuracies of 1-D DHRN for cavitation detection are as high as 97.02%, 97.64% and 100%. In addition, 1-D DHRN has also been tested for different sampling frequencies and shows excellent results for sampling frequencies that mobile phones can accommodate.

Submitted 20 April, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: arXiv admin note: text overlap with arXiv:2202.13226

Journal ref: Engineering Applications of Artificial Intelligence, 113 (2022), 104904

arXiv:2202.13245 [pdf, other]

doi 10.1145/3534678.3539133

Regional-Local Adversarially Learned One-Class Classifier Anomalous Sound Detection in Global Long-Term Space

Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

Abstract: Anomalous sound detection (ASD) is one of the most significant tasks in mechanical equipment monitoring and maintenance in complex industrial systems. In practice, it is vital to precisely identify the abnormal status of the working mechanical system, which can further facilitate failure troubleshooting. In this paper, we propose a multi-pattern adversarial learning one-class classification framework, which allows us to use both the generator and the discriminator of an adversarial model for efficient ASD. The core idea is learning to reconstruct the normal patterns of acoustic data through two different patterns of auto-encoding generators, which succeeds in extending the fundamental role of a discriminator from identifying real and fake data to distinguishing between regional and local pattern reconstructions. Furthermore, we present a global filter layer for long-term interactions in the frequency domain space, which directly learns from the original data without introducing any human priors. Extensive experiments performed on four real-world datasets from different industrial domains (three cavitation datasets provided by SAMSON AG, and one publicly available dataset) for anomaly detection show superior results, and outperform recent state-of-the-art ASD methods.

Submitted 26 February, 2022; originally announced February 2022.

Journal ref: KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2022

arXiv:2202.13226 [pdf, other]

doi 10.1016/j.measurement.2022.110897

An acoustic signal cavitation detection framework based on XGBoost with adaptive selection feature engineering

Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

Abstract: Valves are widely used in industrial and domestic pipeline systems. However, during their operation, they may suffer from cavitation, which can cause loud noise, vibration and damage to the internal components of the valve. Therefore, monitoring the flow status inside valves is significantly beneficial to prevent the additional cost induced by cavitation. In this paper, a novel acoustic signal cavitation detection framework--based on XGBoost with adaptive selection feature engineering--is proposed. Firstly, a data augmentation method with non-overlapping sliding window (NOSW) is developed to address the small-sample problem involved in this study. Then, each segmented piece of the time-domain acoustic signal is transformed by fast Fourier transform (FFT) and its statistical features are extracted as the input to the adaptive selection feature engineering (ASFE) procedure, where adaptive feature aggregation and feature crosses are performed. Finally, with the selected features the XGBoost algorithm is trained for cavitation detection and tested on valve acoustic signal data provided by Samson AG (Frankfurt). Our method has achieved state-of-the-art results. The prediction performance on the binary classification (cavitation and no-cavitation) and the four-class classification (cavitation choked flow, constant cavitation, incipient cavitation and no-cavitation) is satisfactory and outperforms the traditional XGBoost with accuracy increases of 4.67% and 11.11%, respectively.

Submitted 1 March, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

Journal ref: Measurement 192 (2022), 110897

arXiv:2201.11736 [pdf, other]

Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives

Authors: David T. Hoffmann, Nadine Behrmann, Juergen Gall, Thomas Brox, Mehdi Noroozi

Abstract: This paper introduces Ranking Info Noise Contrastive Estimation (RINCE), a new member in the family of InfoNCE losses that preserves a ranked ordering of positive samples. In contrast to the standard InfoNCE loss, which requires a strict binary separation of the training pairs into similar and dissimilar samples, RINCE can exploit information about a similarity ranking for learning a corresponding embedding space. We show that the proposed loss function learns favorable embeddings compared to the standard InfoNCE whenever at least noisy ranking information can be obtained or when the definition of positives and negatives is blurry. We demonstrate this for a supervised classification task with additional superclass labels and noisy similarity scores. Furthermore, we show that RINCE can also be applied to unsupervised training with experiments on unsupervised representation learning from videos. In particular, the embedding yields higher classification accuracy, retrieval rates and performs better in out-of-distribution detection than the standard InfoNCE loss.

Submitted 27 January, 2022; originally announced January 2022.

Comments: AAAI 2022 (Main Track)

arXiv:2109.11593 [pdf, other]

Long Short View Feature Decomposition via Contrastive Video Representation Learning

Authors: Nadine Behrmann, Mohsen Fayyaz, Juergen Gall, Mehdi Noroozi

Abstract: Self-supervised video representation methods typically focus on the representation of temporal attributes in videos. However, the role of stationary versus non-stationary attributes is less explored: Stationary features, which remain similar throughout the video, enable the prediction of video-level action classes. Non-stationary features, which represent temporally varying attributes, are more beneficial for downstream tasks involving more fine-grained temporal understanding, such as action segmentation. We argue that a single representation to capture both types of features is sub-optimal, and propose to decompose the representation space into stationary and non-stationary features via contrastive learning from long and short views, i.e. long video sequences and their shorter sub-sequences. Stationary features are shared between the short and long views, while non-stationary features aggregate the short views to match the corresponding long view. To empirically verify our approach, we demonstrate that our stationary features work particularly well on an action recognition downstream task, while our non-stationary features perform better on action segmentation. Furthermore, we analyse the learned representations and find that stationary features capture more temporally stable, static attributes, while non-stationary features encompass more temporally varying ones.

Submitted 23 September, 2021; originally announced September 2021.

Comments: ICCV 2021 (Main Conference)

arXiv:2106.03170 [pdf]

doi 10.1002/smr.2426

FlexParser -- the adaptive log file parser for continuous results in a changing world

Authors: Nadine Ruecker, Andreas Maier

Abstract: Any modern system writes events into files, called log files. Those contain crucial information which are subject to various analyses. Examples range from cybersecurity, intrusion detection over usage analyses to trouble shooting. Before data analysis is possible, desired information needs to be extracted first out of the semi-structured log messages. State-of-the-art event parsing often assumes static log events. However, any modern system is updated consistently and with updates also log file structures can change. We call those changes "mutation" and study parsing performance for different mutation cases. Latest research discovers mutations using anomaly detection post mortem, however, does not cover actual continuous parsing. Thus, we propose a novel and flexible parser, called FlexParser, which can extract desired values despite gradual changes in the log messages. It implies basic text preprocessing followed by a supervised Deep Learning method. We train a stateful LSTM on parsing one event per data set. Statefulness enforces the model to learn log message structures across several examples. Our model was tested on seven different, publicly available log file data sets and various kinds of mutations. Exhibiting an average F1-Score of 0.98, it outperforms other Deep Learning methods as well as state-of-the-art unsupervised parsers.

Submitted 1 February, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

Comments: 17 pages, 9 figures, 3 tables

Showing 1–50 of 110 results for author: Nadine