Showing 1–18 of 18 results for author: Schultz, T

Searching in archive cs.
  1. arXiv:2406.18253  [pdf, other]

    cs.CV

    On the Role of Visual Grounding in VQA

    Authors: Daniel Reich, Tanja Schultz

    Abstract: Visual Grounding (VG) in VQA refers to a model's proclivity to infer answers based on question-relevant image regions. Conceptually, VG identifies as an axiomatic requirement of the VQA task. In practice, however, DNN-based VQA models are notorious for bypassing VG by way of shortcut (SC) learning without suffering obvious performance losses in standard benchmarks. To uncover the impact of SC lear…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.15119  [pdf, other]

    cs.SD cs.AI eess.AS

    Speech Emotion Recognition under Resource Constraints with Data Distillation

    Authors: Yi Chang, Zhao Ren, Zhonghao Zhao, Thanh Tam Nguyen, Kun Qian, Tanja Schultz, Björn W. Schuller

    Abstract: Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment…

    Submitted 21 June, 2024; originally announced June 2024.

  3. arXiv:2405.08021  [pdf, other]

    cs.SD eess.AS

    Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion

    Authors: Zhao Ren, Kevin Scheck, Qinhan Hou, Stefano van Gogh, Michael Wand, Tanja Schultz

    Abstract: Electromyography-to-Speech (ETS) conversion has demonstrated its potential for silent speech interfaces by generating audible speech from Electromyography (EMG) signals during silent articulations. ETS models usually consist of an EMG encoder which converts EMG signals to acoustic speech features, and a vocoder which then synthesises the speech signals. Due to an inadequate amount of available dat…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted by EMBC 2024

  4. arXiv:2402.12298  [pdf, other]

    cs.CL cs.AI

    Is Open-Source There Yet? A Comparative Study on Commercial and Open-Source LLMs in Their Ability to Label Chest X-Ray Reports

    Authors: Felix J. Dorfner, Liv Jürgensen, Leonhard Donle, Fares Al Mohamad, Tobias R. Bodenmann, Mason C. Cleveland, Felix Busch, Lisa C. Adams, James Sato, Thomas Schultz, Albert E. Kim, Jameson Merkow, Keno K. Bressem, Christopher P. Bridge

    Abstract: Introduction: With the rapid advances in large language models (LLMs), there have been numerous new open source as well as commercial models. While recent publications have explored GPT-4 in its application to extracting information of interest from radiology reports, there has not been a real-world comparison of GPT-4 to different leading open-source models. Materials and Methods: Two different…

    Submitted 19 February, 2024; originally announced February 2024.

  5. arXiv:2402.01227  [pdf, other]

    cs.SD cs.AI cs.HC eess.AS

    STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

    Authors: Yi Chang, Zhao Ren, Zixing Zhang, Xin Jing, Kun Qian, Xi Shao, Bin Hu, Tanja Schultz, Björn W. Schuller

    Abstract: Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction. The robustness of SER models is crucial, particularly in privacy-sensitive and reliability-demanding domains like private healthcare. Recently, the vulnerability of deep neural networks in the audio domain to adversarial attacks has…

    Submitted 2 February, 2024; originally announced February 2024.

  6. arXiv:2401.07803  [pdf, other]

    cs.CV

    Uncovering the Full Potential of Visual Grounding Methods in VQA

    Authors: Daniel Reich, Tanja Schultz

    Abstract: Visual Grounding (VG) methods in Visual Question Answering (VQA) attempt to improve VQA performance by strengthening a model's reliance on question-relevant visual information. The presence of such relevant information in the visual input is typically assumed in training and testing. This assumption, however, is inherently flawed when dealing with imperfect image representations common in large-sc…

    Submitted 15 February, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

  7. Measuring Faithful and Plausible Visual Grounding in VQA

    Authors: Daniel Reich, Felix Putze, Tanja Schultz

    Abstract: Metrics for Visual Grounding (VG) in Visual Question Answering (VQA) systems primarily aim to measure a system's reliance on relevant parts of the image when inferring an answer to the given question. Lack of VG has been a common problem among state-of-the-art VQA systems and can manifest in over-reliance on irrelevant image parts or a disregard for the visual modality entirely. Although inference…

    Submitted 14 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Journal ref: EMNLP 2023 Findings

  8. arXiv:2211.08086  [pdf, other]

    cs.CV cs.AI cs.CL cs.HC cs.LG

    Visually Grounded VQA by Lattice-based Retrieval

    Authors: Daniel Reich, Felix Putze, Tanja Schultz

    Abstract: Visual Grounding (VG) in Visual Question Answering (VQA) systems describes how well a system manages to tie a question and its answer to relevant image regions. Systems with strong VG are considered intuitively interpretable and suggest an improved scene understanding. While VQA accuracy performances have seen impressive gains over the past few years, explicit improvements to VG performance and ev…

    Submitted 15 November, 2022; originally announced November 2022.

  9. arXiv:2106.14476  [pdf, other]

    cs.CV

    Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs

    Authors: Daniel Reich, Felix Putze, Tanja Schultz

    Abstract: With the expressed goal of improving system transparency and visual grounding in the reasoning process in VQA, we present a modular system for the task of compositional VQA based on scene graphs. Our system is called "Adventurer's Treasure Hunt" (or ATH), named after an analogy we draw between our model's search procedure for an answer and an adventurer's search for treasure. We developed ATH with…

    Submitted 28 June, 2021; originally announced June 2021.

  10. arXiv:2103.03621  [pdf, other]

    cs.HC cs.SD eess.AS

    Low-latency auditory spatial attention detection based on spectro-spatial features from EEG

    Authors: Siqi Cai, Pengcheng Sun, Tanja Schultz, Haizhou Li

    Abstract: Detecting auditory attention based on brain signals enables many everyday applications, and serves as part of the solution to the cocktail party effect in speech processing. Several studies leverage the correlation between brain signals and auditory stimuli to detect the auditory attention of listeners. Recently, studies show that the alpha band (8-13 Hz) EEG signals enable the localization of aud…

    Submitted 4 July, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  11. arXiv:2102.03684  [pdf, other]

    cs.HC

    Linking Labs: Interconnecting Experimental Environments

    Authors: Tanja Schultz, Felix Putze, Thorsten Fehr, Moritz Meier, Celeste Mason, Florian Ahrens, Manfred Herrmann

    Abstract: We introduce the concept of LabLinking: a technology-based interconnection of experimental laboratories across institutions, disciplines, cultures, languages, and time zones - in other words experiments without borders. In particular, we introduce LabLinking levels (LLL), which define the degree of tightness of empirical interconnection between labs. We describe the technological infrastructure in…

    Submitted 6 February, 2021; originally announced February 2021.

  12. Federated Learning for Breast Density Classification: A Real-World Implementation

    Authors: Holger R. Roth, Ken Chang, Praveer Singh, Nir Neumark, Wenqi Li, Vikash Gupta, Sharut Gupta, Liangqiong Qu, Alvin Ihsani, Bernardo C. Bizzo, Yuhong Wen, Varun Buch, Meesam Shah, Felipe Kitamura, Matheus Mendonça, Vitor Lavor, Ahmed Harouni, Colin Compas, Jesse Tetreault, Prerna Dogra, Yan Cheng, Selnur Erdal, Richard White, Behrooz Hashemian, Thomas Schultz, et al. (18 additional authors not shown)

    Abstract: Building robust deep learning-based models requires large quantities of diverse training data. In this study, we investigate the use of federated learning (FL) to build medical imaging classification models in a real-world collaborative setting. Seven clinical institutions from across the world joined this FL effort to train a model for breast density classification based on Breast Imaging, Report…

    Submitted 20 October, 2020; v1 submitted 3 September, 2020; originally announced September 2020.

    Comments: Accepted at the 1st MICCAI Workshop on "Distributed And Collaborative Learning"; add citation to Fig. 1 & 2 and update Fig. 5; fix typo in affiliations

    Journal ref: In: Albarqouni S. et al. (eds) Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. DART 2020, DCL 2020. Lecture Notes in Computer Science, vol 12444. Springer, Cham

  13. arXiv:2006.10406  [pdf, other]

    cs.CV

    Fourth-Order Anisotropic Diffusion for Inpainting and Image Compression

    Authors: Ikram Jumakulyyev, Thomas Schultz

    Abstract: Edge-enhancing diffusion (EED) can reconstruct a close approximation of an original image from a small subset of its pixels. This makes it an attractive foundation for PDE based image compression. In this work, we generalize second-order EED to a fourth-order counterpart. It involves a fourth-order diffusion tensor that is constructed from the regularized image gradient in a similar way as in trad…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted for publication in Springer book "Anisotropy Across Fields and Scales"

    ACM Class: I.4.2; I.4.4

  14. arXiv:1711.08279  [pdf, other]

    cs.GR

    Visual Analytics of Group Differences in Tensor Fields: Application to Clinical DTI

    Authors: Amin Abbasloo, Vitalis Wiens, Tobias Schmidt-Wilcke, Pia Sundgren, Reinhard Klein, Thomas Schultz

    Abstract: We present a visual analytics system for exploring group differences in tensor fields with respect to all six degrees of freedom that are inherent in symmetric second-order tensors. Our framework closely integrates quantitative analysis, based on multivariate hypothesis testing and spatial cluster enhancement, with suitable visualization tools that facilitate interpretation of results, and forming…

    Submitted 22 November, 2017; originally announced November 2017.

    Comments: Submitted to IEEE Transactions on Visualization and Computer Graphics

  15. arXiv:1710.08878  [pdf, ps, other]

    cs.LG stat.ML

    Classification on Large Networks: A Quantitative Bound via Motifs and Graphons

    Authors: Andreas Haupt, Mohammad Khatami, Thomas Schultz, Ngoc Mai Tran

    Abstract: When each data point is a large graph, graph statistics such as densities of certain subgraphs (motifs) can be used as feature vectors for machine learning. While intuitive, motif counts are expensive to compute and difficult to work with theoretically. Via graphon theory, we give an explicit quantitative bound for the ability of motif homomorphisms to distinguish large networks under both generat…

    Submitted 24 October, 2017; originally announced October 2017.

    Comments: 17 pages, 2 figures, 1 table

    MSC Class: 68T05; 05C80; 62G99

  16. Syntactic and Semantic Features For Code-Switching Factored Language Models

    Authors: Heike Adel, Ngoc Thang Vu, Katrin Kirchhoff, Dominic Telaar, Tanja Schultz

    Abstract: This paper presents our latest investigations on different features for factored language models for Code-Switching speech and their effect on automatic speech recognition (ASR) performance. We focus on syntactic and semantic features which can be extracted from Code-Switching text data and integrate them into factored language models. Different possible factors, such as words, part-of-speech tags…

    Submitted 4 October, 2017; originally announced October 2017.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing (Volume: 23, Issue: 3, March 2015)

  17. Multi-Scale Anisotropic Fourth-Order Diffusion Improves Ridge and Valley Localization

    Authors: Shekoufeh Gorgi Zadeh, Stephan Didas, Maximilian W. M. Wintergerst, Thomas Schultz

    Abstract: Ridge and valley enhancing filters are widely used in applications such as vessel detection in medical image computing. When images are degraded by noise or include vessels at different scales, such filters are an essential step for meaningful and stable vessel localization. In this work, we propose a novel multi-scale anisotropic fourth-order diffusion equation that allows us to smooth along vess…

    Submitted 18 April, 2017; v1 submitted 21 November, 2016; originally announced November 2016.

    Comments: 12 pages, 8 figures, 1 table

    Journal ref: Journal of Mathematical Imaging and Vision, 1-13, 2017

  18. arXiv:1307.3271  [pdf, other]

    cs.CV

    Fuzzy Fibers: Uncertainty in dMRI Tractography

    Authors: Thomas Schultz, Anna Vilanova, Ralph Brecheisen, Gordon Kindlmann

    Abstract: Fiber tracking based on diffusion weighted Magnetic Resonance Imaging (dMRI) allows for noninvasive reconstruction of fiber bundles in the human brain. In this chapter, we discuss sources of error and uncertainty in this technique, and review strategies that afford a more reliable interpretation of the results. This includes methods for computing and rendering probabilistic tractograms, which esti…

    Submitted 11 July, 2013; originally announced July 2013.