Skip to main content

Showing 1–5 of 5 results for author: Tilli, P

Searching in archive cs. Search in all archives.
.
  1. arXiv:2403.17647  [pdf, other

    cs.CL

    Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering

    Authors: Pascal Tilli, Ngoc Thang Vu

    Abstract: The large success of deep learning based methods in Visual Question Answering (VQA) has concurrently increased the demand for explainable methods. Most methods in Explainable Artificial Intelligence (XAI) focus on generating post-hoc explanations rather than taking an intrinsic approach, the latter characterizing an interpretable model. In this work, we introduce an interpretable approach for grap… ▽ More

    Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  2. Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions

    Authors: Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu

    Abstract: Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice also comes with ethical concerns. In this paper, we propose a method to generate artificial speaker embeddings that cannot be linked to a real human while offering intui… ▽ More

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Published at ISCA Interspeech 2023 https://www.isca-speech.org/archive/interspeech_2023/lux23_interspeech.html

  3. arXiv:2210.07002  [pdf, other

    cs.SD cs.CL eess.AS

    Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy

    Authors: Sarina Meyer, Pascal Tilli, Pavel Denisov, Florian Lux, Julia Koch, Ngoc Thang Vu

    Abstract: In order to protect the privacy of speech data, speaker anonymization aims for hiding the identity of a speaker by changing the voice in speech recordings. This typically comes with a privacy-utility trade-off between protection of individuals and usability of the data for downstream applications. One of the challenges in this context is to create non-existent voices that sound as natural as possi… ▽ More

    Submitted 20 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: IEEE Spoken Language Technology Workshop 2022

  4. arXiv:2207.04834  [pdf, other

    cs.SD cs.CR cs.LG eess.AS

    Speaker Anonymization with Phonetic Intermediate Representations

    Authors: Sarina Meyer, Florian Lux, Pavel Denisov, Julia Koch, Pascal Tilli, Ngoc Thang Vu

    Abstract: In this work, we propose a speaker anonymization pipeline that leverages high quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings. Using phones as the intermediate representation ensures near complete elimination of speaker identity information from the input while preserving the original phonetic co… ▽ More

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: Accepted at Interspeech 2022

  5. arXiv:2110.05159  [pdf, other

    cs.CV cs.AI

    Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking

    Authors: Dirk Väth, Pascal Tilli, Ngoc Thang Vu

    Abstract: On the way towards general Visual Question Answering (VQA) systems that are able to answer arbitrary questions, the need arises for evaluation beyond single-metric leaderboards for specific datasets. To this end, we propose a browser-based benchmarking tool for researchers and challenge organizers, with an API for easy integration of new models and datasets to keep up with the fast-changing landsc… ▽ More

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: 6.5 pages, 9 figures