Showing 1–18 of 18 results for author: Deriu, J

  1. arXiv:2406.03235

    cs.CL cs.AI

    Error-preserving Automatic Speech Recognition of Young English Learners' Language

    Authors: Janick Michot, Manuela Hürlimann, Jan Deriu, Luzia Sauer, Katsiaryna Mlynchyk, Mark Cieliebak

    Abstract: One of the central skills that language learners need to practice is speaking the language. Currently, students in school do not get enough speaking opportunities and lack conversational practice. Recent advances in speech technology and natural language processing allow for the creation of novel tools to practice their speaking skills. In this work, we tackle the first component of such a pipelin…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 Main Conference

  2. arXiv:2406.01131

    cs.AI

    Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation

    Authors: Pius von Däniken, Jan Deriu, Don Tuggener, Mark Cieliebak

    Abstract: Generative AI systems have become ubiquitous for all kinds of modalities, which makes the issue of the evaluation of such models more pressing. One popular approach is preference ratings, where the generated outputs of different systems are shown to evaluators who choose their preferences. In recent years the field shifted towards the development of automated (trained) metrics to assess generated…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL Main Conference

  3. arXiv:2310.09088

    cs.CL cs.AI

    Dialect Transfer for Swiss German Speech Translation

    Authors: Claudio Paonessa, Yanick Schraner, Jan Deriu, Manuela Hürlimann, Manfred Vogel, Mark Cieliebak

    Abstract: This paper investigates the challenges in building Swiss German speech translation systems, specifically focusing on the impact of dialect diversity and differences between Swiss German and Standard German. Swiss German is a spoken language with no formal writing system; it comprises many diverse dialects and is a low-resource language with only around 5 million speakers. The study is guided by tw…

    Submitted 13 October, 2023; originally announced October 2023.

  4. arXiv:2306.04743

    cs.DB cs.AI cs.CL

    ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems

    Authors: Yi Zhang, Jan Deriu, George Katsogiannis-Meimarakis, Catherine Kosten, Georgia Koutrika, Kurt Stockinger

    Abstract: Natural Language to SQL systems (NL-to-SQL) have recently shown a significant increase in accuracy for natural language to SQL query translation. This improvement is due to the emergence of transformer-based language models, and the popularity of the Spider benchmark - the de-facto standard for evaluating NL-to-SQL systems. The top NL-to-SQL systems reach accuracies of up to 85%. However, Spider…

    Submitted 5 December, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: 12 pages, 2 figures, 5 tables

    ACM Class: H.2.4; I.2.7

    Journal ref: PVLDB Volume 17, 2023-2024

  5. arXiv:2306.03866

    cs.CL cs.AI

    Correction of Errors in Preference Ratings from Automated Metrics for Text Generation

    Authors: Jan Deriu, Pius von Däniken, Don Tuggener, Mark Cieliebak

    Abstract: A major challenge in the field of Text Generation is evaluation: Human evaluations are cost-intensive, and automated metrics often display considerable disagreement with human judgments. In this paper, we propose a statistical model of Text Generation evaluation that accounts for the error-proneness of automated metrics when used to generate preference rankings between system outputs. We show that…

    Submitted 6 June, 2023; originally announced June 2023.

  6. arXiv:2305.19750

    cs.CL cs.SD eess.AS

    Text-to-Speech Pipeline for Swiss German -- A comparison

    Authors: Tobias Bollinger, Jan Deriu, Manfred Vogel

    Abstract: In this work, we studied the synthesis of Swiss German speech using different Text-to-Speech (TTS) models. We evaluated the TTS models on three corpora and found that VITS models performed best, so we used them for further testing. We also introduce a new method to evaluate TTS models by letting the discriminator of a trained vocoder GAN model predict whether a given waveform is human or sy…

    Submitted 31 May, 2023; originally announced May 2023.

  7. arXiv:2305.18855

    cs.CL cs.AI

    STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions

    Authors: Michel Plüss, Jan Deriu, Yanick Schraner, Claudio Paonessa, Julia Hartmann, Larissa Schmidt, Christian Scheller, Manuela Hürlimann, Tanja Samardžić, Manfred Vogel, Mark Cieliebak

    Abstract: We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech, annotated with Standard German text at the sentence level. The data is collected using a web app in which the speakers are shown Standard German sentences, which they translate to Swiss German and record. We make the corpus publicly available. It contains 343 hours of speech from all dialect regions and is th…

    Submitted 30 May, 2023; originally announced May 2023.

  8. arXiv:2210.13025

    cs.CL cs.AI

    On the Effectiveness of Automated Metrics for Text Generation Systems

    Authors: Pius von Däniken, Jan Deriu, Don Tuggener, Mark Cieliebak

    Abstract: A major challenge in the field of Text Generation is evaluation because we lack a sound theory that can be leveraged to extract guidelines for evaluation campaigns. In this work, we propose a first step towards such a theory that incorporates different sources of uncertainty, such as imperfect automated metrics and insufficiently sized test sets. The theory has practical applications, such as dete…

    Submitted 24 October, 2022; originally announced October 2022.

  9. arXiv:2205.09501

    cs.CL cs.AI

    SDS-200: A Swiss German Speech to Standard German Text Corpus

    Authors: Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli, Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller, Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak, Manfred Vogel

    Abstract: We present SDS-200, a corpus of Swiss German dialectal speech with Standard German text translations, annotated with dialect, age, and gender information of the speakers. The dataset allows for training speech translation, dialect recognition, and speech synthesis systems, among others. The data was collected using a web recording tool that is open to the public. Each participant was given a text…

    Submitted 19 May, 2022; originally announced May 2022.

  10. arXiv:2203.10012

    cs.CL cs.AI cs.LG

    Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges

    Authors: Shikib Mehri, Jinho Choi, Luis Fernando D'Haro, Jan Deriu, Maxine Eskenazi, Milica Gasic, Kallirroi Georgila, Dilek Hakkani-Tur, Zekang Li, Verena Rieser, Samira Shaikh, David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang

    Abstract: This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog. The workshop explored the current state of the art along with its limitations and suggested promising directions for future work in this important and very rapidly changing area of research.

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: Report from the NSF AED Workshop (http://dialrc.org/AED/)

  11. arXiv:2202.13887

    cs.AI cs.CL

    Probing the Robustness of Trained Metrics for Conversational Dialogue Systems

    Authors: Jan Deriu, Don Tuggener, Pius von Däniken, Mark Cieliebak

    Abstract: This paper introduces an adversarial method to stress-test trained metrics to evaluate conversational dialogue systems. The method leverages Reinforcement Learning to find response strategies that elicit optimal scores from the trained metrics. We apply our method to test recently proposed trained metrics. We find that they all are susceptible to giving high scores to responses generated by relati…

    Submitted 28 February, 2022; originally announced February 2022.

  12. arXiv:2010.02140

    cs.AI cs.CL

    Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems

    Authors: Jan Deriu, Don Tuggener, Pius von Däniken, Jon Ander Campos, Alvaro Rodrigo, Thiziri Belkacem, Aitor Soroa, Eneko Agirre, Mark Cieliebak

    Abstract: The lack of time-efficient and reliable evaluation methods hampers the development of conversational dialogue systems (chatbots). Evaluations requiring humans to converse with chatbots are time and cost-intensive, put high cognitive demands on the human judges, and yield low-quality results. In this work, we introduce Spot The Bot, a cost-efficient and robust evaluation framework that replac…

    Submitted 5 October, 2020; originally announced October 2020.

  13. arXiv:2005.01328

    cs.CL

    DoQA -- Accessing Domain-Specific FAQs via Conversational QA

    Authors: Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

    Abstract: The goal of this work is to build conversational Question Answering (QA) interfaces for the large body of domain-specific information available in FAQ sites. We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs. The dialogues are collected from three Stack Exchange sites using the Wizard of Oz method with crowdsourcing. Compared to previous work, DoQA comprises well-defined informat…

    Submitted 18 May, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: Accepted at ACL 2020. 13 pages, 4 figures

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020

  14. arXiv:2004.07633

    cs.AI cs.CL cs.LG

    A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation

    Authors: Jan Deriu, Katsiaryna Mlynchyk, Philippe Schläpfer, Alvaro Rodrigo, Dirk von Grünigen, Nicolas Kaiser, Kurt Stockinger, Eneko Agirre, Mark Cieliebak

    Abstract: In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation, called Operation Trees (OT), that is based on the logical query plan in a database. This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furt…

    Submitted 25 June, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020

  15. arXiv:1909.12066

    cs.AI cs.CL cs.LG

    Towards a Metric for Automated Conversational Dialogue System Evaluation and Improvement

    Authors: Jan Deriu, Mark Cieliebak

    Abstract: We present "AutoJudge", an automated evaluation method for conversational dialogue systems. The method works by first generating dialogues based on self-talk, i.e. a dialogue system talking to itself. Then, it uses human ratings on these dialogues to train an automated judgement model. Our experiments show that AutoJudge correlates well with the human ratings and can be used to automatically evalua…

    Submitted 25 June, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: 8 pages, to be published at the INLG 2019 conference

    Journal ref: Proceedings of the 12th International Conference on Natural Language Generation. 2019

  16. arXiv:1905.04071

    cs.CL cs.AI cs.HC cs.LG

    Survey on Evaluation Methods for Dialogue Systems

    Authors: Jan Deriu, Alvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, Mark Cieliebak

    Abstract: In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the involvement of huma…

    Submitted 26 June, 2020; v1 submitted 10 May, 2019; originally announced May 2019.

    Journal ref: Artificial Intelligence Review, June 2020

  17. arXiv:1703.02504

    cs.CL cs.IR cs.LG

    Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification

    Authors: Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi

    Abstract: This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require establishing a correspondence to English, for which powerful classifiers are already available. In contrast, our method does not requir…

    Submitted 7 March, 2017; originally announced March 2017.

    Comments: appearing at WWW 2017 - 26th International World Wide Web Conference

    ACM Class: I.2.7

  18. arXiv:1511.03464

    cs.CV

    A Directional Diffusion Algorithm for Inpainting

    Authors: Jan Deriu, Rolf Jagerman, Kai-En Tsay

    Abstract: The problem of inpainting involves reconstructing the missing areas of an image. Inpainting has many applications, such as reconstructing old damaged photographs or removing obfuscations from images. In this paper we present the directional diffusion algorithm for inpainting. Typical diffusion algorithms are bad at propagating edges from the image into the unknown masked regions. The directional d…

    Submitted 11 November, 2015; originally announced November 2015.