Showing 1–8 of 8 results for author: Aepli, N

  1. arXiv:2404.19315

    cs.CL

    Modeling Orthographic Variation in Occitan's Dialects

    Authors: Zachary William Hopton, Noëmi Aepli

    Abstract: Effectively normalizing textual data poses a considerable challenge, especially for low-resource languages lacking standardized writing systems. In this study, we fine-tuned a multilingual model with data from several Occitan dialects and conducted a series of experiments to assess the model's representations of these dialects. For evaluation purposes, we compiled a parallel lexicon encompassing f…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted at VarDial 2024: The Eleventh Workshop on NLP for Similar Languages, Varieties and Dialects

  2. arXiv:2404.19310

    cs.CL

    Does Whisper understand Swiss German? An automatic, qualitative, and human evaluation

    Authors: Eyal Liron Dolev, Clemens Fidel Lutz, Noëmi Aepli

    Abstract: Whisper is a state-of-the-art automatic speech recognition (ASR) model (Radford et al., 2022). Although Swiss German dialects are allegedly not part of Whisper's training data, preliminary experiments showed that Whisper can transcribe Swiss German quite well, with the output being a speech translation into Standard German. To gain a better understanding of Whisper's performance on Swiss German, w…

    Submitted 9 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted at VarDial 2024 (the eleventh Workshop on NLP for Similar Languages, Varieties and Dialects 2024), Mexico City

  3. arXiv:2403.19142

    cs.CL

    A Tulu Resource for Machine Translation

    Authors: Manu Narayanan, Noëmi Aepli

    Abstract: We present the first parallel dataset for English-Tulu translation. Tulu, classified within the South Dravidian linguistic family branch, is predominantly spoken by approximately 2.5 million individuals in southwestern India. Our dataset is constructed by integrating human translations into the multilingual machine translation resource FLORES-200. Furthermore, we use this dataset for evaluation pu…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  4. arXiv:2401.14400

    cs.CL

    Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect

    Authors: Jannis Vamvas, Noëmi Aepli, Rico Sennrich

    Abstract: Creating neural text encoders for written Swiss German is challenging due to a dearth of training data combined with dialectal variation. In this paper, we build on several existing multilingual encoders and adapt them to Swiss German using continued pre-training. Evaluation on three diverse downstream tasks shows that simply adding a Swiss German adapter to a modular encoder achieves 97.5% of ful…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: First Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)

  5. arXiv:2311.16865

    cs.CL

    A Benchmark for Evaluating Machine Translation Metrics on Dialects Without Standard Orthography

    Authors: Noëmi Aepli, Chantal Amrhein, Florian Schottmann, Rico Sennrich

    Abstract: For sensible progress in natural language processing, it is important that we are aware of the limitations of the evaluation metrics we use. In this work, we evaluate how robust metrics are to non-standardized dialects, i.e. spelling differences in language varieties that do not have a standard orthography. To investigate this, we collect a dataset of human translations and human judgments for aut…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: WMT 2023 Research Paper

    ACM Class: I.2.7

  6. arXiv:2305.20080

    cs.CL

    Findings of the VarDial Evaluation Campaign 2023

    Authors: Noëmi Aepli, Çağrı Çöltekin, Rob Van Der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri

    Abstract: This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023. The campaign is part of the tenth workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with EACL 2023. Three separate shared tasks were included this year: Slot and intent detection for low-resource language varieties (SID4LR),…

    Submitted 31 May, 2023; originally announced May 2023.

    Journal ref: In Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), pages 251-261, Dubrovnik, Croatia. Association for Computational Linguistics.

  7. arXiv:2109.06772

    cs.CL

    Improving Zero-shot Cross-lingual Transfer between Closely Related Languages by injecting Character-level Noise

    Authors: Noëmi Aepli, Rico Sennrich

    Abstract: Cross-lingual transfer between a high-resource language and its dialects or closely related language varieties should be facilitated by their similarity. However, current approaches that operate in the embedding space do not take surface similarity into account. This work presents a simple yet effective strategy to improve cross-lingual transfer between closely related varieties. We propose to augm…

    Submitted 11 March, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

    Comments: ACL 2022

    ACM Class: I.2.7

  8. arXiv:2104.03945

    cs.CL

    On Biasing Transformer Attention Towards Monotonicity

    Authors: Annette Rios, Chantal Amrhein, Noëmi Aepli, Rico Sennrich

    Abstract: Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it o…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: To be published in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021)