
Showing 1–7 of 7 results for author: Hangya, V

Searching in archive cs.
  1. arXiv:2311.12489  [pdf, other]

    cs.CL

    Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages

    Authors: Viktor Hangya, Silvia Severini, Radoslav Ralev, Alexander Fraser, Hinrich Schütze

    Abstract: Very low-resource languages, having only a few million tokens worth of data, are not well-supported by multilingual NLP approaches due to poor quality cross-lingual word representations. Recent work showed that good cross-lingual performance can be achieved if a source language is related to the low-resource target language. However, not all language pairs are related. In this paper, we propose to…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted at the MRL 2023 workshop

  2. arXiv:2311.08538  [pdf, other]

    cs.CL

    Extending Multilingual Machine Translation through Imitation Learning

    Authors: Wen Lai, Viktor Hangya, Alexander Fraser

    Abstract: Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world's languages are still being left behind. We aim to extend large-scale MNMT models to a new language, allowing for translation between the newly added and all of the already supported languages in a challenging scenario: using only a parallel corpus between the new…

    Submitted 14 November, 2023; originally announced November 2023.

  3. arXiv:2305.14081  [pdf, other]

    cs.CL

    How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have

    Authors: Viktor Hangya, Alexander Fraser

    Abstract: Due to the broad range of social media platforms, the requirements of abusive language detection systems are varied and ever-changing. A large set of annotated corpora with different properties and label sets has already been created, for tasks such as hate or misogyny detection, but the form and targets of abusive speech are constantly evolving. Since the annotation of new corpora is expensive, in this work w…

    Submitted 6 May, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at LREC-COLING 2024

  4. arXiv:2205.15713  [pdf, other]

    cs.CL

    Don't Forget Cheap Training Signals Before Building Unsupervised Bilingual Word Embeddings

    Authors: Silvia Severini, Viktor Hangya, Masoud Jalili Sabet, Alexander Fraser, Hinrich Schütze

    Abstract: Bilingual Word Embeddings (BWEs) are one of the cornerstones of cross-lingual transfer of NLP models. They can be built using only monolingual corpora without supervision, leading to numerous works focusing on unsupervised BWEs. However, most of the current approaches to building unsupervised BWEs do not compare their results with methods based on easy-to-access cross-lingual signals. In this paper, w…

    Submitted 31 May, 2022; originally announced May 2022.

    Comments: BUCC@LREC 2022

  5. arXiv:2201.05922  [pdf, ps, other]

    cs.CL

    Addressing the Challenges of Cross-Lingual Hate Speech Detection

    Authors: Irina Bigoulaeva, Viktor Hangya, Iryna Gurevych, Alexander Fraser

    Abstract: The goal of hate speech detection is to filter negative online content aimed at certain groups of people. Due to the easy accessibility of social media platforms, it is crucial to protect everyone, which requires building hate speech detection systems for a wide range of languages. However, the available labeled hate speech datasets are limited, making it problematic to build systems for many langua…

    Submitted 15 January, 2022; originally announced January 2022.

  6. arXiv:2010.13192  [pdf, other]

    cs.CL

    The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task

    Authors: Alexandra Chronopoulou, Dario Stojanovski, Viktor Hangya, Alexander Fraser

    Abstract: This paper describes the submission of LMU Munich to the WMT 2020 unsupervised shared task, in two language directions, German<->Upper Sorbian. Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al. (2020), using a monolingual pretrained language generation model (on German) and fine-tuning it on both German and Upper Sorbian, before initializing…

    Submitted 25 October, 2020; originally announced October 2020.

    Comments: WMT Unsupervised Shared Task 2020

  7. arXiv:2010.12627  [pdf, other]

    cs.CL

    Anchor-based Bilingual Word Embeddings for Low-Resource Languages

    Authors: Tobias Eder, Viktor Hangya, Alexander Fraser

    Abstract: Good quality monolingual word embeddings (MWEs) can be built for languages which have large amounts of unlabeled text. MWEs can be aligned to bilingual spaces using only a few thousand word translation pairs. For low-resource languages, training MWEs monolingually results in MWEs of poor quality, and thus poor bilingual word embeddings (BWEs) as well. This paper proposes a new approach for building…

    Submitted 27 July, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing