
Showing 1–9 of 9 results for author: Moslem, Y

Searching in archive cs.
  1. arXiv:2406.17363  [pdf, other]

    cs.CL cs.SD eess.AS

    Leveraging Synthetic Audio Data for End-to-End Low-Resource Speech Translation

    Authors: Yasmin Moslem

    Abstract: This paper describes our system submission to the International Conference on Spoken Language Translation (IWSLT 2024) for Irish-to-English speech translation. We built end-to-end systems based on Whisper, and employed a number of data augmentation techniques, such as speech back-translation and noise augmentation. We investigate the effect of using synthetic audio data and discuss several methods…

    Submitted 27 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: IWSLT 2024

  2. arXiv:2406.10118  [pdf, other]

    cs.CL

    SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

    Authors: Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Railey Montalan, Ryan Ignatius, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze Gao, Patrick Amadeus, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, et al. (36 additional authors not shown)

    Abstract: Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due t…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: https://github.com/SEACrowd

  3. arXiv:2401.14559  [pdf, other]

    cs.CL cs.AI cs.HC cs.IR

    Language Modelling Approaches to Adaptive Machine Translation

    Authors: Yasmin Moslem

    Abstract: Consistency is a key requirement of high-quality translation. It is especially important to adhere to pre-approved terminology and adapt to corrected translations in domain-specific projects. Machine translation (MT) has achieved significant progress in the area of domain adaptation. However, in-domain data scarcity is common in translation settings, due to the lack of specialised datasets and ter…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: PhD thesis

  4. arXiv:2312.12740  [pdf, ps, other]

    cs.CL cs.IR

    Fine-tuning Large Language Models for Adaptive Machine Translation

    Authors: Yasmin Moslem, Rejwanul Haque, Andy Way

    Abstract: This paper presents the outcomes of fine-tuning Mistral 7B, a general-purpose large language model (LLM), for adaptive machine translation (MT). The fine-tuning process involves utilising a combination of zero-shot and one-shot translation prompts within the medical domain. The primary objective is to enhance real-time adaptive MT capabilities of Mistral 7B, enabling it to adapt translations to th…

    Submitted 19 December, 2023; originally announced December 2023.

  5. arXiv:2310.14451  [pdf, other]

    cs.CL

    Domain Terminology Integration into Machine Translation: Leveraging Large Language Models

    Authors: Yasmin Moslem, Gianfranco Romani, Mahdi Molaei, Rejwanul Haque, John D. Kelleher, Andy Way

    Abstract: This paper discusses the methods that we used for our submissions to the WMT 2023 Terminology Shared Task for German-to-English (DE-EN), English-to-Czech (EN-CS), and Chinese-to-English (ZH-EN) language pairs. The task aims to advance machine translation (MT) by challenging participants to develop systems that accurately translate technical terms, ultimately enhancing communication and understandi…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: WMT 2023

  6. arXiv:2301.13294  [pdf, other]

    cs.CL

    Adaptive Machine Translation with Large Language Models

    Authors: Yasmin Moslem, Rejwanul Haque, John D. Kelleher, Andy Way

    Abstract: Consistency is a key requirement of high-quality translation. It is especially important to adhere to pre-approved terminology and adapt to corrected translations in domain-specific projects. Machine translation (MT) has achieved significant progress in the area of domain adaptation. However, real-time adaptation remains challenging. Large-scale language models (LLMs) have recently shown interesti…

    Submitted 9 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: EAMT 2023 - Research: technical

  7. arXiv:2210.12802  [pdf]

    cs.CL cs.HC

    Translation Word-Level Auto-Completion: What can we achieve out of the box?

    Authors: Yasmin Moslem, Rejwanul Haque, Andy Way

    Abstract: Research on Machine Translation (MT) has achieved important breakthroughs in several areas. While there is much more to be done in order to build on this success, we believe that the language industry needs better ways to take full advantage of current achievements. Due to a combination of factors, including time, resources, and skills, businesses tend to apply pragmatism into their AI workflows.…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: WMT 2022

    Journal ref: In Proceedings of the Seventh Conference on Machine Translation, 2022, Abu Dhabi, UAE. Association for Computational Linguistics

  8. arXiv:2208.05909  [pdf]

    cs.CL

    Domain-Specific Text Generation for Machine Translation

    Authors: Yasmin Moslem, Rejwanul Haque, John D. Kelleher, Andy Way

    Abstract: Preservation of domain knowledge from the source to target is crucial in any translation workflow. It is common in the translation industry to receive highly specialized projects, where there is hardly any parallel in-domain data. In such scenarios where there is insufficient in-domain data to fine-tune Machine Translation (MT) models, producing translations that are consistent with the relevant c…

    Submitted 11 August, 2022; originally announced August 2022.

    Comments: AMTA 2022 - MT Research Track

    Report number: 2022.amta-research.2

    Journal ref: Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (2022) Volume 1: Research Track, pages 14-30, Orlando, USA. Association for Machine Translation in the Americas

  9. arXiv:2205.15599  [pdf, other]

    cs.CL

    Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish

    Authors: Alp Öktem, Rodolfo Zevallos, Yasmin Moslem, Güneş Öztürk, Karen Şarhon

    Abstract: We develop machine translation and speech synthesis systems to complement the efforts of revitalizing Judeo-Spanish, the exiled language of Sephardic Jews, which survived for centuries, but now faces the threat of extinction in the digital age. Building on resources created by the Sephardic community of Turkey and elsewhere, we create corpora and tools that would help preserve this language for fu…

    Submitted 31 May, 2022; originally announced May 2022.