
Showing 1–7 of 7 results for author: Atuhurra, J

  1. arXiv:2406.15359  [pdf, other]

    cs.CL cs.CV

    Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models

    Authors: Jesse Atuhurra, Iqra Ali, Tatsuya Hiraoka, Hidetaka Kamigaito, Tomoya Iwakura, Taro Watanabe

    Abstract: Large language models (LLMs) have increased interest in vision language models (VLMs), which process image-text pairs as input. Studies investigating the visual understanding ability of VLMs have been proposed, but such studies are still preliminary because existing datasets do not permit a comprehensive evaluation of the fine-grained visual linguistic abilities of VLMs across multiple languages.…

    Submitted 29 March, 2024; originally announced June 2024.

  2. arXiv:2406.15358  [pdf, other]

    cs.CL

    Introducing Syllable Tokenization for Low-resource Languages: A Case Study with Swahili

    Authors: Jesse Atuhurra, Hiroyuki Shindo, Hidetaka Kamigaito, Taro Watanabe

    Abstract: Many attempts have been made in multilingual NLP to ensure that pre-trained language models, such as mBERT or GPT2 get better and become applicable to low-resource languages. To achieve multilingualism for pre-trained language models (PLMs), we need techniques to create word embeddings that capture the linguistic characteristics of any language. Tokenization is one such technique because it allows…

    Submitted 26 March, 2024; originally announced June 2024.

  3. arXiv:2405.00693  [pdf, other]

    cs.RO cs.CL

    Large Language Models for Human-Robot Interaction: Opportunities and Risks

    Authors: Jesse Atuhurra

    Abstract: The tremendous development in large language models (LLM) has led to a new wave of innovations and applications and yielded research results that were initially forecast to take longer. In this work, we tap into these recent developments and present a meta-study about the potential of large language models if deployed in social robots. We place particular emphasis on the applications of social rob…

    Submitted 26 March, 2024; originally announced May 2024.

  4. arXiv:2404.14415  [pdf, other]

    cs.CL

    Domain Adaptation in Intent Classification Systems: A Review

    Authors: Jesse Atuhurra, Hidetaka Kamigaito, Taro Watanabe, Eric Nichols

    Abstract: Dialogue agents, which perform specific tasks, are part of the long-term goal of NLP researchers to build intelligent agents that communicate with humans in natural language. Such systems should adapt easily from one domain to another to assist users in completing tasks. Researchers have developed a broad range of techniques, objectives, and datasets for intent classification to achieve such syste…

    Submitted 26 March, 2024; originally announced April 2024.

  5. arXiv:2404.08666  [pdf, other]

    cs.CL cs.LG

    Revealing Trends in Datasets from the 2022 ACL and EMNLP Conferences

    Authors: Jesse Atuhurra, Hidetaka Kamigaito

    Abstract: Natural language processing (NLP) has grown significantly since the advent of the Transformer architecture. Transformers have given birth to pre-trained large language models (PLMs). There has been tremendous improvement in the performance of NLP systems across several tasks. NLP systems are on par or, in some cases, better than humans at accomplishing specific tasks. However, it remains the norm…

    Submitted 31 March, 2024; originally announced April 2024.

  6. arXiv:2403.18989  [pdf, other]

    cs.CR cs.AI

    Dealing with Imbalanced Classes in Bot-IoT Dataset

    Authors: Jesse Atuhurra, Takanori Hara, Yuanyu Zhang, Masahiro Sasabe, Shoji Kasahara

    Abstract: With the rapidly spreading usage of Internet of Things (IoT) devices, a network intrusion detection system (NIDS) plays an important role in detecting and protecting various types of attacks in the IoT network. To evaluate the robustness of the NIDS in the IoT network, the existing work proposed a realistic botnet dataset in the IoT network (Bot-IoT dataset) and applied it to machine learning-base…

    Submitted 27 March, 2024; originally announced March 2024.

  7. arXiv:2403.15430  [pdf, other]

    cs.CL

    Distilling Named Entity Recognition Models for Endangered Species from Large Language Models

    Authors: Jesse Atuhurra, Seiveright Cargill Dujohn, Hidetaka Kamigaito, Hiroyuki Shindo, Taro Watanabe

    Abstract: Natural language processing (NLP) practitioners are leveraging large language models (LLM) to create structured datasets from semi-structured and unstructured data sources such as patents, papers, and theses, without having domain-specific knowledge. At the same time, ecological experts are searching for a variety of means to preserve biodiversity. To contribute to these efforts, we focused on end…

    Submitted 13 March, 2024; originally announced March 2024.