
Showing 1–4 of 4 results for author: Shafayat, S

  1. arXiv:2406.05761

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  2. arXiv:2403.10900

    cs.CL

    BEnQA: A Question Answering and Reasoning Benchmark for Bengali and English

    Authors: Sheikh Shafayat, H M Quamran Hasan, Minhajur Rahman Chowdhury Mahim, Rifki Afina Putri, James Thorne, Alice Oh

    Abstract: In this study, we introduce BEnQA, a dataset comprising parallel Bengali and English exam questions for middle and high school levels in Bangladesh. Our dataset consists of approximately 5K questions covering several subjects in science with different types of questions, including factual, application, and reasoning-based questions. We benchmark several Large Language Models (LLMs) with our parall…

    Submitted 16 March, 2024; originally announced March 2024.

  3. arXiv:2402.18045

    cs.CL

    Multi-FAct: Assessing Multilingual LLMs' Multi-Regional Knowledge using FActScore

    Authors: Sheikh Shafayat, Eunsu Kim, Juhyun Oh, Alice Oh

    Abstract: Large Language Models (LLMs) are prone to factuality hallucination, generating text that contradicts established knowledge. While extensive research has addressed this in English, little is known about multilingual LLMs. This paper systematically evaluates multilingual LLMs' factual accuracy across languages and geographic regions. We introduce a novel pipeline for multilingual factuality evaluati…

    Submitted 1 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  4. arXiv:2401.10695

    cs.CL

    LangBridge: Multilingual Reasoning Without Multilingual Supervision

    Authors: Dongkeun Yoon, Joel Jang, Sungdong Kim, Seungone Kim, Sheikh Shafayat, Minjoon Seo

    Abstract: We introduce LangBridge, a zero-shot approach to adapt language models for multilingual reasoning tasks without multilingual supervision. LangBridge operates by bridging two models, each specialized in different aspects: (1) one specialized in understanding multiple languages (e.g., mT5 encoder) and (2) one specialized in reasoning (e.g., MetaMath). LangBridge connects the two models by introducin…

    Submitted 3 June, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: ACL 2024 Main