-
Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?
Authors:
Omid Ghahroodi,
Marzia Nouri,
Mohammad Vali Sanian,
Alireza Sahebi,
Doratossadat Dastgheib,
Ehsaneddin Asgari,
Mahdieh Soleymani Baghshah,
Mohammad Hossein Rohban
Abstract:
Evaluating Large Language Models (LLMs) is challenging due to their generative nature, necessitating precise evaluation methodologies. Additionally, non-English LLM evaluation lags behind English, resulting in the absence or weakness of LLMs for many languages. In response to this necessity, we introduce Khayyam Challenge (also known as PersianMMLU), a meticulously curated collection comprising 20,192 four-choice questions sourced from 38 diverse tasks extracted from Persian examinations, spanning a wide spectrum of subjects, complexities, and ages. The primary objective of the Khayyam Challenge is to facilitate the rigorous evaluation of LLMs that support the Persian language. Distinctive features of the Khayyam Challenge are: (i) its comprehensive coverage of various topics, including literary comprehension, mathematics, sciences, logic, and intelligence testing, aimed at assessing different facets of LLMs such as language comprehension, reasoning, and information retrieval across various educational stages, from lower primary school to upper secondary school; (ii) its inclusion of rich metadata such as human response rates, difficulty levels, and descriptive answers; (iii) its utilization of new data to avoid data contamination issues prevalent in existing frameworks; (iv) its use of original, non-translated data tailored for Persian speakers, ensuring the framework is free from translation challenges and errors while encompassing cultural nuances; and (v) its inherent scalability for future data updates and evaluations without requiring special human effort. Previous works lacked an evaluation framework that combined all of these features into a single comprehensive benchmark. Furthermore, we evaluate a wide range of existing LLMs that support the Persian language, with statistical analyses and interpretations of their outputs.
Submitted 9 April, 2024;
originally announced April 2024.
-
Doxastic Lukasiewicz Logic with Public Announcement
Authors:
Doratossadat Dastgheib,
Hadi Farahani
Abstract:
In this paper, we propose a doxastic extension $BL^+$ of Lukasiewicz logic which is sound and complete relative to the introduced corresponding semantics. Also, we equip our doxastic Lukasiewicz logic $BL^+$ with public announcement and propose the logic $DL$. As an application, we model a fuzzy version of the muddy children puzzle with public announcement using $DL$. Finally, we define a translation between $DL$ and $BL^+$, and prove the soundness and completeness theorems for $DL$.
Submitted 17 April, 2023;
originally announced April 2023.
-
The Touché23-ValueEval Dataset for Identifying Human Values behind Arguments
Authors:
Nailia Mirzakhmedova,
Johannes Kiesel,
Milad Alshomary,
Maximilian Heinrich,
Nicolas Handke,
Xiaoni Cai,
Barriere Valentin,
Doratossadat Dastgheib,
Omid Ghahroodi,
Mohammad Ali Sadraei,
Ehsaneddin Asgari,
Lea Kawaletz,
Henning Wachsmuth,
Benno Stein
Abstract:
We present the Touché23-ValueEval Dataset for Identifying Human Values behind Arguments. To investigate approaches for the automated detection of human values behind arguments, we collected 9324 arguments from 6 diverse sources, covering religious texts, political discussions, free-text arguments, newspaper editorials, and online democracy platforms. Each argument was annotated by 3 crowdworkers for 54 values. The Touché23-ValueEval dataset extends the Webis-ArgValues-22. In comparison to the previous dataset, the effectiveness of a 1-Baseline decreases, but that of an out-of-the-box BERT model increases. Therefore, though the classification difficulty increased as per the label distribution, the larger dataset allows for training better models.
Submitted 31 January, 2023;
originally announced January 2023.
-
Some Doxastic Łukasiewicz Logic
Authors:
Doratossadat Dastgheib,
Hadi Farahani
Abstract:
We propose a doxastic Łukasiewicz logic \textbf{BŁ} that is sound and complete with respect to the class of Kripke-based models in which atomic propositions and accessibility relations are both infinitely valued in the standard MV-algebra [0,1]. We also introduce some extensions of \textbf{BŁ} corresponding to axioms \textbf{D}, \textbf{4}, and \textbf{T} of classical epistemic logic. Furthermore, completeness of these extensions is established with respect to the appropriate classes of models.
Submitted 11 December, 2023; v1 submitted 4 November, 2021;
originally announced November 2021.