Showing 1–2 of 2 results for author: Elgaar, M

Search v0.5.6 released 2020-02-24

arXiv:2310.20121

cs.CL

Ling-CL: Understanding NLP Models through Linguistic Curricula

Authors: Mohamed Elgaar, Hadi Amiri

Abstract: We employ a characterization of linguistic complexity from psycholinguistic and language acquisition research to develop data-driven curricula to understand the underlying linguistic knowledge that models learn to address NLP tasks. The novelty of our approach is in the development of linguistic curricula derived from data, existing knowledge about linguistic complexity, and model behavior during training. By analyzing several benchmark NLP datasets, our curriculum learning approaches identify sets of linguistic metrics (indices) that inform the challenges and reasoning required to address each task. Our work will inform future research in all NLP areas, allowing linguistic complexity to be considered early in the research and development process. In addition, our work prompts an examination of gold standards and fair evaluation in NLP.

Submitted 30 October, 2023; originally announced October 2023.

Comments: EMNLP 2023
arXiv:2307.07412

cs.LG cs.CL

HuCurl: Human-induced Curriculum Discovery

Authors: Mohamed Elgaar, Hadi Amiri

Abstract: We introduce the problem of curriculum discovery and describe a curriculum learning framework capable of discovering effective curricula in a curriculum space based on prior knowledge about sample difficulty. Using annotation entropy and loss as measures of difficulty, we show that (i): the top-performing discovered curricula for a given model and dataset are often non-monotonic as opposed to monotonic curricula in existing literature, (ii): the prevailing easy-to-hard or hard-to-easy transition curricula are often at the risk of underperforming, and (iii): the curricula discovered for smaller datasets and models perform well on larger datasets and models respectively. The proposed framework encompasses some of the existing curriculum learning approaches and can discover curricula that outperform them across several NLP tasks.

Submitted 14 July, 2023; originally announced July 2023.

Comments: In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)
