
Showing 1–2 of 2 results for author: Luitel, N

  1. Contextual Spelling Correction with Language Model for Low-resource Setting

    Authors: Nishant Luitel, Nirajan Bekoju, Anand Kumar Sah, Subarna Shakya

    Abstract: Spell Correction (SC) in low-resource languages presents a significant challenge because only a limited corpus of data is available and no annotated spelling correction datasets exist. To tackle these challenges, a small-scale word-based transformer LM is trained to provide the SC model with contextual understanding. Further, the probabilistic error rules are extracted from the corpus in…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 8 pages

  2. arXiv:2404.18071

    cs.CL cs.LG

    Can Perplexity Predict Fine-Tuning Performance? An Investigation of Tokenization Effects on Sequential Language Models for Nepali

    Authors: Nishant Luitel, Nirajan Bekoju, Anand Kumar Sah, Subarna Shakya

    Abstract: Recent language models use subwording mechanisms to handle Out-of-Vocabulary (OOV) words seen at test time, and their generation capacity is generally measured using perplexity, an intrinsic metric. It is known that increasing the subword granularity results in a decrease in perplexity. However, the study of how subwording affects the understanding capacity of language models has been ver…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 11 pages