Showing 1–1 of 1 results for author: Mhaske, A

  1. arXiv:2212.10168  [pdf, other]

    cs.CL

    Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages

    Authors: Arnav Mhaske, Harshit Kedia, Sumanth Doddapaneni, Mitesh M. Khapra, Pratyush Kumar, Rudra Murthy V, Anoop Kunchukuttan

    Abstract: We present Naamapadam, the largest publicly available Named Entity Recognition (NER) dataset for the 11 major Indian languages from two language families. The dataset contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location, and Organization) for 9 out of the 11 languages. The training dataset has been automaticall…

    Submitted 28 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023