Showing 1–2 of 2 results for author: Minakova, M

Searching in archive cs.
  1. arXiv:2311.16302  [pdf, other]

    cs.LG cs.CL

    Comprehensive Benchmarking of Entropy and Margin Based Scoring Metrics for Data Selection

    Authors: Anusha Sabbineni, Nikhil Anand, Maria Minakova

    Abstract: While data selection methods have been studied extensively in active learning, data pruning, and data augmentation settings, there is little evidence for the efficacy of these methods in industry scale settings, particularly in low-resource languages. Our work presents ways of assessing prospective training examples in those settings for their "usefulness" or "difficulty". We also demonstrate how…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted to Efficient Natural Language and Speech Processing (ENLSP-III) workshop at NeurIPS '23

  2. arXiv:2311.16298  [pdf, other]

    cs.LG cs.CL

    Influence Scores at Scale for Efficient Language Data Sampling

    Authors: Nikhil Anand, Joshua Tan, Maria Minakova

    Abstract: Modern ML systems ingest data aggregated from diverse sources, such as synthetic, human-annotated, and live customer traffic. Understanding which examples are important to the performance of a learning algorithm is crucial for efficient model training. Recently, a growing body of literature has given rise to various "influence scores," which use training artifacts such as model confidence…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP '23