Showing 1–4 of 4 results for author: Paullada, A

  1. arXiv:2205.04505

    cs.CL

    Behind the Mask: Demographic bias in name detection for PII masking

    Authors: Courtney Mansfield, Amandalynne Paullada, Kristen Howell

    Abstract: Many datasets contain personally identifiable information, or PII, which poses privacy risks to individuals. PII masking is commonly used to redact personal information such as names, addresses, and phone numbers from text data. Most modern PII masking pipelines involve machine learning algorithms. However, these systems may vary in performance, such that individuals from particular demographic gr…

    Submitted 9 May, 2022; originally announced May 2022.

  2. arXiv:2111.15366

    cs.LG cs.AI cs.PF

    AI and the Everything in the Whole Wide World Benchmark

    Authors: Inioluwa Deborah Raji, Emily M. Bender, Amandalynne Paullada, Emily Denton, Alex Hanna

    Abstract: There is a tendency across different subfields in AI to valorize a small collection of influential benchmarks. These benchmarks operate as stand-ins for a range of anointed common problems that are frequently framed as foundational milestones on the path towards flexible and generalizable AI systems. State-of-the-art performance on these benchmarks is widely understood as indicative of progress to…

    Submitted 26 November, 2021; originally announced November 2021.

    Comments: Accepted in NeurIPS 2021 Benchmarks and Datasets track

  3. arXiv:2104.08464

    cs.CL

    A multilabel approach to morphosyntactic probing

    Authors: Naomi Tachikawa Shapiro, Amandalynne Paullada, Shane Steinert-Threlkeld

    Abstract: We introduce a multilabel probing task to assess the morphosyntactic representations of word embeddings from multilingual language models. We demonstrate this task with multilingual BERT (Devlin et al., 2018), training probes for seven typologically diverse languages of varying morphological complexity: Afrikaans, Croatian, Finnish, Hebrew, Korean, Spanish, and Turkish. Through this simple but rob…

    Submitted 17 April, 2021; originally announced April 2021.

  4. Data and its (dis)contents: A survey of dataset development and use in machine learning research

    Authors: Amandalynne Paullada, Inioluwa Deborah Raji, Emily M. Bender, Emily Denton, Alex Hanna

    Abstract: Datasets have played a foundational role in the advancement of machine learning research. They form the basis for the models we design and deploy, as well as our primary medium for benchmarking and evaluation. Furthermore, the ways in which we collect, construct and share these datasets inform the kinds of problems the field pursues and the methods explored in algorithm development. However, recen…

    Submitted 9 December, 2020; originally announced December 2020.

    Journal ref: Patterns, Volume 2, Issue 11, 100336. 2021