Showing 1–1 of 1 results for author: Rocha, M L d S J

  1. arXiv:2303.16098  [pdf, other]

    cs.CL cs.AI

    Carolina: a General Corpus of Contemporary Brazilian Portuguese with Provenance, Typology and Versioning Information

    Authors: Maria Clara Ramos Morales Crespo, Maria Lina de Souza Jeannine Rocha, Mariana Lourenço Sturzeneker, Felipe Ribas Serras, Guilherme Lamartine de Mello, Aline Silva Costa, Mayara Feliciano Palma, Renata Morais Mesquita, Raquel de Paula Guets, Mariana Marques da Silva, Marcelo Finger, Maria Clara Paixão de Sousa, Cristiane Namiuti, Vanessa Martins do Monte

    Abstract: This paper presents the first publicly available version of the Carolina Corpus and discusses its future directions. Carolina is a large open corpus of Brazilian Portuguese texts under construction using web-as-corpus methodology enhanced with provenance, typology, versioning, and text integrality. The corpus aims at being used both as a reliable source for research in Linguistics and as an import…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: 14 pages, 3 figures, 1 appendix

    MSC Class: 68T50; ACM Class: I.2.7