Carolina: a General Corpus of Contemporary Brazilian Portuguese with Provenance, Typology and Versioning Information
Authors:
Maria Clara Ramos Morales Crespo,
Maria Lina de Souza Jeannine Rocha,
Mariana Lourenço Sturzeneker,
Felipe Ribas Serras,
Guilherme Lamartine de Mello,
Aline Silva Costa,
Mayara Feliciano Palma,
Renata Morais Mesquita,
Raquel de Paula Guets,
Mariana Marques da Silva,
Marcelo Finger,
Maria Clara Paixão de Sousa,
Cristiane Namiuti,
Vanessa Martins do Monte
Abstract:
This paper presents the first publicly available version of the Carolina Corpus and discusses its future directions. Carolina is a large open corpus of Brazilian Portuguese texts under construction using web-as-corpus methodology enhanced with provenance, typology, versioning, and text integrality. The corpus aims at being used both as a reliable source for research in Linguistics and as an import…
▽ More
This paper presents the first publicly available version of the Carolina Corpus and discusses its future directions. Carolina is a large open corpus of Brazilian Portuguese texts under construction using web-as-corpus methodology enhanced with provenance, typology, versioning, and text integrality. The corpus aims at being used both as a reliable source for research in Linguistics and as an important resource for Computer Science research on language models, contributing towards removing Portuguese from the set of low-resource languages. Here we present the construction of the corpus methodology, comparing it with other existing methodologies, as well as the corpus current state: Carolina's first public version has $653,322,577$ tokens, distributed over $7$ broad types. Each text is annotated with several different metadata categories in its header, which we developed using TEI annotation standards. We also present ongoing derivative works and invite NLP researchers to contribute with their own.
△ Less
Submitted 28 March, 2023;
originally announced March 2023.
Free Instrument for Movement Measure
Authors:
Norberto Peña,
Bruno Cecílio Credidio,
Lorena Peixoto Nogueira Rodriguez Martinez Salles Corrêa,
Lucas Gabriel Souza França,
Marcelo do Vale Cunha,
Marcos Cavalcanti de Sousa,
João Paulo Bomfim Cruz Vieira,
José Garcia Vivas Miranda
Abstract:
This paper presents the validation of a computational tool that serves to obtain continuous measurements of moving objects. The software uses techniques of computer vision, pattern recognition and optical flow, to enable tracking of objects in videos, generating data trajectory, velocity, acceleration and angular movement. The program was applied to track a ball around a simple pendulum. The metho…
▽ More
This paper presents the validation of a computational tool that serves to obtain continuous measurements of moving objects. The software uses techniques of computer vision, pattern recognition and optical flow, to enable tracking of objects in videos, generating data trajectory, velocity, acceleration and angular movement. The program was applied to track a ball around a simple pendulum. The methodology used to validate it, taking as a basis to compare the values measured by the program, as well as the theoretical values expected according to the model of a simple pendulum. The experiment is appropriate to the method because it was built within the limits of the linear harmonic oscillator and energy losses due to friction had been minimized, making it the most ideal possible. The results indicate that the tool is sensitive and accurate. Deviations of less than a millimeter to the extent of the trajectory, ensures the applicability of the software on physics, whether in research or in teaching topics.
△ Less
Submitted 29 June, 2013;
originally announced July 2013.