-
Carolina: a General Corpus of Contemporary Brazilian Portuguese with Provenance, Typology and Versioning Information
Authors:
Maria Clara Ramos Morales Crespo,
Maria Lina de Souza Jeannine Rocha,
Mariana Lourenço Sturzeneker,
Felipe Ribas Serras,
Guilherme Lamartine de Mello,
Aline Silva Costa,
Mayara Feliciano Palma,
Renata Morais Mesquita,
Raquel de Paula Guets,
Mariana Marques da Silva,
Marcelo Finger,
Maria Clara Paixão de Sousa,
Cristiane Namiuti,
Vanessa Martins do Monte
Abstract:
This paper presents the first publicly available version of the Carolina Corpus and discusses its future directions. Carolina is a large open corpus of Brazilian Portuguese texts, under construction using a web-as-corpus methodology enhanced with provenance, typology, versioning, and text integrality. The corpus is intended both as a reliable source for research in Linguistics and as an important resource for Computer Science research on language models, helping to remove Portuguese from the set of low-resource languages. Here we present the methodology used to construct the corpus, comparing it with other existing methodologies, as well as the corpus's current state: Carolina's first public version has 653,322,577 tokens, distributed over 7 broad types. Each text is annotated with several metadata categories in its header, which we developed following the TEI annotation standards. We also present ongoing derivative works and invite NLP researchers to contribute their own.
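The abstract does not list the individual header fields, so what follows is only a minimal sketch of reading per-text metadata from a TEI header. It assumes a TEI P5 header in which provenance, typology and versioning are recorded in `sourceDesc`, `textClass` and `revisionDesc` respectively. The sample document, the `header_metadata` helper and the returned field names are hypothetical and are not Carolina's actual schema; the code uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; ElementTree needs an explicit prefix mapping for find().
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI document; element choices are illustrative only,
# not the Carolina corpus header schema.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example text</title></titleStmt>
      <publicationStmt><p>Example distributor</p></publicationStmt>
      <sourceDesc>
        <bibl><ref target="https://example.org/text">source</ref></bibl>
      </sourceDesc>
    </fileDesc>
    <profileDesc>
      <textClass><catRef target="#journalistic"/></textClass>
    </profileDesc>
    <revisionDesc><change when="2023-03-01">collected</change></revisionDesc>
  </teiHeader>
  <text><body><p>...</p></body></text>
</TEI>"""


def header_metadata(tei_xml):
    """Return a dict of selected header fields (None when a field is absent)."""
    root = ET.fromstring(tei_xml)

    def first(path, attr=None):
        # Text (or the given attribute) of the first element matching path.
        el = root.find(path, TEI_NS)
        if el is None:
            return None
        return el.get(attr) if attr else (el.text or "").strip()

    return {
        "title": first(".//tei:titleStmt/tei:title"),
        # Provenance: where the text was collected from.
        "source": first(".//tei:sourceDesc/tei:bibl/tei:ref", "target"),
        # Typology: a pointer into a (hypothetical) text-type taxonomy.
        "type": first(".//tei:textClass/tei:catRef", "target"),
        # Versioning: date of the first recorded change.
        "version_date": first(".//tei:revisionDesc/tei:change", "when"),
    }


if __name__ == "__main__":
    print(header_metadata(SAMPLE))
```

Keeping the lookups as namespace-qualified paths means a missing element yields `None` rather than an exception, which is convenient when headers in a web-derived collection are not uniformly populated.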
Submitted 28 March, 2023;
originally announced March 2023.
-
Horizontal resolution in a nested-domain WRF simulation: a Bayesian analysis approach
Authors:
Michel d. S. Mesquita,
Bjørn Ådlandsvik,
Cindy Bruyère,
Anne D. Sandvik
Abstract:
The fast-paced development of state-of-the-art limited area models and faster computational resources have made it possible to create simulations at increasing horizontal resolution. This has led to a ubiquitous demand for even higher resolutions from users across disciplines. This study revisits one of the simulations used in marine ecosystem projects at the Bjerknes Centre. We present a fresh perspective on the assessment of these data, specifically regarding: a) the value added by increased horizontal resolution; and b) a new method for comparing sensitivity studies. The assessment is made using a Bayesian framework for the distribution of mean surface temperature in the Hardanger fjord region in Norway. Population estimates are calculated from samples of the joint posterior distribution generated using a Monte Carlo procedure. The Bayesian statistical model is applied to output data from the Weather Research and Forecasting (WRF) model at three horizontal resolutions (9, 3 and 1 km) and to the ERA-Interim reanalysis. The period considered in this study is 2007 to 2009, for the months of April, May and June.
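The abstract does not specify the likelihood or priors, so the following is an illustration only, not the authors' model. It assumes a normal model for mean surface temperature with the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2. Under that assumption the joint posterior can be sampled exactly by Monte Carlo: draw sigma^2 from a scaled inverse-chi-square distribution with n - 1 degrees of freedom, then draw mu given sigma^2 from a normal centred on the sample mean. The function names and the temperature values are hypothetical, and the data are synthetic rather than output from the WRF runs or ERA-Interim.

```python
import random
import statistics


def sample_joint_posterior(y, n_draws=10_000, seed=None):
    """Monte Carlo draws of (mu, sigma2) for a normal model with the
    noninformative prior p(mu, sigma2) proportional to 1/sigma2 (assumed)."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    ybar = statistics.mean(y)
    s2 = statistics.variance(y)  # sample variance, n - 1 denominator
    draws = []
    for _ in range(n_draws):
        # sigma2 | y ~ scaled inverse-chi-square(n - 1, s2);
        # a chi-square(k) variate is Gamma(shape=k/2, scale=2).
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # mu | sigma2, y ~ Normal(ybar, sigma2 / n)
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        draws.append((mu, sigma2))
    return draws


def prob_mean_greater(draws_a, draws_b):
    """Monte Carlo estimate of P(mu_a > mu_b | data) for independent posteriors."""
    pairs = list(zip(draws_a, draws_b))
    return sum(a[0] > b[0] for a, b in pairs) / len(pairs)


def credible_interval(values, level=0.95):
    """Equal-tailed interval taken from the sorted Monte Carlo sample."""
    ordered = sorted(values)
    lo = int((1 - level) / 2 * len(ordered))
    hi = max(lo, int((1 + level) / 2 * len(ordered)) - 1)
    return ordered[lo], ordered[hi]


if __name__ == "__main__":
    # Hypothetical daily mean surface temperatures (deg C); synthetic values.
    run_1km = [7.9, 8.4, 9.1, 10.2, 11.0, 12.3, 13.1, 12.8]
    run_9km = [7.1, 7.8, 8.6, 9.4, 10.1, 11.5, 12.0, 11.9]
    post_1 = sample_joint_posterior(run_1km, seed=1)
    post_9 = sample_joint_posterior(run_9km, seed=2)
    diffs = [a[0] - b[0] for a, b in zip(post_1, post_9)]
    print("95% interval for mu_1km - mu_9km:", credible_interval(diffs))
    print("P(mu_1km > mu_9km):", prob_mean_greater(post_1, post_9))
```

Comparing runs through the posterior of a difference in means, rather than through point estimates alone, is one way such a sensitivity comparison could be framed; the abstract does not state whether the authors proceed this way.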
Submitted 28 May, 2014;
originally announced May 2014.
-
The Dengue risk transmission during the FIFA 2014 World Cup
Authors:
Paulo S. Lucio,
Nicolas Degallier,
Maria H. C. Spyrides,
Cláudio M. S. e Silva,
Julio C. B. da Silva,
Helder J. F. da Silva,
Geovane Máximo,
Walter Junior,
Michel Mesquita
Abstract:
Dengue is a viral infection that can produce a severe fever and symptoms that may require hospitalization. It is transmitted between humans by urban-adapted, day-biting Aedes mosquitoes and is therefore a particular problem in towns and cities. To explore this risk, the potential levels of exposure were assessed using a climate-driven model of dengue transmission risk in Brazil and records of its seasonal variation at the key sites. As with the weather, it is impossible to forecast the precise dengue situation in Brazil in 2014. One can, however, make informed estimates based on averaged dengue records from previous years. For the areas around the World Cup stadiums, these records show that the main dengue season will have passed before the World Cup is held in June and July. Nevertheless, a risk remains during these months in the Brazilian north and northeast; it is low but not negligible. Overall, according to a reliable early warning system for the disease, the risk of a dengue outbreak during the upcoming soccer World Cup in Brazil is not serious enough to warrant a high alert in the host cities.
Submitted 23 May, 2014;
originally announced May 2014.