Showing 1–5 of 5 results for author: Baglini, R

  1. arXiv:2402.18209

    cs.CL

    DANSK and DaCy 2.6.0: Domain Generalization of Danish Named Entity Recognition

    Authors: Kenneth Enevoldsen, Emil Trenckner Jessen, Rebekah Baglini

    Abstract: Named entity recognition is one of the cornerstones of Danish NLP, essential for language technology applications within both industry and research. However, Danish NER is inhibited by a lack of available datasets. As a consequence, no current models are capable of fine-grained named entity recognition, nor have they been evaluated for potential generalizability issues across datasets and domains.…

    Submitted 28 February, 2024; originally announced February 2024.

  2. arXiv:2105.13704

    cs.CL

    Natural Language Processing 4 All (NLP4All): A New Online Platform for Teaching and Learning NLP Concepts

    Authors: Rebekah Baglini, Arthur Hjorth

    Abstract: Natural Language Processing offers new insights into language data across almost all disciplines and domains, and allows us to corroborate and/or challenge existing knowledge. The primary hurdles to widening participation in and use of these new research tools are, first, a lack of coding skills in students across K-16, and in the population at large, and second, a lack of knowledge of how NLP-met…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: Accepted to the 5th Workshop on Teaching NLP at NAACL-HLT 2021

  3. arXiv:2102.06505

    cs.CY

    When no news is bad news -- Detection of negative events from news media content

    Authors: Kristoffer L. Nielbo, Frida Haestrup, Kenneth C. Enevoldsen, Peter B. Vahlstrup, Rebekah B. Baglini, Andreas Roepstorff

    Abstract: During the first wave of Covid-19, information decoupling could be observed in the flow of news media content. The corollary of the content alignment within and between news sources experienced by readers (i.e., all news transformed into Corona-news) was that the novelty of news content went down as media focused monotonically on the pandemic event. This all-important Covid-19 news theme turned ou…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: arXiv admin note: text overlap with arXiv:2101.02956

  4. arXiv:2101.02956

    cs.CY

    News Information Decoupling: An Information Signature of Catastrophes in Legacy News Media

    Authors: Kristoffer L. Nielbo, Rebekah B. Baglini, Peter B. Vahlstrup, Kenneth C. Enevoldsen, Anja Bechmann, Andreas Roepstorff

    Abstract: Content alignment in news media was an observable information effect of Covid-19's initial phase. During the first half of 2020, legacy news media became "corona news" following national outbreak and crises management patterns. While news media are neither unbiased nor infallible as sources of events, they do provide a window into socio-cultural responses to events. In this paper, we use legacy pr…

    Submitted 8 January, 2021; originally announced January 2021.

  5. arXiv:2005.03521

    cs.CL

    The Danish Gigaword Project

    Authors: Leon Strømberg-Derczynski, Manuel R. Ciosici, Rebekah Baglini, Morten H. Christiansen, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen, Claus Ladefoged, Finn Årup Nielsen, Malte Lau Petersen, Jonathan Hvithamar Rystrøm, Daniel Varab

    Abstract: Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialect…

    Submitted 12 May, 2021; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: Identical to the NoDaLiDa 2021 version