Showing 1–2 of 2 results for author: Strømberg-Derczynski, L

Search v0.5.6 released 2020-02-24

arXiv:2103.09656

cs.LG cs.AI

doi 10.1613/jair.1.12839

Set-to-Sequence Methods in Machine Learning: a Review

Authors: Mateusz Jurewicz, Leon Strømberg-Derczynski

Abstract: Machine learning on sets towards sequential output is an important and ubiquitous task, with applications ranging from language modeling and meta-learning to multi-agent strategy games and power grid optimization. Combining elements of representation learning and structured prediction, its two primary challenges include obtaining a meaningful, permutation invariant set representation and subsequently utilizing this representation to output a complex target permutation. This paper provides a comprehensive introduction to the field as well as an overview of important machine learning methods tackling both of these key challenges, with a detailed qualitative comparison of selected model architectures.

Submitted 16 August, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

Comments: 46 pages of text, with 10 pages of references. Contains 2 tables and 4 figures. Updated version includes expanded notes on method comparison

MSC Class: 68T07 (Primary) 68T01 (Secondary)

ACM Class: A.1; I.2.6

Journal ref: Journal of Artificial Intelligence Research 71 (2021): 885 - 924

arXiv:2005.03521

cs.CL

The Danish Gigaword Project

Authors: Leon Strømberg-Derczynski, Manuel R. Ciosici, Rebekah Baglini, Morten H. Christiansen, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen, Claus Ladefoged, Finn Årup Nielsen, Malte Lau Petersen, Jonathan Hvithamar Rystrøm, Daniel Varab

Abstract: Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialects.

Submitted 12 May, 2021; v1 submitted 7 May, 2020; originally announced May 2020.

Comments: Identical to the NoDaLiDa 2021 version
