-
PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English
Authors:
Michael Kranzlein,
Emma Manning,
Siyao Peng,
Shira Wein,
Aryaman Arora,
Bradford Salen,
Nathan Schneider
Abstract:
We present the Prepositions Annotated with Supersense Tags in Reddit International English ("PASTRIE") corpus, a new dataset containing manually annotated preposition supersenses of English data from presumed speakers of four L1s: English, French, German, and Spanish. The annotations are comprehensive, covering all preposition types and tokens in the sample. Along with the corpus, we provide analysis of distributional patterns across the included L1s and a discussion of the influence of L1s on L2 preposition choice.
Submitted 23 October, 2021;
originally announced October 2021.
-
Underreporting of errors in NLG output, and what to do about it
Authors:
Emiel van Miltenburg,
Miruna-Adriana Clinciu,
Ondřej Dušek,
Dimitra Gkatzia,
Stephanie Inglis,
Leo Leppänen,
Saad Mahamood,
Emma Manning,
Stephanie Schoch,
Craig Thomson,
Luou Wen
Abstract:
We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an important indicator of where systems should still be improved. If authors only report overall performance metrics, the research community is left in the dark about the specific weaknesses that are exhibited by 'state-of-the-art' research. Next to quantifying the extent of error under-reporting, this position paper provides recommendations for error identification, analysis and reporting.
Submitted 8 August, 2021; v1 submitted 2 August, 2021;
originally announced August 2021.
-
Go local: The key to controlling the COVID-19 pandemic in the post lockdown era
Authors:
Isabel Bennett,
Jobie Budd,
Erin M. Manning,
Ed Manley,
Mengdie Zhuang,
Ingemar J. Cox,
Michael Short,
Anne M. Johnson,
Deenan Pillay,
Rachel A. McKendry
Abstract:
The UK government announced its first wave of lockdown easing on 10 May 2020, two months after the non-pharmaceutical measures to reduce the spread of COVID-19 were first introduced on 23 March 2020. Analysis of reported case rate data from Public Health England and aggregated and anonymised crowd-level mobility data shows variability across local authorities in the UK. A locality-based approach to lockdown easing is needed, enabling local public health and associated health and social care services to rapidly respond to emerging hotspots of infection. National-level data will hide an increasing heterogeneity of COVID-19 infections and mobility, and new ways of real-time data presentation to the public are required. Data sources (including mobile) allow for faster visualisation than more traditional data sources, and are part of a wider trend towards near real-time analysis of outbreaks needed for timely, targeted local public health interventions. Real-time data visualisation may give early warnings of unusual levels of activity which warrant further investigation by local public health authorities.
Submitted 6 July, 2020;
originally announced July 2020.
-
A Human Evaluation of AMR-to-English Generation Systems
Authors:
Emma Manning,
Shira Wein,
Nathan Schneider
Abstract:
Most current state-of-the-art systems for generating English text from Abstract Meaning Representation (AMR) have been evaluated only using automated metrics, such as BLEU, which are known to be problematic for natural language generation. In this work, we present the results of a new human evaluation which collects fluency and adequacy scores, as well as categorization of error types, for several recent AMR generation systems. We discuss the relative quality of these systems and how our results compare to those of automatic metrics, finding that while the metrics are mostly successful in ranking systems overall, collecting human judgments allows for more nuanced comparisons. We also analyze common errors made by these systems.
Submitted 1 December, 2020; v1 submitted 14 April, 2020;
originally announced April 2020.