Showing 1–17 of 17 results for author: Belz, A

  1. arXiv:2405.07875  [pdf, ps, other]

    cs.CL

    Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques

    Authors: Michela Lorandi, Anya Belz

    Abstract: Rerunning a metric-based evaluation should be more straightforward, and results should be closer, than in a human-based evaluation, especially where code and model checkpoints are made available by the original authors. As this report of our efforts to rerun a metric-based evaluation of a set of single-attribute and multiple-attribute controllable text generation (CTG) techniques shows however, su…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: The Fourth Workshop on Human Evaluation of NLP Systems (HumEval 2024) at LREC-COLING 2024

  2. arXiv:2402.12267  [pdf, other]

    cs.CL

    High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models

    Authors: Michela Lorandi, Anya Belz

    Abstract: The performance of NLP methods for severely under-resourced languages cannot currently hope to match the state of the art in NLP methods for well resourced languages. We explore the extent to which pretrained large language models (LLMs) can bridge this gap, via the example of data-to-text generation for Irish, Welsh, Breton and Maltese. We test LLMs on these under-resourced languages and English,…

    Submitted 19 February, 2024; originally announced February 2024.

  3. arXiv:2401.14228  [pdf, other]

    cs.CL cs.AI cs.LG

    Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods

    Authors: Mohammed Sabry, Anya Belz

    Abstract: As the cost of training ever larger language models has grown, so has the interest in reusing previously learnt knowledge. Transfer learning methods have shown how reusing non-task-specific knowledge can help in subsequent task-specific learning. In this paper, we investigate the inverse: porting whole functional modules that encode task-specific knowledge from one model to another. We designed a…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted to Findings of EACL 2024. Camera ready version

  4. arXiv:2401.12209  [pdf]

    physics.atom-ph quant-ph

    A Single Photon Source based on a Long-Range Interacting Room Temperature Vapor

    Authors: Felix Moumtsilis, Max Mäusezahl, Haim Nakav, Annika Belz, Robert Löw, Tilman Pfau

    Abstract: We report on the current development of a single photon source based on a long-range interacting room temperature rubidium vapor. We discuss the history of the project, the production of vapor cells, and the observation of Rabi-oscillations in the four-wave-mixing excitation scheme.

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 8 pages, 6 figures

  5. arXiv:2308.09957  [pdf, other]

    cs.CL cs.AI

    Data-to-text Generation for Severely Under-Resourced Languages with GPT-3.5: A Bit of Help Needed from Google Translate

    Authors: Michela Lorandi, Anya Belz

    Abstract: LLMs like GPT are great at tasks involving English which dominates in their training data. In this paper, we look at how they cope with tasks involving languages that are severely under-represented in their training data, in the context of data-to-text generation for Irish, Maltese, Welsh and Breton. During the prompt-engineering phase we tested a range of prompt types and formats on GPT-3.5 and 4…

    Submitted 19 August, 2023; originally announced August 2023.

  6. arXiv:2305.01633  [pdf, other]

    cs.CL

    Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

    Authors: Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, et al. (17 additional authors not shown)

    Abstract: We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, a…

    Submitted 7 August, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 5 pages plus appendix, 4 tables, 1 figure. To appear at "Workshop on Insights from Negative Results in NLP" (co-located with EACL2023). Updated author list and acknowledgements

    MSC Class: 68; ACM Class: I.2.7

  7. arXiv:2304.12410  [pdf, other]

    cs.CL cs.AI cs.LG

    PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques

    Authors: Mohammed Sabry, Anya Belz

    Abstract: Recent parameter-efficient finetuning (PEFT) techniques aim to improve over the considerable cost of fully finetuning large pretrained language models (PLM). As different PEFT techniques proliferate, it is becoming difficult to compare them, in particular in terms of (i) the structure and functionality they add to the PLM, (ii) the different types and degrees of efficiency improvements achieved, (…

    Submitted 19 October, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

  8. arXiv:2211.09455  [pdf, other]

    cs.CL

    Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation

    Authors: Aleksandar Savkov, Francesco Moramarco, Alex Papadopoulos Korfiatis, Mark Perera, Anya Belz, Ehud Reiter

    Abstract: Evaluating automatically generated text is generally hard due to the inherently subjective nature of many aspects of the output quality. This difficulty is compounded in automatic consultation note generation by differing opinions between medical experts both about which patient statements should be included in generated notes and about their respective importance in arriving at a diagnosis. Previ…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted for publication at EMNLP 2022

  9. arXiv:2205.02549  [pdf, other]

    cs.HC cs.CL

    User-Driven Research of Medical Note Generation Software

    Authors: Tom Knoll, Francesco Moramarco, Alex Papadopoulos Korfiatis, Rachel Young, Claudia Ruffini, Mark Perera, Christian Perstl, Ehud Reiter, Anya Belz, Aleksandar Savkov

    Abstract: A growing body of work uses Natural Language Processing (NLP) methods to automatically generate medical notes from audio recordings of doctor-patient consultations. However, there are very few studies on how such systems could be used in clinical practice, how clinicians would adjust to using them, or how system design should be influenced by such considerations. In this paper, we present three ro…

    Submitted 6 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: Accepted for publication at NAACL 2022

  10. arXiv:2204.05961  [pdf, other]

    cs.CL

    Quantified Reproducibility Assessment of NLP Results

    Authors: Anya Belz, Maja Popović, Simon Mille

    Abstract: This paper describes and tests a method for carrying out quantified reproducibility assessment (QRA) that is based on concepts and definitions from metrology. QRA produces a single score estimating the degree of reproducibility of a given system and evaluation measure, on the basis of the scores from, and differences between, different reproductions. We test QRA on 18 system and evaluation measure…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: To be published in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL'22)

  11. arXiv:2204.00447  [pdf, other]

    cs.CL

    Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation

    Authors: Francesco Moramarco, Alex Papadopoulos Korfiatis, Mark Perera, Damir Juric, Jack Flann, Ehud Reiter, Anya Belz, Aleksandar Savkov

    Abstract: In recent years, machine learning models have rapidly become better at generating clinical consultation notes; yet, there is little work on how to properly evaluate the generated consultation notes to understand the impact they may have on both the clinician using them and the patient's clinical safety. To address this we present an extensive human evaluation study of consultation notes where 5 cl…

    Submitted 1 April, 2022; originally announced April 2022.

    Comments: To be published in proceedings of ACL 2022

  12. arXiv:2110.00437  [pdf, other]

    physics.atom-ph quant-ph

    Transient Density-Induced Dipolar Interactions in a Thin Vapor Cell

    Authors: Florian Christaller, Max Mäusezahl, Felix Moumtsilis, Annika Belz, Harald Kübler, Hadiseh Alaeian, Charles S. Adams, Robert Löw, Tilman Pfau

    Abstract: We exploit the effect of light-induced atomic desorption to produce high atomic densities ($n\gg k^3$) in a rubidium vapor cell. An intense off-resonant laser is pulsed for roughly one nanosecond on a micrometer-sized sapphire-coated cell, which results in the desorption of atomic clouds from both internal surfaces. We probe the transient atomic density evolution by time-resolved absorption spectr…

    Submitted 28 April, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: 13 pages, 4+6 figures

    Journal ref: Phys. Rev. Lett. 128, 173401 (2022)

  13. arXiv:2109.01211  [pdf, other]

    cs.CL

    Quantifying Reproducibility in NLP and ML

    Authors: Anya Belz

    Abstract: Reproducibility has become an intensely debated topic in NLP and ML over recent years, but no commonly accepted way of assessing reproducibility, let alone quantifying it, has so far emerged. The assumption has been that wider scientific reproducibility terminology and definitions are not applicable to NLP/ML, with the result that many different terms and definitions have been proposed, some diame…

    Submitted 2 September, 2021; originally announced September 2021.

  14. arXiv:2103.09710  [pdf, other]

    cs.CL

    The Human Evaluation Datasheet 1.0: A Template for Recording Details of Human Evaluation Experiments in NLP

    Authors: Anastasia Shimorina, Anya Belz

    Abstract: This paper introduces the Human Evaluation Datasheet, a template for recording the details of individual human evaluation experiments in Natural Language Processing (NLP). Originally taking inspiration from seminal papers by Bender and Friedman (2018), Mitchell et al. (2019), and Gebru et al. (2020), the Human Evaluation Datasheet is intended to facilitate the recording of properties of human eval…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: Unpublished manuscript

  15. arXiv:2103.07929  [pdf, other]

    cs.CL

    A Systematic Review of Reproducibility Research in Natural Language Processing

    Authors: Anya Belz, Shubham Agarwal, Anastasia Shimorina, Ehud Reiter

    Abstract: Against the background of what has been termed a reproducibility crisis in science, the NLP field is becoming increasingly interested in, and conscientious about, the reproducibility of its results. The past few years have seen an impressive range of new initiatives, events and active research in the area. However, the field is far from reaching a consensus about how reproducibility should be defi…

    Submitted 21 March, 2021; v1 submitted 14 March, 2021; originally announced March 2021.

    Comments: To be published in proceedings of EACL'21

  16. arXiv:cs/0107017  [pdf, ps, other]

    cs.CL

    Learning Computational Grammars

    Authors: John Nerbonne, Anja Belz, Nicola Cancedda, Herve Dejean, James Hammerton, Rob Koeling, Stasinos Konstantopoulos, Miles Osborne, Franck Thollard, Erik F. Tjong Kim Sang

    Abstract: This paper reports on the "Learning Computational Grammars" (LCG) project, a postdoc network devoted to studying the application of machine learning techniques to grammars suitable for computational use. We were interested in a more systematic survey to understand the relevance of many factors to the success of learning, esp. the availability of annotated data, the kind of dependencies in the da…

    Submitted 15 July, 2001; originally announced July 2001.

    ACM Class: I.2.7

    Journal ref: In: Walter Daelemans and Remi Zajac (eds.), Proceedings of CoNLL-2001, Toulouse, France, 2001, pp. 97-104

  17. arXiv:cs/0102020  [pdf, ps, other]

    cs.CL

    Multi-Syllable Phonotactic Modelling

    Authors: Anja Belz

    Abstract: This paper describes a novel approach to constructing phonotactic models. The underlying theoretical approach to phonological description is the multisyllable approach in which multiple syllable classes are defined that reflect phonotactically idiosyncratic syllable subcategories. A new finite-state formalism, OFS Modelling, is used as a tool for encoding, automatically constructing and generali…

    Submitted 22 February, 2001; originally announced February 2001.

    Comments: 11 pages, 4 tables, 9 figures, workshop

    ACM Class: I.2.7

    Journal ref: Jason Eisner, Lauri Karttunen and Alain Theriault (eds.), Finite-State Phonology: Proceedings of the 5th Workshop of the ACL Special Interest Group in Computational Phonology (SIGPHON), pp. 46-56. Luxembourg, August 2000