
Showing 1–24 of 24 results for author: Poesio, M

Searching in archive cs.
  1. arXiv:2405.01299  [pdf, other]

    cs.CL cs.AI cs.LG

    The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation

    Authors: Maja Pavlovic, Massimo Poesio

    Abstract: Large Language Models (LLMs) have emerged as powerful support tools across various natural language tasks and a range of application domains. Recent studies focus on exploring their capabilities for data annotation. This paper provides a comparative overview of twelve studies investigating the potential of LLMs in labelling data. While the models demonstrate promising cost and time-saving benefits…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: LREC-COLING NLPerspectives workshop

  2. arXiv:2404.10696  [pdf, other]

    cs.CL

    Integrating knowledge bases to improve coreference and bridging resolution for the chemical domain

    Authors: Pengcheng Lu, Massimo Poesio

    Abstract: Resolving coreference and bridging relations in chemical patents is important for better understanding the precise chemical process, where chemical domain knowledge is very critical. We proposed an approach incorporating external knowledge into a multi-task learning model for both coreference and bridging resolution in the chemical domain. The results show that integrating external knowledge can b…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: work in progress

  3. arXiv:2403.05767  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Extending Activation Steering to Broad Skills and Multiple Behaviours

    Authors: Teun van der Weij, Massimo Poesio, Nandi Schoots

    Abstract: Current large language models have dangerous capabilities, which are likely to become more problematic in the future. Activation steering techniques can be used to reduce risks from these capabilities. In this paper, we investigate the efficacy of activation steering for broad skills and multiple behaviours. First, by comparing the effects of reducing performance on general coding ability and Pyth…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Code is available at: https://github.com/TeunvdWeij/extending-activation-addition

  4. arXiv:2402.08392  [pdf, other]

    cs.CL

    Large Language Models as Minecraft Agents

    Authors: Chris Madge, Massimo Poesio

    Abstract: In this work we examine the use of Large Language Models (LLMs) in the challenging setting of acting as a Minecraft agent. We apply and evaluate LLMs in the builder and architect settings, introduce clarification questions and examine the challenges and opportunities for improvement. In addition, we present a platform for online interaction with the agents and an evaluation against previous work…

    Submitted 13 February, 2024; originally announced February 2024.

  5. arXiv:2304.14803  [pdf]

    cs.CL

    SemEval-2023 Task 11: Learning With Disagreements (LeWiDi)

    Authors: Elisa Leonardelli, Alexandra Uma, Gavin Abercrombie, Dina Almanea, Valerio Basile, Tommaso Fornaciari, Barbara Plank, Verena Rieser, Massimo Poesio

    Abstract: NLP datasets annotated with human judgments are rife with disagreements between the judges. This is especially true for tasks depending on subjective judgments such as sentiment analysis or offensive language detection. Particularly in these latter cases, the NLP community has come to realize that the approach of 'reconciling' these different subjective interpretations is inappropriate. Many NLP r…

    Submitted 28 April, 2023; originally announced April 2023.

  6. arXiv:2210.12169  [pdf]

    cs.CL cs.LG

    Joint Coreference Resolution for Zeros and non-Zeros in Arabic

    Authors: Abdulrahman Aloraini, Sameer Pradhan, Massimo Poesio

    Abstract: Most existing proposals about anaphoric zero pronoun (AZP) resolution regard full mention coreference and AZP resolution as two independent tasks, even though the two tasks are clearly related. The main issues that need tackling to develop a joint model for zero and non-zero mentions are the difference between the two types of arguments (zero pronouns, being null, provide no nominal information) a…

    Submitted 21 October, 2022; originally announced October 2022.

    Journal ref: Published at The Seventh Arabic Natural Language Processing Workshop (WANLP 2022)

  7. arXiv:2210.05581  [pdf, other]

    cs.CL

    Aggregating Crowdsourced and Automatic Judgments to Scale Up a Corpus of Anaphoric Reference for Fiction and Wikipedia Texts

    Authors: Juntao Yu, Silviu Paun, Maris Camilleri, Paloma Carretero Garcia, Jon Chamberlain, Udo Kruschwitz, Massimo Poesio

    Abstract: Although several datasets annotated for anaphoric reference/coreference exist, even the largest such datasets have limitations in terms of size, range of domains, coverage of anaphoric phenomena, and size of documents included. Yet, the approaches proposed to scale up anaphoric annotation haven't so far resulted in datasets overcoming these limitations. In this paper, we introduce a new release of…

    Submitted 11 October, 2022; originally announced October 2022.

  8. arXiv:2205.12323  [pdf, ps, other]

    cs.CL

    Scoring Coreference Chains with Split-Antecedent Anaphors

    Authors: Silviu Paun, Juntao Yu, Nafise Sadat Moosavi, Massimo Poesio

    Abstract: Anaphoric reference is an aspect of language interpretation covering a variety of types of interpretation beyond the simple case of identity reference to entities introduced via nominal expressions covered by the traditional coreference task in its most recent incarnation in ONTONOTES and similar datasets. One of these cases that go beyond simple coreference is anaphoric reference to entities that…

    Submitted 24 May, 2022; originally announced May 2022.

  9. arXiv:2109.13032  [pdf, other]

    cs.CL

    Patterns of Lexical Ambiguity in Contextualised Language Models

    Authors: Janosch Haber, Massimo Poesio

    Abstract: One of the central aspects of contextualised language models is that they should be able to distinguish the meaning of lexically ambiguous words by their contexts. In this paper we investigate the extent to which the contextualised embeddings of word forms that display multiplicity of sense reflect traditional distinctions of polysemy and homonymy. To this end, we introduce an extended, human-anno…

    Submitted 29 September, 2021; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: Accepted at Findings of EMNLP 2021. Data available at https://github.com/dali-ambiguity/Patterns-of-Lexical-Ambiguity . 9 pages, 4 figures, 4 tables. Includes appendix with 3 figures

  10. arXiv:2109.12424  [pdf, other]

    cs.CL cs.LG

    Coreference Resolution for the Biomedical Domain: A Survey

    Authors: Pengcheng Lu, Massimo Poesio

    Abstract: Issues with coreference resolution are one of the most frequently mentioned challenges for information extraction from the biomedical literature. Thus, the biomedical genre has long been the second most researched genre for coreference resolution after the news domain, and the subject of a great deal of research for NLP in general. In recent years this interest has grown enormously leading to the…

    Submitted 25 September, 2021; originally announced September 2021.

    Comments: accepted at CRAC2021@EMNLP2021

  11. arXiv:2109.09825  [pdf]

    cs.CL cs.AI cs.LG

    Data Augmentation Methods for Anaphoric Zero Pronouns

    Authors: Abdulrahman Aloraini, Massimo Poesio

    Abstract: In pro-drop languages like Arabic, Chinese, Italian, Japanese, Spanish, and many others, unrealized (null) arguments in certain syntactic positions can refer to a previously introduced entity, and are thus called anaphoric zero pronouns. The existing resources for studying anaphoric zero pronoun interpretation are however still limited. In this paper, we use five data augmentation methods to genera…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: CRAC2021@EMNLP2021

  12. arXiv:2104.05320  [pdf, other]

    cs.CL

    Stay Together: A System for Single and Split-antecedent Anaphora Resolution

    Authors: Juntao Yu, Nafise Sadat Moosavi, Silviu Paun, Massimo Poesio

    Abstract: The state-of-the-art on basic, single-antecedent anaphora has greatly improved in recent years. Researchers have therefore started to pay more attention to more complex cases of anaphora such as split-antecedent anaphora, as in Time-Warner is considering a legal challenge to Telecommunications Inc's plan to buy half of Showtime Networks Inc-a move that could lead to all-out war between the two pow…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: accepted at NAACL 2021

  13. arXiv:2011.00286  [pdf, other]

    cs.CL

    Neural Coreference Resolution for Arabic

    Authors: Abdulrahman Aloraini, Juntao Yu, Massimo Poesio

    Abstract: No neural coreference resolver for Arabic exists; in fact, we are not aware of any learning-based coreference resolver for Arabic since (Bjorkelund and Kuhn, 2014). In this paper, we introduce a coreference resolution system for Arabic based on Lee et al.'s end-to-end architecture combined with the Arabic version of BERT and an external mention detector. As far as we know, this is the first neural c…

    Submitted 31 October, 2020; originally announced November 2020.

    Comments: accepted at CRAC@COLING2020

  14. arXiv:2011.00245  [pdf, ps, other]

    cs.CL

    Free the Plural: Unrestricted Split-Antecedent Anaphora Resolution

    Authors: Juntao Yu, Nafise Sadat Moosavi, Silviu Paun, Massimo Poesio

    Abstract: Now that the performance of coreference resolvers on the simpler forms of anaphoric reference has greatly improved, more attention is devoted to more complex aspects of anaphora. One limitation of virtually all coreference resolution models is the focus on single-antecedent anaphors. Plural anaphors with multiple antecedents-so-called split-antecedent anaphors (as in John met Mary. They went to th…

    Submitted 31 October, 2020; originally announced November 2020.

    Comments: accepted at COLING 2020

  15. arXiv:2005.07150  [pdf, other]

    cs.CL

    Named Entity Recognition as Dependency Parsing

    Authors: Juntao Yu, Bernd Bohnet, Massimo Poesio

    Abstract: Named Entity Recognition (NER) is a fundamental task in Natural Language Processing, concerned with identifying spans of text expressing references to entities. NER research is often focused on flat entities only (flat NER), ignoring the fact that entity references can be nested, as in [Bank of [China]] (Finkel and Manning, 2009). In this paper, we use ideas from graph-based dependency parsing to…

    Submitted 13 June, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

    Comments: Accepted by ACL 2020

  16. arXiv:2003.03666  [pdf, other]

    cs.CL

    Multi-task Learning Based Neural Bridging Reference Resolution

    Authors: Juntao Yu, Massimo Poesio

    Abstract: We propose a multi-task learning-based neural model for resolving bridging references tackling two key challenges. The first challenge is the lack of large corpora annotated with bridging references. To address this, we use multi-task learning to help bridging reference resolution with coreference resolution. We show that substantial improvements of up to 8 p.p. can be achieved on full bridging re…

    Submitted 31 October, 2020; v1 submitted 7 March, 2020; originally announced March 2020.

    Comments: accepted by COLING 2020

  17. arXiv:1911.09532  [pdf, ps, other]

    cs.CL

    A Cluster Ranking Model for Full Anaphora Resolution

    Authors: Juntao Yu, Alexandra Uma, Massimo Poesio

    Abstract: Anaphora resolution (coreference) systems designed for the CONLL 2012 dataset typically cannot handle key aspects of the full anaphora resolution task such as the identification of singletons and of certain types of non-referring expressions (e.g., expletives), as these aspects are not annotated in that corpus. However, the recently released dataset for the CRAC 2018 Shared Task can now be used fo…

    Submitted 22 June, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

    Comments: LREC 2020

  18. arXiv:1910.11790  [pdf, other]

    cs.CL

    Measuring Conversational Fluidity in Automated Dialogue Agents

    Authors: Keith Vella, Massimo Poesio, Michael Sigamani, Cihan Dogan, Aimore Dutra, Dimitrios Dimakopoulos, Alfredo Gemma, Ella Walters

    Abstract: We present an automated evaluation method to measure fluidity in conversational dialogue systems. The method combines various state of the art Natural Language tools into a classifier, and human ratings on these dialogues to train an automated judgment model. Our experiments show that the results are an improvement on existing metrics for measuring fluidity.

    Submitted 25 October, 2019; originally announced October 2019.

  19. arXiv:1907.12524  [pdf, other]

    cs.CL

    Neural Mention Detection

    Authors: Juntao Yu, Bernd Bohnet, Massimo Poesio

    Abstract: Mention detection is an important preprocessing step for annotation and interpretation in applications such as NER and coreference resolution, but few stand-alone neural models have been proposed able to handle the full range of mentions. In this work, we propose and compare three neural network-based approaches to mention detection. The first approach is based on the mention detection part of a s…

    Submitted 22 June, 2020; v1 submitted 29 July, 2019; originally announced July 2019.

    Comments: LREC 2020

  20. arXiv:1906.06703  [pdf, other]

    cs.CL

    Using Automatically Extracted Minimum Spans to Disentangle Coreference Evaluation from Boundary Detection

    Authors: Nafise Sadat Moosavi, Leo Born, Massimo Poesio, Michael Strube

    Abstract: The common practice in coreference resolution is to identify and evaluate the maximum span of mentions. The use of maximum spans tangles coreference evaluation with the challenges of mention boundary detection like prepositional phrase attachment. To address this problem, minimum spans are manually annotated in smaller corpora. However, this additional annotation is costly and therefore, this solu…

    Submitted 16 June, 2019; originally announced June 2019.

    Comments: ACL 2019

  21. arXiv:1402.2796  [pdf, other]

    cs.CL

    PR2: A Language Independent Unsupervised Tool for Personality Recognition from Text

    Authors: Fabio Celli, Massimo Poesio

    Abstract: We present PR2, a personality recognition system available online, that performs instance-based classification of Big5 personality types from unstructured text, using language-independent features. It has been tested on English and Italian, achieving performances up to f=.68.

    Submitted 12 February, 2014; originally announced February 2014.

    Comments: 4 pages, peer reviewed

  22. arXiv:1204.4071  [pdf, other]

    cs.SI physics.soc-ph

    Motivations for Participation in Socially Networked Collective Intelligence Systems

    Authors: Jon Chamberlain, Udo Kruschwitz, Massimo Poesio

    Abstract: One of the most significant challenges facing systems of collective intelligence is how to encourage participation on the scale required to produce high quality data. This paper details ongoing work with Phrase Detectives, an online game-with-a-purpose deployed on Facebook, and investigates user motivations for participation in social network gaming where the wisdom of crowds produces useful data.

    Submitted 18 April, 2012; originally announced April 2012.

    Comments: Presented at Collective Intelligence conference, 2012 (arXiv:1204.2991)

    Report number: CollectiveIntelligence/2012/50

  23. A Corpus-Based Investigation of Definite Description Use

    Authors: Massimo Poesio, Renata Vieira

    Abstract: We present the results of a study of definite description use in written texts aimed at assessing the feasibility of annotating corpora with information about definite description interpretation. We ran two experiments, in which subjects were asked to classify the uses of definite descriptions in a corpus of 33 newspaper articles, containing a total of 1412 definite descriptions. We measured th…

    Submitted 24 October, 1997; originally announced October 1997.

    Comments: 47 pages, uses fullname.sty and palatino.sty

  24. Semantic Ambiguity and Perceived Ambiguity

    Authors: Massimo Poesio

    Abstract: I explore some of the issues that arise when trying to establish a connection between the underspecification hypothesis pursued in the NLP literature and work on ambiguity in semantics and in the psychological literature. A theory of underspecification is developed `from the first principles', i.e., starting from a definition of what it means for a sentence to be semantically ambiguous and from…

    Submitted 16 May, 1995; originally announced May 1995.

    Comments: Latex, 47 pages. Uses tree-dvips.sty, lingmacros.sty, fullname.sty

    Journal ref: K. van Deemter and S. Peters (eds.), Semantic ambiguity and Underspecification, CSLI, 1995