Showing 1–13 of 13 results for author: Christodoulopoulos, C

  1. arXiv:2305.14293

    cs.CL

    WebIE: Faithful and Robust Information Extraction on the Web

    Authors: Chenxi Whitehouse, Clara Vania, Alham Fikri Aji, Christos Christodoulopoulos, Andrea Pierleoni

    Abstract: Extracting structured and grounded fact triples from raw text is a fundamental task in Information Extraction (IE). Existing IE datasets are typically collected from Wikipedia articles, using hyperlinks to link entities to the Wikidata knowledge base. However, models trained only on Wikipedia have limitations when applied to web domains, which often contain noisy text or text that does not have an…

    Submitted 15 June, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Main Conference

  2. State-of-the-art generalisation research in NLP: A taxonomy and review

    Authors: Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin

    Abstract: The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any evaluation standards for generalisation. In this paper, we lay the groundwork to address both of these issues. We present a taxonomy for characterising and understanding generalisation…

    Submitted 12 January, 2024; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: This preprint was published as an Analysis article in Nature Machine Intelligence. Please refer to the published version when citing this work. 28 pages of content + 6 pages of appendix + 52 pages of references

    Journal ref: Nat Mach Intell 5, 1161-1174 (2023)

  3. arXiv:2207.04108

    cs.CL

    ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking

    Authors: Tom Ayoola, Shubhi Tyagi, Joseph Fisher, Christos Christodoulopoulos, Andrea Pierleoni

    Abstract: We introduce ReFinED, an efficient end-to-end entity linking model which uses fine-grained entity types and entity descriptions to perform linking. The model performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions within a document in a single forward pass, making it more than 60 times faster than competitive existing approaches. ReFinED also surpasses st…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: Accepted at NAACL Industry Track 2022

  4. arXiv:2112.07618

    cs.IR cs.LG

    Robust Information Retrieval for False Claims with Distracting Entities In Fact Extraction and Verification

    Authors: Mingwen Dong, Christos Christodoulopoulos, Sheng-Min Shih, Xiaofei Ma

    Abstract: Accurate evidence retrieval is essential for automated fact checking. Little previous research has focused on the differences between true and false claims and how they affect evidence retrieval. This paper shows that, compared with true claims, false claims more frequently contain irrelevant entities which can distract evidence retrieval models. A BERT-based retrieval model made more mistakes in r…

    Submitted 10 December, 2021; originally announced December 2021.

  5. arXiv:2106.05707

    cs.CL

    FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information

    Authors: Rami Aly, Zhijiang Guo, Michael Schlichtkrull, James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Oana Cocarascu, Arpit Mittal

    Abstract: Fact verification has attracted a lot of attention in the machine learning and natural language processing communities, as it is one of the key methods for detecting misinformation. Existing large-scale benchmarks for this task have focused mostly on textual sources, i.e. unstructured information, and thus ignored the wealth of information available in structured formats, such as tables. In this p…

    Submitted 12 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted at NeurIPS 2021 Datasets and Benchmarks Track

  6. arXiv:2104.10130

    cs.CL cs.AI

    Hidden Biases in Unreliable News Detection Datasets

    Authors: Xiang Zhou, Heba Elfardy, Christos Christodoulopoulos, Thomas Butler, Mohit Bansal

    Abstract: Automatic unreliable news detection is a research problem with great potential impact. Recently, several papers have shown promising results on large-scale news datasets with models that only use the article itself without resorting to any fact-checking mechanism or retrieving any supporting evidence. In this work, we take a closer look at these datasets. While they all provide valuable resources…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: EACL 2021 (11 pages, 3 figures, 8 tables)

  7. arXiv:1912.02761

    cs.CL

    Measuring Social Bias in Knowledge Graph Embeddings

    Authors: Joseph Fisher, Dave Palfrey, Christos Christodoulopoulos, Arpit Mittal

    Abstract: It has recently been shown that word embeddings encode social biases, with a harmful impact on downstream tasks. However, to this point there has been no similar work done in the field of graph embeddings. We present the first study on social bias in knowledge graph embeddings, and propose a new metric suitable for measuring such bias. We conduct experiments on Wikidata and Freebase, and show that…

    Submitted 7 May, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

  8. arXiv:1904.10717

    cs.LG cs.CL stat.ML

    Generating Token-Level Explanations for Natural Language Inference

    Authors: James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal

    Abstract: The task of Natural Language Inference (NLI) is widely modeled as supervised sentence pair classification. While there has been a lot of work recently on generating explanations of the predictions of classifiers on a single piece of text, there have been no attempts to generate explanations of classifiers operating on pairs of sentences. In this paper, we show that it is possible to generate token…

    Submitted 24 April, 2019; originally announced April 2019.

    Comments: Accepted at NAACL2019

  9. arXiv:1811.10971

    cs.CL

    The Fact Extraction and VERification (FEVER) Shared Task

    Authors: James Thorne, Andreas Vlachos, Oana Cocarascu, Christos Christodoulopoulos, Arpit Mittal

    Abstract: We present the results of the first Fact Extraction and VERification (FEVER) Shared Task. The task challenged participants to classify whether human-written factoid claims could be Supported or Refuted using evidence retrieved from Wikipedia. We received entries from 23 competing teams, 19 of which scored higher than the previously published baseline. The best performing system achieved a FEVER sc…

    Submitted 30 November, 2018; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: Revised from published version in the proceedings of the FEVER workshop at EMNLP 2018

  10. arXiv:1803.09091

    cs.CL

    Simple Large-scale Relation Extraction from Unstructured Text

    Authors: Christos Christodoulopoulos, Arpit Mittal

    Abstract: Knowledge-based question answering relies on the availability of facts, the majority of which cannot be found in structured sources (e.g. Wikipedia info-boxes, Wikidata). One of the major components of extracting facts from unstructured text is Relation Extraction (RE). In this paper we propose a novel method for creating distant (weak) supervision labels for training a large-scale RE system. We a…

    Submitted 24 March, 2018; originally announced March 2018.

    Comments: To be published in LREC 2018

  11. arXiv:1803.05355

    cs.CL

    FEVER: a large-scale dataset for Fact Extraction and VERification

    Authors: James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal

    Abstract: In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as Supported, Refuted or NotEnoughInfo by annotators achievi…

    Submitted 18 December, 2018; v1 submitted 14 March, 2018; originally announced March 2018.

    Comments: Updated version of NAACL2018 paper. Data is released on http://fever.ai

  12. arXiv:1707.07794

    cs.AI cs.DB

    Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks

    Authors: Parisa Kordjamshidi, Sameer Singh, Daniel Khashabi, Christos Christodoulopoulos, Mark Summons, Saurabh Sinha, Dan Roth

    Abstract: Many real world systems need to operate on heterogeneous information networks that consist of numerous interacting components of different types. Examples include systems that perform data analysis on biological information networks; social networks; and information extraction systems processing unstructured data to convert raw text to knowledge graphs. Many previous works describe specialized app…

    Submitted 24 July, 2017; originally announced July 2017.

    Comments: Seventh International Workshop on Statistical Relational AI, 2017

  13. arXiv:1609.04325

    cs.CL

    Transliteration in Any Language with Surrogate Languages

    Authors: Stephen Mayhew, Christos Christodoulopoulos, Dan Roth

    Abstract: We introduce a method for transliteration generation that can produce transliterations in every language. Where previous results are only as multilingual as Wikipedia, we show how to use training data from Wikipedia as surrogate training for any language. Thus, the problem becomes one of ranking Wikipedia languages in order of suitability with respect to a target language. We introduce several tas…

    Submitted 14 September, 2016; originally announced September 2016.