
Showing 1–8 of 8 results for author: Ferreira, T C

Searching in archive cs.
  1. arXiv:2402.08496  [pdf, other]

    cs.CL cs.AI cs.LG

    A Systematic Review of Data-to-Text NLG

    Authors: Chinonso Cynthia Osuji, Thiago Castro Ferreira, Brian Davis

    Abstract: This systematic review undertakes a comprehensive analysis of current research on data-to-text generation, identifying gaps, challenges, and future directions within the field. Relevant literature in this field on datasets, evaluation metrics, application areas, multilingualism, language models, and hallucination mitigation methods is reviewed. Various methods for producing high-quality text are e…

    Submitted 26 February, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  2. arXiv:2401.11268  [pdf, other]

    cs.CL cs.SD eess.AS

    Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric

    Authors: Golara Javadi, Kamer Ali Yuksel, Yunsu Kim, Thiago Castro Ferreira, Mohamed Al-Badrashiny

    Abstract: In the realm of automatic speech recognition (ASR), the quest for models that not only perform with high accuracy but also offer transparency in their decision-making processes is crucial. The potential of quality estimation (QE) metrics is introduced and evaluated as a novel tool to enhance explainable artificial intelligence (XAI) in ASR systems. Through experiments and analyses, the capabilitie…

    Submitted 2 February, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

    Journal ref: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024), Seoul, Korea

  3. arXiv:2207.06839  [pdf, ps, other]

    cs.CL

    Neural Data-to-Text Generation Based on Small Datasets: Comparing the Added Value of Two Semi-Supervised Learning Approaches on Top of a Large Language Model

    Authors: Chris van der Lee, Thiago Castro Ferreira, Chris Emmery, Travis Wiltshire, Emiel Krahmer

    Abstract: This study discusses the effect of semi-supervised learning in combination with pretrained language models for data-to-text generation. It is not known whether semi-supervised learning is still helpful when a large-scale language model is also supplemented. This study aims to answer this question by comparing a data-to-text system only supplemented with a language model, to two data-to-text system…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: 22 pages (excluding bibliography and appendix)

  4. arXiv:2009.07728  [pdf, other]

    cs.CL

    NABU - Multilingual Graph-based Neural RDF Verbalizer

    Authors: Diego Moussallem, Dwaraknath Gnaneshwar, Thiago Castro Ferreira, Axel-Cyrille Ngonga Ngomo

    Abstract: The RDF-to-text task has recently gained substantial attention due to continuous growth of Linked Data. In contrast to traditional pipeline models, recent studies have focused on neural models, which are now able to convert a set of RDF triples into text in an end-to-end style with promising results. However, English is the only language widely targeted. We address this research gap by presenting…

    Submitted 21 September, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: International Semantic Web Conference (ISWC) 2020

  5. arXiv:1908.09022  [pdf, ps, other]

    cs.CL

    Neural data-to-text generation: A comparison between pipeline and end-to-end architectures

    Authors: Thiago Castro Ferreira, Chris van der Lee, Emiel van Miltenburg, Emiel Krahmer

    Abstract: Traditionally, most data-to-text applications have been designed using a modular pipeline architecture, in which non-linguistic input data is converted into natural language through several intermediate transformations. In contrast, recent neural models for data-to-text generation have been proposed as end-to-end approaches, where the non-linguistic input is rendered in natural language with much…

    Submitted 27 November, 2019; v1 submitted 23 August, 2019; originally announced August 2019.

    Comments: Preprint version of the EMNLP 2019 article

  6. arXiv:1805.08093  [pdf, ps, other]

    cs.CL

    NeuralREG: An end-to-end approach to referring expression generation

    Authors: Thiago Castro Ferreira, Diego Moussallem, Ákos Kádár, Sander Wubben, Emiel Krahmer

    Abstract: Traditionally, Referring Expression Generation (REG) models first decide on the form and then on the content of references to discourse entities in text, typically relying on features such as salience and grammatical function. In this paper, we present a new approach (NeuralREG), relying on deep neural networks, which makes decisions about form and content in one go without explicit feature extrac…

    Submitted 21 May, 2018; originally announced May 2018.

    Comments: Accepted for presentation at ACL 2018

  7. arXiv:1802.08150  [pdf, other]

    cs.CL

    RDF2PT: Generating Brazilian Portuguese Texts from RDF Data

    Authors: Diego Moussallem, Thiago Castro Ferreira, Marcos Zampieri, Maria Claudia Cavalcanti, Geraldo Xexéo, Mariana Neves, Axel-Cyrille Ngonga Ngomo

    Abstract: The generation of natural language from Resource Description Framework (RDF) data has recently gained significant attention due to the continuous growth of Linked Data. A number of these approaches generate natural language in languages other than English, however, no work has been proposed to generate Brazilian Portuguese texts out of RDF. We address this research gap by presenting RDF2PT, an app…

    Submitted 22 February, 2018; originally announced February 2018.

    Comments: Accepted for publication in Language Resources and Evaluation Conference (LREC) 2018

  8. arXiv:1704.03693  [pdf, ps, other]

    cs.CL

    Trainable Referring Expression Generation using Overspecification Preferences

    Authors: Thiago Castro Ferreira, Ivandre Paraboni

    Abstract: Referring expression generation (REG) models that use speaker-dependent information require a considerable amount of training data produced by every individual speaker, or may otherwise perform poorly. In this work we present a simple REG experiment that allows the use of larger training data sets by grouping speakers according to their overspecification preferences. Intrinsic evaluation shows tha…

    Submitted 12 April, 2017; originally announced April 2017.

    Comments: 8 pages