Showing 1–2 of 2 results for author: Domingo, O

Search v0.5.6 released 2020-02-24

arXiv:2202.06041

cs.CL cs.IR

A multi-task semi-supervised framework for Text2Graph & Graph2Text

Authors: Oriol Domingo, Marta R. Costa-jussà, Carlos Escolano

Abstract: The Artificial Intelligence industry regularly develops applications that mostly rely on Knowledge Bases, a data repository about specific, or general, domains, usually represented in a graph shape. Similar to other databases, they face two main challenges: information ingestion and information retrieval. We approach these challenges by jointly learning graph extraction from text and text generation from graphs. The proposed solution, a T5 architecture, is trained in a multi-task semi-supervised environment, with our collected non-parallel data, following a cycle training regime. Experiments on WebNLG dataset show that our approach surpasses unsupervised state-of-the-art results in text-to-graph and graph-to-text. More relevantly, our framework is more consistent across seen and unseen domains than supervised models. The resulting model can be easily trained in any new domain with non-parallel data, by simply adding text and graphs about it, in our cycle framework.

Submitted 18 February, 2022; v1 submitted 12 February, 2022; originally announced February 2022.

Comments: 5 pages, 2 figures, 3 tables and 8 equations

arXiv:2005.13156

cs.CL

MT-Adapted Datasheets for Datasets: Template and Repository

Authors: Marta R. Costa-jussà, Roger Creus, Oriol Domingo, Albert Domínguez, Miquel Escobar, Cayetana López, Marina Garcia, Margarita Geleta

Abstract: In this report we are taking the standardized model proposed by Gebru et al. (2018) for documenting the popular machine translation datasets of the EuroParl (Koehn, 2005) and News-Commentary (Barrault et al., 2019). Within this documentation process, we have adapted the original datasheet to the particular case of data consumers within the Machine Translation area. We are also proposing a repository for collecting the adapted datasheets in this research area.

Submitted 27 May, 2020; originally announced May 2020.

ACM Class: I.2.7
