Showing 1–18 of 18 results for author: Gómez-Pérez, J M

  1. arXiv:2403.16941

    cs.CL cs.AI cs.DL

    SPACE-IDEAS: A Dataset for Salient Information Detection in Space Innovation

    Authors: Andrés García-Silva, Cristian Berrío, José Manuel Gómez-Pérez

    Abstract: Detecting salient parts in text using natural language processing has been widely used to mitigate the effects of information overflow. Nevertheless, most of the datasets available for this task are derived mainly from academic publications. We introduce SPACE-IDEAS, a dataset for salient information detection from innovation ideas related to the Space domain. The text in SPACE-IDEAS varies greatl…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted in LREC-COLING 2024

  2. Textual Entailment for Effective Triple Validation in Object Prediction

    Authors: Andrés García-Silva, Cristian Berrío, José Manuel Gómez-Pérez

    Abstract: Knowledge base population seeks to expand knowledge graphs with facts that are typically extracted from a text corpus. Recently, language models pretrained on large corpora have been shown to contain factual knowledge that can be retrieved using cloze-style strategies. Such approach enables zero-shot recall of facts, showing competitive results in object prediction compared to supervised baselines…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted to ISWC'23 - The International Semantic Web Conference

  3. Capturing Pertinent Symbolic Features for Enhanced Content-Based Misinformation Detection

    Authors: Flavio Merenda, José Manuel Gómez-Pérez

    Abstract: Preventing the spread of misinformation is challenging. The detection of misleading content presents a significant hurdle due to its extreme linguistic and domain variability. Content-based models have managed to identify deceptive language by learning representations from textual data such as social media posts and web articles. However, aggregating representative samples of this heterogeneous ph…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted at K-CAP'23: The 12th Knowledge Capture Conference

  4. arXiv:2210.15327

    cs.CL cs.AI

    Towards Language-driven Scientific AI

    Authors: José Manuel Gómez-Pérez

    Abstract: Inspired by recent and revolutionary developments in AI, particularly in language understanding and generation, we set about designing AI systems that are able to address complex scientific tasks that challenge human capabilities to make new discoveries. Central to our approach is the notion of natural language as core representation, reasoning, and exchange format between scientific AI and human…

    Submitted 31 October, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

  5. arXiv:2210.03640

    cs.CL cs.AI

    Artificial Intelligence and Natural Language Processing and Understanding in Space: A Methodological Framework and Four ESA Case Studies

    Authors: José Manuel Gómez-Pérez, Andrés García-Silva, Rosemarie Leone, Mirko Albani, Moritz Fontaine, Charles Poncet, Leopold Summerer, Alessandro Donati, Ilaria Roma, Stefano Scaglioni

    Abstract: The European Space Agency is well known as a powerful force for scientific discovery in numerous areas related to Space. The amount and depth of the knowledge produced throughout the different missions carried out by ESA and their contribution to scientific progress is enormous, involving large collections of documents like scientific publications, feasibility studies, technical reports, and quali…

    Submitted 24 October, 2022; v1 submitted 7 October, 2022; originally announced October 2022.

  6. arXiv:2210.03427

    cs.CL cs.AI

    Generating Quizzes to Support Training on Quality Management and Assurance in Space Science and Engineering

    Authors: Andrés García-Silva, Cristian Berrío, José Manuel Gómez-Pérez

    Abstract: Quality management and assurance is key for space agencies to guarantee the success of space missions, which are high-risk and extremely costly. In this paper, we present a system to generate quizzes, a common resource to evaluate the effectiveness of training sessions, from documents about quality assurance procedures in the Space domain. Our system leverages state of the art auto-regressive mode…

    Submitted 4 November, 2022; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: In Proceedings of the 15th International Natural Language Generation Conference (INLG 2022)

  7. arXiv:2210.03422

    cs.CL cs.AI

    SpaceQA: Answering Questions about the Design of Space Missions and Space Craft Concepts

    Authors: Andrés García-Silva, Cristian Berrío, José Manuel Gómez-Pérez, José Antonio Martínez-Heras, Alessandro Donati, Ilaria Roma

    Abstract: We present SpaceQA, to the best of our knowledge the first open-domain QA system in Space mission design. SpaceQA is part of an initiative by the European Space Agency (ESA) to facilitate the access, sharing and reuse of information about Space mission design within the agency and with the public. We adopt a state-of-the-art architecture consisting of a dense retriever and a neural reader and opt…

    Submitted 4 November, 2022; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: In proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022)

  8. On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings

    Authors: Andres Garcia-Silva, Ronald Denaux, Jose Manuel Gomez-Perez

    Abstract: In essence, embedding algorithms work by optimizing the distance between a word and its usual context in order to generate an embedding space that encodes the distributional representation of words. In addition to single words or word pieces, other features which result from the linguistic analysis of text, including lexical, grammatical and semantic information, can be used to improve the quality…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: Accepted for publication in Future Generation Computer Systems

  9. arXiv:2104.06182

    cs.CL cs.LG

    Understanding Transformers for Bot Detection in Twitter

    Authors: Andres Garcia-Silva, Cristian Berrio, Jose Manuel Gomez-Perez

    Abstract: In this paper we shed light on the impact of fine-tuning over social media data in the internal representations of neural language models. We focus on bot detection in Twitter, a key task to mitigate and counteract the automatic spreading of disinformation and bias in social media. We investigate the use of pre-trained language models to tackle the detection of tweets generated by a bot or a human…

    Submitted 13 April, 2021; originally announced April 2021.

  10. arXiv:2101.08114

    cs.CL cs.AI cs.DL

    Classifying Scientific Publications with BERT -- Is Self-Attention a Feature Selection Method?

    Authors: Andres Garcia-Silva, Jose Manuel Gomez-Perez

    Abstract: We investigate the self-attention mechanism of BERT in a fine-tuning scenario for the classification of scientific articles over a taxonomy of research disciplines. We observe how self-attention focuses on words that are highly related to the domain of the article. Particularly, a small subset of vocabulary words tends to receive most of the attention. We compare and evaluate the subset of the mos…

    Submitted 20 January, 2021; originally announced January 2021.

    Comments: Paper accepted for publication at ECIR2021

  11. arXiv:2010.00562

    cs.CL cs.AI cs.CV

    ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention

    Authors: Jose Manuel Gomez-Perez, Raul Ortega

    Abstract: Textbook Question Answering is a complex task in the intersection of Machine Comprehension and Visual Question Answering that requires reasoning with multimodal information from text and diagrams. For the first time, this paper taps on the potential of transformer language models and bottom-up and top-down attention to tackle the language and visual understanding challenges this task entails. Rath…

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: Accepted for publication as a long paper in EMNLP2020

  12. arXiv:2008.12742

    cs.CL cs.AI cs.DL

    Linked Credibility Reviews for Explainable Misinformation Detection

    Authors: Ronald Denaux, Jose Manuel Gomez-Perez

    Abstract: In recent years, misinformation on the Web has become increasingly rampant. The research community has responded by proposing systems and challenges, which are beginning to be useful for (various subtasks of) detecting misinformation. However, most proposed systems are based on deep learning techniques which are fine-tuned to specific domains, are difficult to interpret and produce results which a…

    Submitted 28 August, 2020; originally announced August 2020.

    Comments: Accepted to the 19th International Semantic Web Conference (ISWC 2020) https://iswc2020.semanticweb.org

  13. arXiv:1909.11042

    cs.CL

    Assessing the Lexico-Semantic Relational Knowledge Captured by Word and Concept Embeddings

    Authors: Ronald Denaux, Jose Manuel Gomez-Perez

    Abstract: Deep learning currently dominates the benchmarks for various NLP tasks and, at the basis of such systems, words are frequently represented as embeddings --vectors in a low dimensional space-- learned from large text corpora and various algorithms have been proposed to learn both word and concept embeddings. One of the claimed benefits of such embeddings is that they capture knowledge about semanti…

    Submitted 24 September, 2019; originally announced September 2019.

    Comments: Accepted at the 10th International Conference on Knowledge Capture (K-CAP 2019)

  14. arXiv:1909.09070

    cs.AI cs.CL cs.CV

    Look, Read and Enrich. Learning from Scientific Figures and their Captions

    Authors: Jose Manuel Gomez-Perez, Raul Ortega

    Abstract: Compared to natural images, understanding scientific figures is particularly hard for machines. However, there is a valuable source of information in scientific literature that until now has remained untapped: the correspondence between a figure and its caption. In this paper we investigate what can be learnt by looking at a large number of figures and reading their captions, and introduce a figur…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: Accepted in the 10th International Conference on Knowledge capture (K-CAP 2019)

  15. arXiv:1809.10617

    cs.CL cs.DL

    Enabling FAIR Research in Earth Science through Research Objects

    Authors: Andres Garcia-Silva, Jose Manuel Gomez-Perez, Raul Palma, Marcin Krystek, Simone Mantovani, Federica Foglini, Valentina Grande, Francesco De Leo, Stefano Salvi, Elisa Trasati, Vito Romaniello, Mirko Albani, Cristiano Silvagni, Rosemarie Leone, Fulvio Marelli, Sergio Albani, Michele Lazzarini, Hazel J. Napier, Helen M. Glaves, Timothy Aldridge, Charles Meertens, Fran Boler, Henry W. Loescher, Christine Laney, Melissa A Genazzio, et al. (2 additional authors not shown)

    Abstract: Data-intensive science communities are progressively adopting FAIR practices that enhance the visibility of scientific breakthroughs and enable reuse. At the core of this movement, research objects contain and describe scientific information and resources in a way compliant with the FAIR principles and sustain the development of key infrastructure and tools. This paper provides an account of the c…

    Submitted 27 September, 2018; originally announced September 2018.

  16. arXiv:1807.07346

    cs.DB cs.IR cs.LG

    Indexing Execution Patterns in Workflow Provenance Graphs through Generalized Trie Structures

    Authors: Esteban García-Cuesta, José M. Gómez-Pérez

    Abstract: Over the last years, scientific workflows have become mature enough to be used in a production style. However, despite the increasing maturity, there is still a shortage of tools for searching, adapting, and reusing workflows that hinders a more generalized adoption by the scientific communities. Indeed, due to the limited availability of machine-readable scientific metadata and the heterogeneity…

    Submitted 19 July, 2018; originally announced July 2018.

  17. arXiv:1804.01772

    cs.CL

    Not just about size - A Study on the Role of Distributed Word Representations in the Analysis of Scientific Publications

    Authors: Andres Garcia, Jose Manuel Gomez-Perez

    Abstract: The emergence of knowledge graphs in the scholarly communication domain and recent advances in artificial intelligence and natural language processing bring us closer to a scenario where intelligent systems can assist scientists over a range of knowledge-intensive tasks. In this paper we present experimental results about the generation of word embeddings from scholarly publications for the intell…

    Submitted 5 April, 2018; originally announced April 2018.

  18. arXiv:1710.05604

    cs.HC cs.CY

    Collaboration Spheres: a Visual Metaphor to Share and Reuse Research Objects

    Authors: Mariano Rico, José Manuel Gómez-Pérez, Rafael Gonzalez, Aleix Garrido, Oscar Corcho

    Abstract: Research Objects (ROs) are semantically enhanced aggregations of resources associated to scientific experiments, such as data, provenance of these data, the scientific workflow used to run the experiment, intermediate results, logs and the interpretation of the results. As the number of ROs increases, it is becoming difficult to find ROs to be used, reused or re-purposed. New search and retrieval…

    Submitted 16 October, 2017; originally announced October 2017.

    Comments: The URL to the web app does not work