-
MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering
Authors:
Iñigo Alonso,
Maite Oronoz,
Rodrigo Agerri
Abstract:
Large Language Models (LLMs) have the potential to facilitate the development of Artificial Intelligence technology that assists medical experts with interactive decision support, as demonstrated by their competitive performance in Medical QA. However, while impressive, the required quality bar for medical applications remains far from being achieved. Currently, LLMs remain challenged by outdated knowledge and by their tendency to generate hallucinated content. Furthermore, most benchmarks to assess medical knowledge lack reference gold explanations, which means that it is not possible to evaluate the reasoning behind LLM predictions. Finally, the situation is particularly grim if we consider benchmarking LLMs for languages other than English, which remains, as far as we know, a totally neglected topic. In order to address these shortcomings, in this paper we present MedExpQA, the first multilingual benchmark based on medical exams to evaluate LLMs in Medical Question Answering. To the best of our knowledge, MedExpQA includes for the first time reference gold explanations written by medical doctors, which can be leveraged to establish various gold-based upper bounds for comparison with LLM performance. Comprehensive multilingual experimentation using both the gold reference explanations and Retrieval Augmented Generation (RAG) approaches shows that the performance of LLMs still has large room for improvement, especially for languages other than English. Furthermore, despite using state-of-the-art RAG methods, our results also demonstrate the difficulty of obtaining and integrating readily available medical knowledge that may positively impact results on downstream evaluations for Medical Question Answering. So far the benchmark is available in four languages, but we hope that this work may encourage its extension to other languages.
Submitted 8 April, 2024;
originally announced April 2024.
-
PixT3: Pixel-based Table-To-Text Generation
Authors:
Iñigo Alonso,
Eneko Agirre,
Mirella Lapata
Abstract:
Table-to-text generation involves generating appropriate textual descriptions given structured tabular data. It has attracted increasing attention in recent years thanks to the popularity of neural network models and the availability of large-scale datasets. A common feature across existing methods is their treatment of the input as a string, i.e., by employing linearization techniques that do not always preserve information in the table, are verbose, and lack space efficiency. We propose to rethink data-to-text generation as a visual recognition task, removing the need for rendering the input in a string format. We present PixT3, a multimodal table-to-text model that overcomes the challenges of linearization and input size limitations encountered by existing models. PixT3 is trained with a new self-supervised learning objective to reinforce table structure awareness and is applicable to open-ended and controlled generation settings. Experiments on the ToTTo and Logic2Text benchmarks show that PixT3 is competitive and, in some settings, superior to generators that operate solely on text.
Submitted 3 June, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Automatic Logical Forms improve fidelity in Table-to-Text generation
Authors:
Iñigo Alonso,
Eneko Agirre
Abstract:
Table-to-text systems generate natural language statements from structured data like tables. While end-to-end techniques suffer from low factual correctness (fidelity), a previous study reported gains when using manual logical forms (LF) that represent the selected content and the semantics of the target text. Given the manual step, it was not clear whether automatic LFs would be effective, or whether the improvement came from content selection alone. We present TlT which, given a table and a selection of the content, first produces LFs and then the textual statement. We show for the first time that automatic LFs improve quality, with an increase in fidelity of 30 points over a comparable system not using LFs. Our experiments allow us to quantify the remaining challenges for high factual correctness, with automatic selection of content coming first, followed by better Logic-to-Text generation and, to a lesser extent, better Table-to-Logic parsing.
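A logical form states the selected content as an executable program, so whether a statement is true of the table can be checked independently of its final wording. The sketch below is a toy executor for a handful of operators named after Logic2Text-style functions (`all_rows`, `filter_eq`, `count`, `argmax`, `hop`, `eq`); the nested-tuple encoding, the operator semantics and the example table are our own simplifying assumptions, not TlT's implementation.

```python
from typing import Any, Dict, List

# Hypothetical encoding: an LF is a nested tuple ("op", arg1, arg2, ...);
# any non-tuple argument is a constant (column name, string or number).
Table = List[Dict[str, Any]]


def execute(lf: Any, table: Table) -> Any:
    """Evaluate a toy logical form against a table given as a list of row dicts."""
    if not isinstance(lf, tuple):
        return lf  # constant
    op, *args = lf
    if op == "all_rows":
        return table
    if op == "filter_eq":  # rows whose column equals a value
        rows, column, value = execute(args[0], table), args[1], args[2]
        return [row for row in rows if row[column] == value]
    if op == "count":
        return len(execute(args[0], table))
    if op == "argmax":  # the row with the largest value in a column
        rows, column = execute(args[0], table), args[1]
        return max(rows, key=lambda row: row[column])
    if op == "hop":  # read one column from a single row
        row, column = execute(args[0], table), args[1]
        return row[column]
    if op == "eq":
        return execute(args[0], table) == execute(args[1], table)
    raise ValueError(f"unknown operator: {op}")


medals = [
    {"nation": "spain", "gold": 3},
    {"nation": "italy", "gold": 1},
    {"nation": "spain", "gold": 2},
]
# "Spain appears in two rows."
claim = ("eq", ("count", ("filter_eq", ("all_rows",), "nation", "spain")), 2)
print(execute(claim, medals))  # True
# "The nation with the most gold medals is Spain."
print(execute(("hop", ("argmax", ("all_rows",), "gold"), "nation"), medals))  # spain
```

Because the LF can be executed, a Logic-to-Text generator that verbalizes it faithfully inherits its correctness, which is one way to read the fidelity gains reported above.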
Submitted 9 January, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
-
HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine
Authors:
Rodrigo Agerri,
Iñigo Alonso,
Aitziber Atutxa,
Ander Berrondo,
Ainara Estarrona,
Iker Garcia-Ferrero,
Iakes Goenaga,
Koldo Gojenola,
Maite Oronoz,
Igor Perez-Tejedor,
German Rigau,
Anar Yeginbergenova
Abstract:
Providing high-quality explanations for AI predictions based on machine learning is a challenging and complex task. To work well it requires, among other factors: selecting a proper level of generality/specificity of the explanation; considering assumptions about the familiarity of the explanation beneficiary with the AI task under consideration; referring to specific elements that have contributed to the decision; making use of additional knowledge (e.g. expert evidence) which might not be part of the prediction process; and providing evidence supporting negative hypotheses. Finally, the system needs to formulate the explanation in a clearly interpretable, and possibly convincing, way. Given these considerations, ANTIDOTE fosters an integrated vision of explainable AI, where low-level characteristics of the deep learning process are combined with higher-level schemes proper to human argumentation. ANTIDOTE will exploit cross-disciplinary competences in deep learning and argumentation to support a broader and innovative view of explainable AI, where the need for high-quality explanations for clinical case deliberation is critical. As a first result of the project, we publish the Antidote CasiMedicos dataset to facilitate research on explainable AI in general, and argumentation in the medical domain in particular.
Submitted 9 June, 2023;
originally announced June 2023.
-
The Fragility of Multi-Treebank Parsing Evaluation
Authors:
Iago Alonso-Alonso,
David Vilares,
Carlos Gómez-Rodríguez
Abstract:
Treebank selection for parsing evaluation and the spurious effects that might arise from a biased choice have not been explored in detail. This paper studies how evaluating on a single subset of treebanks can lead to weak conclusions. First, we take a few contrasting parsers and run them on subsets of treebanks proposed in previous work, whose use was justified (or not) on criteria such as typology or data scarcity. Second, we run a large-scale version of this experiment, create a large number of random subsets of treebanks, and compare on them many parsers whose scores are available. The results show substantial variability across subsets and that, although establishing guidelines for good treebank selection is hard, it is possible to detect potentially harmful strategies.
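The random-subset experiment can be illustrated with a minimal sketch: given per-treebank scores for several parsers, sample many random subsets and count how often a pair of parsers swaps order relative to its ranking on all treebanks. Using the mean score as the aggregate, the strict-reversal criterion, the function name `flip_rates` and the toy scores are assumptions for illustration; the paper's exact protocol may differ.

```python
import random
from itertools import combinations


def flip_rates(scores, subset_size, n_subsets=1000, seed=0):
    """For each pair of parsers, the fraction of random treebank subsets on which
    their order (by mean score) is strictly reversed w.r.t. the full treebank set.

    scores: {parser: {treebank: score}}; all parsers scored on the same treebanks.
    """
    rng = random.Random(seed)
    parsers = sorted(scores)
    treebanks = sorted(scores[parsers[0]])

    def mean(parser, subset):
        return sum(scores[parser][tb] for tb in subset) / len(subset)

    full = {p: mean(p, treebanks) for p in parsers}
    flips = dict.fromkeys(combinations(parsers, 2), 0)
    for _ in range(n_subsets):
        subset = rng.sample(treebanks, subset_size)
        for a, b in flips:
            # A negative product means the subset reverses the full-set order;
            # ties on the subset are not counted as reversals.
            if (full[a] - full[b]) * (mean(a, subset) - mean(b, subset)) < 0:
                flips[(a, b)] += 1
    return {pair: n / n_subsets for pair, n in flips.items()}


scores = {
    "parser_a": {"tb1": 90.0, "tb2": 80.0, "tb3": 70.0, "tb4": 60.0},
    "parser_b": {"tb1": 85.0, "tb2": 85.0, "tb3": 75.0, "tb4": 50.0},
}
# parser_a wins on all four treebanks on average, but only {tb2, tb3} reverses
# the order among the 6 possible pairs, so the expected flip rate is 1/6.
print(flip_rates(scores, subset_size=2))
```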
Submitted 14 September, 2022;
originally announced September 2022.
-
Testing predictive automated driving systems: lessons learned and future recommendations
Authors:
Rubén Izquierdo Gonzalo,
Carlota Salinas Maldonado,
Javier Alonso Ruiz,
Ignacio Parra Alonso,
David Fernández Llorca,
Miguel Á. Sotelo
Abstract:
Conventional vehicles are certified through classical approaches, where different physical certification tests are set up on test tracks to assess required safety levels. These approaches are well suited for vehicles with limited complexity and limited interactions with other entities as last-second resources. However, these approaches do not make it possible to evaluate safety with real behaviors in critical and edge cases, nor to evaluate the ability to anticipate them in the mid or long term. This is particularly relevant for automated and autonomous driving functions that make use of advanced predictive systems to anticipate future actions and motions to be considered in the path planning layer. In this paper, we present and analyze the results of physical tests on proving grounds of several predictive systems in automated driving functions developed within the framework of the BRAVE project. Based on our experience in testing predictive automated driving functions, we identify the main limitations of current physical testing approaches when dealing with predictive systems, analyze the main challenges ahead, and provide a set of practical actions and recommendations to consider in future physical testing procedures for automated and autonomous driving functions.
Submitted 25 April, 2022;
originally announced May 2022.
-
LyS_ACoruña at SemEval-2022 Task 10: Repurposing Off-the-Shelf Tools for Sentiment Analysis as Semantic Dependency Parsing
Authors:
Iago Alonso-Alonso,
David Vilares,
Carlos Gómez-Rodríguez
Abstract:
This paper addressed the problem of structured sentiment analysis using a bi-affine semantic dependency parser, large pre-trained language models, and publicly available translation models. For the monolingual setup, we considered: (i) training on a single treebank, and (ii) relaxing the setup by training on treebanks coming from different languages that can be adequately processed by cross-lingual language models. For the zero-shot setup and a given target treebank, we relied on: (i) a word-level translation of available treebanks in other languages to get noisy, likely ungrammatical, but annotated data (we release as much of it as licenses allow), and (ii) merging those translated treebanks to obtain training data. In the post-evaluation phase, we also trained cross-lingual models that simply merged all the English treebanks and did not use word-level translations, and yet obtained better results. According to the official results, we ranked 8th and 9th in the monolingual and cross-lingual setups, respectively.
Submitted 27 April, 2022;
originally announced April 2022.
-
FinEAS: Financial Embedding Analysis of Sentiment
Authors:
Asier Gutiérrez-Fandiño,
Miquel Noguer i Alonso,
Petter Kolm,
Jordi Armengol-Estapé
Abstract:
We introduce a new language representation model in finance called Financial Embedding Analysis of Sentiment (FinEAS). In financial markets, news and investor sentiment are significant drivers of security prices. Thus, leveraging the capabilities of modern NLP approaches for financial sentiment analysis is a crucial component in identifying patterns and trends that are useful for market participants and regulators. In recent years, methods that use transfer learning from large Transformer-based language models like BERT have achieved state-of-the-art results in text classification tasks, including sentiment analysis using labelled datasets. Researchers have quickly applied these approaches to financial texts, but best practices in this domain are not well established. In this work, we propose a new model for financial sentiment analysis based on supervised fine-tuned sentence embeddings from a standard BERT model. We demonstrate that our approach achieves significant improvements in comparison to vanilla BERT, LSTM, and FinBERT, a financial domain-specific BERT.
Submitted 19 November, 2021; v1 submitted 31 October, 2021;
originally announced November 2021.
-
Semi-Supervised Semantic Segmentation with Pixel-Level Contrastive Learning from a Class-wise Memory Bank
Authors:
Inigo Alonso,
Alberto Sabater,
David Ferstl,
Luis Montesano,
Ana C. Murillo
Abstract:
This work presents a novel approach for semi-supervised semantic segmentation. The key element of this approach is our contrastive learning module, which forces the segmentation network to yield similar pixel-level feature representations for same-class samples across the whole dataset. To achieve this, we maintain a memory bank that is continuously updated with relevant and high-quality feature vectors from labeled data. During end-to-end training, the features from both labeled and unlabeled data are optimized to be similar to same-class samples from the memory bank. Our approach outperforms the current state of the art for semi-supervised semantic segmentation and semi-supervised domain adaptation on well-known public benchmarks, with larger improvements in the most challenging scenarios, i.e., those with less labeled data available. https://github.com/Shathe/SemiSeg-Contrastive
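A class-wise memory bank can be pictured as one bounded FIFO queue of feature vectors per class, filled from labeled data and used as targets for a similarity loss. This pure-Python sketch approximates "high-quality" features with a simple confidence threshold and uses 1 minus cosine similarity as the loss; both choices, and the names `ClassMemoryBank` and `min_confidence`, are hypothetical simplifications rather than the authors' module (their code is in the linked repository).

```python
import math
from collections import deque


def cosine(a, b, eps=1e-12):
    """Cosine similarity, guarding against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norms, eps)


class ClassMemoryBank:
    """Per-class FIFO queues of feature vectors (hypothetical minimal sketch)."""

    def __init__(self, num_classes, size_per_class, min_confidence=0.9):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.banks = [deque(maxlen=size_per_class) for _ in range(num_classes)]
        self.min_confidence = min_confidence

    def update(self, features, labels, confidences):
        """Store labeled-pixel features whose confidence passes the threshold
        (the threshold stands in for the paper's feature-quality selection)."""
        for feat, label, conf in zip(features, labels, confidences):
            if conf >= self.min_confidence:
                self.banks[label].append(list(feat))

    def loss(self, features, labels):
        """Mean (1 - cosine similarity) between each feature and the stored
        features of its (pseudo-)label class; empty banks contribute nothing."""
        total, count = 0.0, 0
        for feat, label in zip(features, labels):
            for stored in self.banks[label]:
                total += 1.0 - cosine(feat, stored)
                count += 1
        return total / count if count else 0.0


bank = ClassMemoryBank(num_classes=2, size_per_class=2)
bank.update([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], labels=[0, 1, 0],
            confidences=[0.95, 0.99, 0.5])  # the last feature is rejected
print(bank.loss([[1.0, 0.0]], [0]))  # 0.0: identical to the stored class-0 feature
print(bank.loss([[0.0, 1.0]], [0]))  # 1.0: orthogonal to it
```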
Submitted 6 August, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Domain and View-point Agnostic Hand Action Recognition
Authors:
Alberto Sabater,
Iñigo Alonso,
Luis Montesano,
Ana C. Murillo
Abstract:
Hand action recognition is a special case of action recognition with applications in human-robot interaction, virtual reality, or life-logging systems. Building action classifiers able to work for such heterogeneous action domains is very challenging: there are very subtle changes across different actions within a given application, but also large variations across domains (e.g. virtual reality vs. life-logging). This work introduces a novel skeleton-based hand motion representation model that tackles this problem. The framework we propose is agnostic to the application domain or camera recording view-point. When working on a single domain (intra-domain action classification), our approach performs better than or similar to current state-of-the-art methods on well-known hand action recognition benchmarks. And, more importantly, when performing hand action recognition for action domains and camera perspectives that our approach has not been trained for (cross-domain action classification), our proposed framework achieves performance comparable to intra-domain state-of-the-art methods. These experiments show the robustness and generalization capabilities of our framework.
Submitted 7 October, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Domain Adaptation in LiDAR Semantic Segmentation by Aligning Class Distributions
Authors:
Inigo Alonso,
Luis Riazuelo,
Luis Montesano,
Ana C. Murillo
Abstract:
LiDAR semantic segmentation provides 3D semantic information about the environment, an essential cue for intelligent systems during their decision-making processes. Deep neural networks are achieving state-of-the-art results on large public benchmarks for this task. Unfortunately, finding models that generalize well or adapt to additional domains, where the data distribution is different, remains a major challenge. This work addresses the problem of unsupervised domain adaptation for LiDAR semantic segmentation models. Our approach combines novel ideas on top of the current state-of-the-art approaches and yields new state-of-the-art results. We propose simple but effective strategies to reduce the domain shift by aligning the data distribution in the input space. In addition, we propose a learning-based approach that aligns the distribution of the semantic classes of the target domain to the source domain. The presented ablation study shows how each part contributes to the final performance. Our strategy is shown to outperform previous approaches for domain adaptation, with comparisons run on three different domains.
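One way to express "aligning the distribution of the semantic classes of the target domain to the source domain" as a loss is a divergence between the source class frequencies and the mean predicted class probabilities on unlabeled target points. The sketch below uses KL(source || target); the choice of divergence, its direction and the function names are assumptions, not necessarily the loss used in the paper.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_alignment_loss(target_logits, source_prior, eps=1e-8):
    """KL(source_prior || mean predicted class distribution on target points).

    target_logits: per-point logit vectors predicted on the target domain.
    source_prior: class frequencies measured on the labeled source domain.
    """
    n_points = len(target_logits)
    target_dist = [0.0] * len(source_prior)
    for logits in target_logits:
        for c, p in enumerate(softmax(logits)):
            target_dist[c] += p / n_points
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(source_prior, target_dist) if p > 0)


logits = [[2.0, 0.0], [0.0, 2.0]]  # predictions average to [0.5, 0.5]
print(class_alignment_loss(logits, [0.5, 0.5]))  # ~0: distributions match
print(class_alignment_loss(logits, [0.8, 0.2]))  # > 0: class 0 under-predicted
```

Minimizing such a term pushes the model's overall class proportions on target scans toward those seen in the source domain, without requiring any target labels.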
Submitted 3 December, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
3D-MiniNet: Learning a 2D Representation from Point Clouds for Fast and Efficient 3D LIDAR Semantic Segmentation
Authors:
Iñigo Alonso,
Luis Riazuelo,
Luis Montesano,
Ana C. Murillo
Abstract:
LIDAR semantic segmentation, which assigns a semantic label to each 3D point measured by the LIDAR, is becoming an essential task for many robotic applications such as autonomous driving. Fast and efficient semantic segmentation methods are needed to meet the strict computational and temporal constraints of many of these real-world applications.
This work presents 3D-MiniNet, a novel approach for LIDAR semantic segmentation that combines 3D and 2D learning layers. It first learns a 2D representation from the raw points through a novel projection which extracts local and global information from the 3D data. This representation is fed to an efficient 2D Fully Convolutional Neural Network (FCNN) that produces a 2D semantic segmentation. These 2D semantic labels are re-projected back to the 3D space and enhanced through a post-processing module. The main novelty of our strategy lies in the projection learning module. Our detailed ablation study shows how each component contributes to the final performance of 3D-MiniNet. We validate our approach on well-known public benchmarks (SemanticKITTI and KITTI), where 3D-MiniNet achieves state-of-the-art results while being faster and more parameter-efficient than previous methods.
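For intuition about mapping a point cloud onto a 2D grid, the sketch below implements the standard spherical (range-image) projection, which assigns each LiDAR point to an image cell from its yaw and pitch angles. 3D-MiniNet's learned projection goes beyond this fixed geometric mapping; the sketch covers only the geometric part, and the image size and vertical field of view (+3° to -25°) are assumed values typical of a 64-beam sensor.

```python
import math


def spherical_projection(points, width=2048, height=64,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map (x, y, z) LiDAR points to (row, col) cells of a range image.

    Yaw (azimuth) selects the column and pitch (elevation) selects the row;
    points outside the vertical field of view are clamped to the border rows.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = abs(fov_up) + abs(fov_down)
    cells = []
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            raise ValueError("point at the sensor origin has no direction")
        yaw = math.atan2(y, x)    # in [-pi, pi]
        pitch = math.asin(z / r)  # in [-pi/2, pi/2]
        u = 0.5 * (1.0 - yaw / math.pi) * width          # column coordinate
        v = (1.0 - (pitch + abs(fov_down)) / fov) * height  # row coordinate
        col = min(width - 1, max(0, math.floor(u)))
        row = min(height - 1, max(0, math.floor(v)))
        cells.append((row, col))
    return cells


# Straight ahead, straight behind, and a point far above the field of view.
print(spherical_projection([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 1.0)]))
# [(6, 1024), (6, 0), (0, 1024)]
```

Features of all points falling in the same cell can then be pooled into a 2D tensor that a 2D network such as an FCNN can consume.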
Submitted 27 April, 2021; v1 submitted 25 February, 2020;
originally announced February 2020.
-
Vehicle Ego-Lane Estimation with Sensor Failure Modeling
Authors:
Augusto Luis Ballardini,
Daniele Cattaneo,
Rubén Izquierdo,
Ignacio Parra Alonso,
Andrea Piazzoni,
Miguel Ángel Sotelo,
Domenico Giorgio Sorrenti
Abstract:
We present a probabilistic ego-lane estimation algorithm for highway-like scenarios that is designed to increase the accuracy of the ego-lane estimate, which can be obtained relying only on a noisy line detector and tracker. The contribution is based on a Hidden Markov Model (HMM) with a transient failure model. The proposed algorithm exploits the OpenStreetMap (or other cartographic services) road property lane number as the expected number of lanes and leverages consecutive, possibly incomplete, observations. The algorithm's effectiveness is demonstrated by employing different line detectors and showing that we could achieve much more usable, i.e. stable and reliable, ego-lane estimates over more than 100 km of highway scenarios, recorded both in Italy and Spain. Moreover, as we could not find a suitable dataset for a quantitative comparison with other approaches, we collected datasets and manually annotated the ground truth for the vehicle ego-lane. These datasets are made publicly available for use by the scientific community.
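The HMM idea can be sketched as a forward filter whose hidden state is the lane index (with the lane count taken from the map) and whose emission model mixes a nominal detector model with a uniform "transient failure" component, so that a single spurious detection does not flip the estimate. All probabilities, the adjacent-lane transition structure and the function names are hypothetical; the paper's model is richer.

```python
def transition_matrix(n_lanes, p_stay=0.9):
    """Stay in the current lane with p_stay, otherwise move to an adjacent lane."""
    T = [[0.0] * n_lanes for _ in range(n_lanes)]
    for i in range(n_lanes):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_lanes]
        T[i][i] = p_stay if neighbours else 1.0
        for j in neighbours:
            T[i][j] = (1.0 - p_stay) / len(neighbours)
    return T


def emission(obs, lane, n_lanes, p_correct=0.8, p_fail=0.1):
    """Likelihood of a detector reading given the true lane.

    With probability p_fail the detector fails transiently and reports a
    uniformly random lane; a missing reading (None) is uninformative.
    """
    if obs is None or n_lanes == 1:
        return 1.0
    nominal = p_correct if obs == lane else (1.0 - p_correct) / (n_lanes - 1)
    return (1.0 - p_fail) * nominal + p_fail / n_lanes


def filter_ego_lane(observations, n_lanes, p_stay=0.9):
    """Forward (filtering) pass; returns the final belief and per-step MAP lane."""
    T = transition_matrix(n_lanes, p_stay)
    belief = [1.0 / n_lanes] * n_lanes  # uniform prior over lanes
    estimates = []
    for obs in observations:
        # Predict with the transition model, then correct with the observation.
        predicted = [sum(belief[i] * T[i][j] for i in range(n_lanes))
                     for j in range(n_lanes)]
        updated = [predicted[j] * emission(obs, j, n_lanes) for j in range(n_lanes)]
        norm = sum(updated)
        belief = [u / norm for u in updated]
        estimates.append(max(range(n_lanes), key=lambda j: belief[j]))
    return belief, estimates


# 0-based lane readings on a 3-lane highway: one spurious "2" and one dropout.
belief, lanes = filter_ego_lane([0, 0, 0, 2, None, 0], n_lanes=3)
print(lanes)  # [0, 0, 0, 0, 0, 0]
```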
Submitted 6 February, 2020; v1 submitted 5 February, 2020;
originally announced February 2020.
-
EV-SegNet: Semantic Segmentation for Event-based Cameras
Authors:
Iñigo Alonso,
Ana C. Murillo
Abstract:
Event cameras, or Dynamic Vision Sensors (DVS), are very promising sensors which have shown several advantages over frame-based cameras. However, most recent work on real applications of these cameras is focused on 3D reconstruction and 6-DOF camera tracking. Deep-learning-based approaches, which are leading the state of the art in visual recognition tasks, could potentially take advantage of the benefits of DVS, but some adaptations are still needed in order to work effectively on these cameras. This work introduces a first baseline for semantic segmentation with this kind of data. We build a semantic segmentation CNN based on state-of-the-art techniques which takes event information as the only input. In addition, we propose a novel representation for DVS data that outperforms previously used event representations for related tasks. Since there is no existing labeled dataset for this task, we propose how to automatically generate approximate semantic segmentation labels for some sequences of the DDD17 dataset, which we publish together with the model, and demonstrate that they are valid for training a model on DVS data only. We compare our results on semantic segmentation from DVS data with results using corresponding grayscale images, demonstrating how they are complementary and worth combining.
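To illustrate how an asynchronous event stream can be turned into a dense, image-like CNN input, the sketch below bins events into per-pixel, per-polarity channels: event counts plus the mean and standard deviation of normalized timestamps. This six-channel layout is an assumption in the spirit of the representation described above, not a specification of it; the normalization over the whole window and the function name are hypothetical.

```python
import math


def event_representation(events, width, height):
    """Six H x W channels from events (x, y, t, polarity), polarity +1 or -1:
    [count+, count-, mean_t+, mean_t-, std_t+, std_t-].

    Timestamps are normalized to [0, 1] over the time window of the events.
    """
    events = list(events)

    def grid():
        return [[0.0] * width for _ in range(height)]

    count = [grid(), grid()]  # index 0: positive events, 1: negative events
    t_sum = [grid(), grid()]
    t_sq = [grid(), grid()]
    if events:
        t0 = min(e[2] for e in events)
        span = (max(e[2] for e in events) - t0) or 1.0  # avoid division by zero
        for x, y, t, pol in events:
            k = 0 if pol > 0 else 1
            tn = (t - t0) / span
            count[k][y][x] += 1
            t_sum[k][y][x] += tn
            t_sq[k][y][x] += tn * tn

    mean, std = [grid(), grid()], [grid(), grid()]
    for k in range(2):
        for y in range(height):
            for x in range(width):
                n = count[k][y][x]
                if n:
                    m = t_sum[k][y][x] / n
                    mean[k][y][x] = m
                    # Var = E[t^2] - E[t]^2, clamped against rounding below zero.
                    std[k][y][x] = math.sqrt(max(0.0, t_sq[k][y][x] / n - m * m))
    return count + mean + std


channels = event_representation(
    [(0, 0, 0.0, +1), (0, 0, 1.0, +1), (1, 0, 0.5, -1)], width=2, height=1)
print(channels[0], channels[4])  # [[2.0, 0.0]] [[0.5, 0.0]]
```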
Submitted 29 November, 2018;
originally announced November 2018.
-
How Important is Syntactic Parsing Accuracy? An Empirical Evaluation on Rule-Based Sentiment Analysis
Authors:
Carlos Gómez-Rodríguez,
Iago Alonso-Alonso,
David Vilares
Abstract:
Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of an application for which parsing has recently proven useful.
In recent years, there have been significant advances in the accuracy of parsing algorithms. In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art rule-based sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and less accurate models which, however, require fewer computational resources.
The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, without their accuracy having any relevant influence on the results. Since parsing is currently a task with a relatively high computational cost that varies strongly between algorithms, this suggests that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser; and parsing researchers should investigate models that improve speed further, even at some cost to accuracy.
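To see how a parse tree can drive polarity, consider a toy compositional scorer that propagates word polarities up a dependency tree, flipping the sign under a negator and scaling under an intensifier. The lexicon, rules and names below are hypothetical and much simpler than the evaluated system; the usage shows how attaching "not" to the wrong head can change the output.

```python
LEXICON = {"good": 2.0, "bad": -2.0, "great": 3.0}  # hypothetical word polarities
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never", "n't"}


def sentence_polarity(tokens, heads):
    """tokens: list of words; heads[i]: index of token i's head, or -1 for the root."""
    children = [[] for _ in tokens]
    root = None
    for i, h in enumerate(heads):
        if h < 0:
            root = i
        else:
            children[h].append(i)
    if root is None:
        raise ValueError("tree has no root")

    def score(i):
        value = LEXICON.get(tokens[i].lower(), 0.0)
        factor = 1.0
        for c in children[i]:
            word = tokens[c].lower()
            if word in NEGATORS:
                factor *= -1.0  # negation flips the polarity of this subtree
            elif word in INTENSIFIERS:
                factor *= INTENSIFIERS[word]  # intensifiers scale it
            else:
                value += score(c)  # other dependents contribute additively
        return value * factor

    return score(root)


tokens = ["The", "movie", "is", "not", "very", "good"]
correct = [1, 5, 5, 5, 5, -1]  # "not" and "very" attached to "good"
wrong = [1, 5, 5, 1, 5, -1]    # parser error: "not" attached to "movie"
print(sentence_polarity(tokens, correct))  # -3.0
print(sentence_polarity(tokens, wrong))    # 3.0
```

A single wrong attachment can flip a sentence's polarity in such a scorer, yet the experiments above found that, for the evaluated system, parser accuracy had no relevant influence on results across the four parsers.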
Submitted 24 October, 2017; v1 submitted 7 June, 2017;
originally announced June 2017.
-
A DDS-Based Scalable and Reconfigurable Framework for Cyber-Physical Systems
Authors:
Ismael Etxeberria-Agiriano,
Isidro Calvo,
Liliana Montero,
Ivan Alonso
Abstract:
Cyber-Physical Systems (CPSs) involve the interconnection of heterogeneous computing devices which are closely integrated with the physical processes under control. Often, these systems are resource-constrained and require specific features such as the ability to adapt in a timely and efficient fashion to dynamic environments. They must also support fault tolerance and avoid single points of failure. This paper describes a scalable framework for CPSs based on the OMG DDS standard. The proposed solution allows this kind of system to be reconfigured at run-time and its resources to be managed efficiently.
Submitted 10 August, 2012;
originally announced August 2012.
-
Visual Localisation of Mobile Devices in an Indoor Environment under Network Delay Conditions
Authors:
Alberto Alonso Fernández,
Omar Álvarez Fres,
Ignacio González Alonso,
Huosheng Hu
Abstract:
Current progress in home automation and service robotics environments has highlighted the need to develop interoperability mechanisms that allow standard communication between the two systems. During the development of the DHCompliant protocol, the problem of locating mobile devices in an indoor environment has been investigated. The communication between the device and the location service has been implemented in order to study the time delay introduced by web services compared with sockets. Given the importance of obtaining data from real-time location systems, a basic tool for interoperability, such as web services, can be ineffective in this scenario because of the delays added in the invocation of services. This paper focuses on introducing a web service that resolves a coordinates request without any significant delay in comparison with sockets.
Submitted 29 March, 2011;
originally announced March 2011.