-
Supervising the Centroid Baseline for Extractive Multi-Document Summarization
Authors:
Simão Gonçalves,
Gonçalo Correia,
Diogo Pernes,
Afonso Mendes
Abstract:
The centroid method is a simple approach for extractive multi-document summarization and many improvements to its pipeline have been proposed. We further refine it by adding a beam search process to the sentence selection and also a centroid estimation attention model that leads to improved results. We demonstrate this in several multi-document summarization datasets, including in a multilingual scenario.
Submitted 29 November, 2023;
originally announced November 2023.
-
An Agent-Based Discrete Event Simulation of Teleoperated Driving in Freight Transport Operations: The Fleet Sizing Problem
Authors:
Bahman Madadi,
Ali Nadi,
Gonçalo Homem de Almeida Correia,
Thierry Verduijn,
Lóránt Tavasszy
Abstract:
Teleoperated or remote-controlled driving complements automated driving and acts as a transitional technology toward full automation. An economic advantage of teleoperated driving in logistics operations lies in managing fleets with fewer teleoperators compared to vehicles with in-vehicle drivers. This alleviates growing truck driver shortage problems in the logistics industry and saves costs. However, a trade-off exists between the teleoperator-to-vehicle ratio and the service level of teleoperation. This study designs a simulation framework to explore this trade-off, generating multiple performance indicators as proxies for teleoperation service level. By applying the framework, we identify factors influencing the trade-off and optimal teleoperator-to-vehicle ratios under different scenarios. Our case study on road freight tours in The Netherlands reveals that for any operational setting, a teleoperator-to-vehicle ratio below one can manage all freight truck tours without delay, while a ratio of one represents the current situation. The minimum teleoperator-to-vehicle ratio for zero-delay operations is never above 0.6, implying a minimum of 40% teleoperation labor cost saving. For operations where a small delay is allowed, teleoperator-to-vehicle ratios as low as 0.4 are shown to be feasible, which indicates potential savings of up to 60%. This confirms great promise for a positive business case for teleoperated driving as a service.
Submitted 23 November, 2023;
originally announced November 2023.
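The savings figures quoted in the abstract above can be reproduced with simple arithmetic. The sketch below assumes, as the abstract implies but does not spell out, that labor cost scales linearly with the number of operators and that the baseline is one in-vehicle driver per vehicle (ratio 1). The function name `labor_saving` is hypothetical and chosen for illustration only.

```python
def labor_saving(teleoperator_to_vehicle_ratio: float) -> float:
    """Fractional labor cost saving relative to one driver per vehicle.

    Assumption (not stated in the abstract): cost is proportional to
    the number of operators, so the saving is simply 1 - ratio.
    """
    if not 0.0 <= teleoperator_to_vehicle_ratio <= 1.0:
        raise ValueError("ratio is expected to lie in [0, 1]")
    return 1.0 - teleoperator_to_vehicle_ratio


# Ratios reported in the abstract: 0.6 (zero delay) and 0.4 (small delay allowed).
zero_delay_saving = labor_saving(0.6)
small_delay_saving = labor_saving(0.4)
print(f"zero-delay saving:  {zero_delay_saving:.0%}")
print(f"small-delay saving: {small_delay_saving:.0%}")
```

Under these assumptions, a ratio of 0.6 corresponds to the 40% saving and a ratio of 0.4 to the 60% saving mentioned in the abstract.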
-
Multilingual Natural Language Processing Model for Radiology Reports -- The Summary is all you need!
Authors:
Mariana Lindo,
Ana Sofia Santos,
André Ferreira,
Jianning Li,
Gijs Luijten,
Gustavo Correia,
Moon Kim,
Benedikt Michael Schaarschmidt,
Cornelius Deuschl,
Johannes Haubold,
Jens Kleesiek,
Jan Egger,
Victor Alves
Abstract:
The impression section of a radiology report summarizes important radiology findings and plays a critical role in communicating these findings to physicians. However, the preparation of these summaries is time-consuming and error-prone for radiologists. Recently, numerous models for radiology report summarization have been developed. Nevertheless, there is currently no model that can summarize these reports in multiple languages. Such a model could greatly improve future research and the development of Deep Learning models that incorporate data from patients with different ethnic backgrounds. In this study, the generation of radiology impressions in different languages was automated by fine-tuning a model, publicly available, based on a multilingual text-to-text Transformer to summarize findings available in English, Portuguese, and German radiology reports. In a blind test, two board-certified radiologists indicated that for at least 70% of the system-generated summaries, the quality matched or exceeded the corresponding human-written summaries, suggesting substantial clinical reliability. Furthermore, this study showed that the multilingual model outperformed other models that specialized in summarizing radiology reports in only one language, as well as models that were not specifically designed for summarizing radiology reports, such as ChatGPT.
Submitted 13 January, 2024; v1 submitted 29 September, 2023;
originally announced October 2023.
-
A hybrid deep-learning-metaheuristic framework for bi-level network design problems
Authors:
Bahman Madadi,
Goncalo Homem de Almeida Correia
Abstract:
This study proposes a hybrid deep-learning-metaheuristic framework with a bi-level architecture for road network design problems (NDPs). We train a graph neural network (GNN) to approximate the solution of the user equilibrium (UE) traffic assignment problem and use inferences made by the trained model to calculate fitness function evaluations of a genetic algorithm (GA) to approximate solutions for NDPs. Using three test networks, two NDP variants and an exact solver as benchmark, we show that on average, our proposed framework can provide solutions within a 1.5% gap of the best results in less than 0.5% of the time used by the exact solution procedure. Our framework can be utilized within an expert system for infrastructure planning to determine the best infrastructure planning and management decisions under different scenarios. Given the flexibility of the framework, it can easily be adapted to many other decision problems that can be modeled as bi-level problems on graphs. Moreover, we foresee interesting future research directions, and thus we also put forward a brief research agenda for this topic. The key observation from our research that can shape future research is that the fitness function evaluation time using the inferences made by the GNN model was in the order of milliseconds, which points to an opportunity and a need for novel heuristics that 1) can cope well with noisy fitness function values provided by deep learning models, and 2) can use the significantly increased efficiency of the evaluation step to explore the search space effectively (rather than efficiently). This opens a new avenue for a modern class of metaheuristics that are crafted for use with AI-powered predictors.
Submitted 10 August, 2023; v1 submitted 10 March, 2023;
originally announced March 2023.
-
Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity
Authors:
Gonçalo M. Correia,
Vlad Niculae,
Wilker Aziz,
André F. T. Martins
Abstract:
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with lower-variance reparameterized gradients (e.g., Gumbel-Softmax). In this paper, we propose a new training strategy which replaces these estimators by an exact yet efficient marginalization. To achieve this, we parameterize discrete distributions over latent assignments using differentiable sparse mappings: sparsemax and its structured counterparts. In effect, the support of these distributions is greatly reduced, which enables efficient marginalization. We report successful results in three tasks covering a range of latent variable modeling applications: a semisupervised deep generative model, a latent communication game, and a generative model with a bit-vector latent representation. In all cases, we obtain good performance while still achieving the practicality of sampling-based approximations.
Submitted 28 December, 2020; v1 submitted 3 July, 2020;
originally announced July 2020.
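To illustrate why a sparse mapping makes exact marginalization cheap, here is a minimal pure-Python sketch of sparsemax (the Euclidean projection of a score vector onto the probability simplex) followed by an expectation computed only over the nonzero entries. This is not the authors' implementation; the helper names `sparsemax` and `sparse_expectation`, the toy scores, and the toy function being averaged are all assumptions made for illustration.

```python
def sparsemax(scores):
    """Project a score vector onto the probability simplex (sparsemax).

    Uses the standard sort-based procedure: find the largest k such that
    1 + k * z_(k) > sum of the k largest scores, then shift by the
    resulting threshold tau and clip at zero.
    """
    sorted_scores = sorted(scores, reverse=True)
    running_sum = 0.0
    tau = 0.0
    for k, z_k in enumerate(sorted_scores, start=1):
        running_sum += z_k
        if 1.0 + k * z_k > running_sum:
            # k is still inside the support; update the threshold.
            tau = (running_sum - 1.0) / k
    return [max(z - tau, 0.0) for z in scores]


def sparse_expectation(probs, f):
    """Exact expectation of f(index) that skips zero-probability entries."""
    return sum(p * f(i) for i, p in enumerate(probs) if p > 0.0)


# Toy example: three latent assignments, one of which gets exactly zero mass.
probs = sparsemax([1.0, 0.8, 0.1])
calls = []
value = sparse_expectation(probs, lambda i: calls.append(i) or float(i * 10))
print(probs, value, len(calls))
```

In this toy run the downstream function is evaluated only for the assignments in the support, which is the source of the efficiency the abstract describes; with a dense softmax, every assignment would have to be evaluated.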
-
Adaptively Sparse Transformers
Authors:
Gonçalo M. Correia,
Vlad Niculae,
André F. T. Martins
Abstract:
Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns. This sparsity is accomplished by replacing softmax with $α$-entmax: a differentiable generalization of softmax that allows low-scoring words to receive precisely zero weight. Moreover, we derive a method to automatically learn the $α$ parameter -- which controls the shape and sparsity of $α$-entmax -- allowing attention heads to choose between focused or spread-out behavior. Our adaptively sparse Transformer improves interpretability and head diversity when compared to softmax Transformers on machine translation datasets. Findings of the quantitative and qualitative analysis of our approach include that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than softmax Transformers. Furthermore, at no cost in accuracy, sparsity in attention heads helps to uncover different head specializations.
Submitted 6 September, 2019; v1 submitted 30 August, 2019;
originally announced September 2019.
-
A Simple and Effective Approach to Automatic Post-Editing with Transfer Learning
Authors:
Gonçalo M. Correia,
André F. T. Martins
Abstract:
Automatic post-editing (APE) seeks to automatically refine the output of a black-box machine translation (MT) system through human post-edits. APE systems are usually trained by complementing human post-edited data with large, artificial data generated through back-translations, a time-consuming process often no easier than training an MT system from scratch. In this paper, we propose an alternative where we fine-tune pre-trained BERT models on both the encoder and decoder of an APE system, exploring several parameter sharing strategies. By only training on a dataset of 23K sentences for 3 hours on a single GPU, we obtain results that are competitive with systems that were trained on 5M artificial sentences. When we add this artificial data, our method obtains state-of-the-art results.
Submitted 14 June, 2019;
originally announced June 2019.
-
Unbabel's Submission to the WMT2019 APE Shared Task: BERT-based Encoder-Decoder for Automatic Post-Editing
Authors:
António V. Lopes,
M. Amin Farajian,
Gonçalo M. Correia,
Jonay Trenous,
André F. T. Martins
Abstract:
This paper describes Unbabel's submission to the WMT2019 APE Shared Task for the English-German language pair. Following the recent rise of large, powerful, pre-trained models, we adapt the pre-trained BERT model to perform Automatic Post-Editing in an encoder-decoder framework. Analogously to dual-encoder architectures, we develop a BERT-based encoder-decoder (BED) model in which a single pre-trained BERT encoder receives both the source (src) and machine translation (tgt) strings. Furthermore, we explore a conservativeness factor to constrain the APE system to perform fewer edits. As the official results show, when trained on a weighted combination of in-domain and artificial training data, our BED system with the conservativeness penalty significantly improves the translations of a strong Neural Machine Translation system by $-0.78$ and $+1.23$ in terms of TER and BLEU, respectively. Finally, our submission achieves a new state-of-the-art, ex-aequo, in English-German APE of NMT.
Submitted 29 June, 2019; v1 submitted 30 May, 2019;
originally announced May 2019.