-
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
Authors:
David Romero,
Chenyang Lyu,
Haryo Akbarianto Wibowo,
Teresa Lynn,
Injy Hamed,
Aditya Nanda Kishore,
Aishik Mandal,
Alina Dragonetti,
Artem Abzaliev,
Atnafu Lambebo Tonja,
Bontu Fufa Balcha,
Chenxi Whitehouse,
Christian Salamea,
Dan John Velasco,
David Ifeoluwa Adelani,
David Le Meur,
Emilio Villa-Cueva,
Fajri Koto,
Fauzan Farooqui,
Frederico Belcavello,
Ganzorig Batnasan,
Gisela Vallejo,
Grainne Caulfield,
Guido Ivetta,
Haiyue Song, et al. (50 additional authors not shown)
Abstract:
Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 28 countries on four continents, covering 26 languages with 11 scripts, providing a total of 9k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.
Submitted 9 June, 2024;
originally announced June 2024.
-
Can a Multichoice Dataset be Repurposed for Extractive Question Answering?
Authors:
Teresa Lynn,
Malik H. Altakrori,
Samar Mohamed Magdy,
Rocktim Jyoti Das,
Chenyang Lyu,
Mohamed Nasr,
Younes Samih,
Alham Fikri Aji,
Preslav Nakov,
Shantanu Godbole,
Salim Roukos,
Radu Florian,
Nizar Habash
Abstract:
The rapid evolution of Natural Language Processing (NLP) has favored major languages such as English, leaving a significant gap for many others due to limited resources. This is especially evident in the context of data annotation, a task whose importance cannot be underestimated, but which is time-consuming and costly. Thus, any dataset for resource-poor languages is precious, in particular when it is task-specific. Here, we explore the feasibility of repurposing existing datasets for a new NLP task: we repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA), to enable extractive QA (EQA) in the style of machine reading comprehension. We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA). We also present QA evaluation results for several monolingual and cross-lingual QA pairs including English, MSA, and five Arabic dialects. Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced. We also conduct a thorough analysis and share our insights from the process, which we hope will contribute to a deeper understanding of the challenges and the opportunities associated with task reformulation in NLP research.
Submitted 26 April, 2024;
originally announced April 2024.
-
A Paradigm Shift: The Future of Machine Translation Lies with Large Language Models
Authors:
Chenyang Lyu,
Zefeng Du,
Jitao Xu,
Yitao Duan,
Minghao Wu,
Teresa Lynn,
Alham Fikri Aji,
Derek F. Wong,
Siyou Liu,
Longyue Wang
Abstract:
Machine Translation (MT) has greatly advanced over the years due to the developments in deep neural networks. However, the emergence of Large Language Models (LLMs) like GPT-4 and ChatGPT is introducing a new phase in the MT domain. In this context, we believe that the future of MT is intricately tied to the capabilities of LLMs. These models not only offer vast linguistic understandings but also bring innovative methodologies, such as prompt-based techniques, that have the potential to further elevate MT. In this paper, we provide an overview of the significant enhancements in MT that are influenced by LLMs and advocate for their pivotal role in upcoming MT research and implementations. We highlight several new MT directions, emphasizing the benefits of LLMs in scenarios such as Long-Document Translation, Stylized Translation, and Interactive Translation. Additionally, we address the important concern of privacy in LLM-driven MT and suggest essential privacy-preserving strategies. By showcasing practical instances, we aim to demonstrate the advantages that LLMs offer, particularly in tasks like translating extended documents. We conclude by emphasizing the critical role of LLMs in guiding the future evolution of MT and offer a roadmap for future exploration in the sector.
Submitted 1 April, 2024; v1 submitted 1 May, 2023;
originally announced May 2023.
-
gaBERT -- an Irish Language Model
Authors:
James Barry,
Joachim Wagner,
Lauren Cassidy,
Alan Cowap,
Teresa Lynn,
Abigail Walsh,
Mícheál J. Ó Meachair,
Jennifer Foster
Abstract:
The BERT family of neural language models have become highly popular due to their ability to provide sequences of text with rich context-sensitive token encodings which are able to generalise well to many NLP tasks. We introduce gaBERT, a monolingual BERT model for the Irish language. We compare our gaBERT model to multilingual BERT and the monolingual Irish WikiBERT, and we show that gaBERT provides better representations for a downstream parsing task. We also show how different filtering criteria, vocabulary size and the choice of subword tokenisation model affect downstream performance. We compare the results of fine-tuning a gaBERT model with an mBERT model for the task of identifying verbal multiword expressions, and show that the fine-tuned gaBERT model also performs better at this task. We release gaBERT and related code to the community.
Submitted 28 June, 2022; v1 submitted 27 July, 2021;
originally announced July 2021.
-
Towards transparency in NLP shared tasks
Authors:
Carla Parra Escartín,
Teresa Lynn,
Joss Moorkens,
Jane Dunne
Abstract:
This article reports on a survey carried out across the Natural Language Processing (NLP) community. The survey aimed to capture the opinions of the research community on issues surrounding shared tasks, with respect to both participation and organisation. Amongst the 175 responses received, both positive and negative observations were made. We carried out and report on an extensive analysis of these responses, which leads us to propose a Shared Task Organisation Checklist that could support future participants and organisers. The proposed Checklist is flexible enough to accommodate the wide diversity of shared tasks in our field and its goal is not to be prescriptive, but rather to serve as a tool that encourages shared task organisers to foreground ethical behaviour, beginning with the common issues that the 175 respondents deemed important. Its usage would not only serve as an instrument to reflect on important aspects of shared tasks, but would also promote increased transparency around them.
Submitted 11 May, 2021;
originally announced May 2021.
-
TCP D*: A Low Latency First Congestion Control Algorithm
Authors:
Taran Lynn,
Dipak Ghosal
Abstract:
The choice of feedback mechanism between delay and packet loss has long been a point of contention in TCP congestion control. This has partly been resolved, as it has become increasingly evident that delay based methods are needed to facilitate modern interactive web applications. However, what has not been resolved is what control should be used, with the two candidates being the congestion window and the pacing rate. BBR is a new delay based congestion control algorithm that uses a pacing rate as its primary control and the congestion window as a secondary control. We propose that a congestion window first algorithm might give more desirable performance characteristics in situations where latency must be minimized even at the expense of some loss in throughput. To evaluate this hypothesis we introduce a new congestion control algorithm called TCP D*, which is a congestion window first algorithm that adopts BBR's approach of maximizing delivery rate while minimizing latency. In this paper, we discuss the key features of this algorithm, discuss the differences and similarity to BBR, and present some preliminary results based on a real implementation.
Submitted 29 December, 2020;
originally announced December 2020.
-
Treebanking User-Generated Content: a UD Based Overview of Guidelines, Corpora and Unified Recommendations
Authors:
Manuela Sanguinetti,
Lauren Cassidy,
Cristina Bosco,
Özlem Çetinoğlu,
Alessandra Teresa Cignarella,
Teresa Lynn,
Ines Rehbein,
Josef Ruppenhofer,
Djamé Seddah,
Amir Zeldes
Abstract:
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this article is twofold: (1) to provide a condensed, though comprehensive, overview of such treebanks -- based on available literature -- along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The overarching goal of this article is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, which is a principle that has always been central to the spirit of UD.
Submitted 3 November, 2020;
originally announced November 2020.
-
Predicting Short-term Mobile Internet Traffic from Internet Activity using Recurrent Neural Networks
Authors:
Guto Leoni Santos,
Pierangelo Rosati,
Theo Lynn,
Judith Kelner,
Djamel Sadok,
Patricia Takako Endo
Abstract:
Mobile network traffic prediction is an important input into network capacity planning and optimization. Existing approaches may lack the speed and computational complexity to account for bursting, non-linear patterns or other important correlations in time series mobile network data. We compare the performance of two deep learning architectures - Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) - for predicting mobile Internet traffic using two months of Telecom Italia data for the metropolitan area of Milan. K-Means clustering was used a priori to group cells based on Internet activity and the Grid Search method was used to identify the best configurations for each model. The predictive quality of the models was evaluated using root mean squared error. Both Deep Learning algorithms were effective in modeling Internet activity and seasonality, both within days and across two months. We find variations in performance across clusters within the city. Overall, the LSTM outperformed the GRU in our experiments.
Submitted 12 October, 2020;
originally announced October 2020.
-
The Greatest Teacher, Failure is: Using Reinforcement Learning for SFC Placement Based on Availability and Energy Consumption
Authors:
Guto Leoni Santos,
Theo Lynn,
Judith Kelner,
Patricia Takako Endo
Abstract:
Software defined networking (SDN) and network functions virtualisation (NFV) are making networks programmable and consequently much more flexible and agile. To meet service level agreements, achieve greater utilisation of legacy networks, faster service deployment, and reduce expenditure, telecommunications operators are deploying increasingly complex service function chains (SFCs). Notwithstanding the benefits of SFCs, increasing heterogeneity and dynamism from the cloud to the edge introduces significant SFC placement challenges, not least adding or removing network functions while maintaining availability, quality of service, and minimising cost. In this paper, an availability- and energy-aware solution based on reinforcement learning (RL) is proposed for dynamic SFC placement. Two policy-aware RL algorithms, Advantage Actor-Critic (A2C) and Proximal Policy Optimisation (PPO2), are compared using simulations of a ground truth network topology based on the Rede Nacional de Ensino e Pesquisa (RNP) Network, Brazil's National Teaching and Research Network backbone. The simulation results showed that PPO2 generally outperformed A2C and a greedy approach both in terms of acceptance rate and energy consumption. A2C outperformed PPO2 only in the scenario where network servers had a greater number of computing resources.
Submitted 18 November, 2020; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Model Predictive Congestion Control for TCP Endpoints
Authors:
Taran Lynn,
Dipak Ghosal,
Nathan Hanford
Abstract:
A common problem in science networks and private wide area networks (WANs) is that of achieving predictable data transfers of multiple concurrent flows by maintaining specific pacing rates for each. We address this problem by developing a control algorithm based on concepts from model predictive control (MPC) to produce flows with smooth pacing rates and round trip times (RTTs). In the proposed approach, we model the bottleneck link as a queue and derive a model relating the pacing rate and the RTT. An MPC-based control algorithm based on this model is shown to avoid the extreme window (which translates to rate) reduction that exists in current control algorithms when facing network congestion. We have implemented our algorithm as a Linux kernel module. Through simulation and experimental analysis, we show that our algorithm achieves the goals of a low standard deviation of RTT and pacing rate, even when the bottleneck link is fully utilized. In the case of multiple flows, we can assign different rates to each flow and as long as the sum of rates is less than the bottleneck rate, they can maintain their assigned pacing rate with low standard deviation. This is achieved even when the flows have different RTTs.
Submitted 22 February, 2020;
originally announced February 2020.
-
Right Scaling for Right Pricing: A Case Study on Total Cost of Ownership Measurement for Cloud Migration
Authors:
Pierangelo Rosati,
Frank Fowley,
Claus Pahl,
Davide Taibi,
Theo Lynn
Abstract:
Cloud computing promises traditional enterprises and independent software vendors a myriad of advantages over on-premise installations including cost, operational and organizational efficiencies. The decision to migrate software configured for on-premise delivery to the cloud requires careful technical consideration and planning. In this chapter, we discuss the impact of right-scaling on the cost modelling for migration decision making and price setting of software for commercial resale. An integrated process is presented for measuring total cost of ownership, taking into account IaaS/PaaS resource consumption based on forecast SaaS usage levels. The process is illustrated with a real world case study.
Submitted 12 August, 2019;
originally announced August 2019.
-
The Case for Cloud Service Trustmarks and Assurance-as-a-Service
Authors:
Theo Lynn,
Philip Healy,
Richard McClatchey,
John Morrison,
Claus Pahl,
Brian Lee
Abstract:
Cloud computing represents a significant economic opportunity for Europe. However, this growth is threatened by adoption barriers largely related to trust. This position paper examines trust and confidence issues in cloud computing and advances a case for addressing them through the implementation of a novel trustmark scheme for cloud service providers. The proposed trustmark would be both active and dynamic featuring multi-modal information about the performance of the underlying cloud service. The trustmarks would be informed by live performance data from the cloud service provider, or ideally an independent third-party accountability and assurance service that would communicate up-to-date information relating to service performance and dependability. By combining assurance measures with a remediation scheme, cloud service providers could both signal dependability to customers and the wider marketplace and provide customers, auditors and regulators with a mechanism for determining accountability in the event of failure or non-compliance. As a result, the trustmarks would convey to consumers of cloud services and other stakeholders that strong assurance and accountability measures are in place for the service in question and thereby address trust and confidence issues in cloud computing.
Submitted 24 February, 2014;
originally announced February 2014.
-
Bid-Centric Cloud Service Provisioning
Authors:
Philip Healy,
Stefan Meyer,
John Morrison,
Theo Lynn,
Ashkan Paya,
Dan C. Marinescu
Abstract:
Bid-centric service descriptions have the potential to offer a new cloud service provisioning model that promotes portability, diversity of choice and differentiation between providers. A bid matching model based on requirements and capabilities is presented that provides the basis for such an approach. In order to facilitate the bidding process, tenders should be specified as abstractly as possible so that the solution space is not needlessly restricted. To this end, we describe how partial TOSCA service descriptions allow for a range of diverse solutions to be proposed by multiple providers in response to tenders. Rather than adopting a lowest common denominator approach, true portability should allow for the relative strengths and differentiating features of cloud service providers to be applied to bids. With this in mind, we describe how TOSCA service descriptions could be augmented with additional information in order to facilitate heterogeneity in proposed solutions, such as the use of coprocessors and provider-specific services.
Submitted 17 December, 2013;
originally announced December 2013.