-
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
Authors:
David Romero,
Chenyang Lyu,
Haryo Akbarianto Wibowo,
Teresa Lynn,
Injy Hamed,
Aditya Nanda Kishore,
Aishik Mandal,
Alina Dragonetti,
Artem Abzaliev,
Atnafu Lambebo Tonja,
Bontu Fufa Balcha,
Chenxi Whitehouse,
Christian Salamea,
Dan John Velasco,
David Ifeoluwa Adelani,
David Le Meur,
Emilio Villa-Cueva,
Fajri Koto,
Fauzan Farooqui,
Frederico Belcavello,
Ganzorig Batnasan,
Gisela Vallejo,
Grainne Caulfield,
Guido Ivetta,
Haiyue Song
, et al. (50 additional authors not shown)
Abstract:
Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recen…
▽ More
Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 28 countries on four continents, covering 26 languages with 11 scripts, providing a total of 9k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Requirements are All You Need: The Final Frontier for End-User Software Engineering
Authors:
Diana Robinson,
Christian Cabrera,
Andrew D. Gordon,
Neil D. Lawrence,
Lars Mennen
Abstract:
What if end users could own the software development lifecycle from conception to deployment using only requirements expressed in language, images, video or audio? We explore this idea, building on the capabilities that generative Artificial Intelligence brings to software generation and maintenance techniques. How could designing software in this way better serve end users? What are the implicati…
▽ More
What if end users could own the software development lifecycle from conception to deployment using only requirements expressed in language, images, video or audio? We explore this idea, building on the capabilities that generative Artificial Intelligence brings to software generation and maintenance techniques. How could designing software in this way better serve end users? What are the implications of this process for the future of end-user software engineering and the software development lifecycle? We discuss the research needed to bridge the gap between where we are today and these imagined systems of the future.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
Self-sustaining Software Systems (S4): Towards Improved Interpretability and Adaptation
Authors:
Christian Cabrera,
Andrei Paleyes,
Neil D. Lawrence
Abstract:
Software systems impact society at different levels as they pervasively solve real-world problems. Modern software systems are often so sophisticated that their complexity exceeds the limits of human comprehension. These systems must respond to changing goals, dynamic data, unexpected failures, and security threats, among other variable factors in real-world environments. Systems' complexity chall…
▽ More
Software systems impact society at different levels as they pervasively solve real-world problems. Modern software systems are often so sophisticated that their complexity exceeds the limits of human comprehension. These systems must respond to changing goals, dynamic data, unexpected failures, and security threats, among other variable factors in real-world environments. Systems' complexity challenges their interpretability and requires autonomous responses to dynamic changes. Two main research areas explore autonomous systems' responses: evolutionary computing and autonomic computing. Evolutionary computing focuses on software improvement based on iterative modifications to the source code. Autonomic computing focuses on optimising systems' performance by changing their structure, behaviour, or environment variables. Approaches from both areas rely on feedback loops that accumulate knowledge from the system interactions to inform autonomous decision-making. However, this knowledge is often limited, constraining the systems' interpretability and adaptability. This paper proposes a new concept for interpretable and adaptable software systems: self-sustaining software systems (S4). S4 builds knowledge loops between all available knowledge sources that define modern software systems to improve their interpretability and adaptability. This paper introduces and discusses the S4 concept.
△ Less
Submitted 20 January, 2024;
originally announced January 2024.
-
Prevalence of Code Smells in Reinforcement Learning Projects
Authors:
Nicolás Cardozo,
Ivana Dusparic,
Christian Cabrera
Abstract:
Reinforcement Learning (RL) is being increasingly used to learn and adapt application behavior in many domains, including large-scale and safety critical systems, as for example, autonomous driving. With the advent of plug-n-play RL libraries, its applicability has further increased, enabling integration of RL algorithms by users. We note, however, that the majority of such code is not developed b…
▽ More
Reinforcement Learning (RL) is being increasingly used to learn and adapt application behavior in many domains, including large-scale and safety critical systems, as for example, autonomous driving. With the advent of plug-n-play RL libraries, its applicability has further increased, enabling integration of RL algorithms by users. We note, however, that the majority of such code is not developed by RL engineers, which as a consequence, may lead to poor program quality yielding bugs, suboptimal performance, maintainability, and evolution problems for RL-based projects. In this paper we begin the exploration of this hypothesis, specific to code utilizing RL, analyzing different projects found in the wild, to assess their quality from a software engineering perspective. Our study includes 24 popular RL-based Python projects, analyzed with standard software engineering metrics. Our results, aligned with similar analyses for ML code in general, show that popular and widely reused RL repositories contain many code smells (3.95% of the code base on average), significantly affecting the projects' maintainability. The most common code smells detected are long method and long method chain, highlighting problems in the definition and interaction of agents. Detected code smells suggest problems in responsibility separation, and the appropriateness of current abstractions for the definition of RL algorithms.
△ Less
Submitted 3 August, 2023; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Real-world Machine Learning Systems: A survey from a Data-Oriented Architecture Perspective
Authors:
Christian Cabrera,
Andrei Paleyes,
Pierre Thodoroff,
Neil D. Lawrence
Abstract:
Machine Learning models are being deployed as parts of real-world systems with the upsurge of interest in artificial intelligence. The design, implementation, and maintenance of such systems are challenged by real-world environments that produce larger amounts of heterogeneous data and users requiring increasingly faster responses with efficient resource consumption. These requirements push preval…
▽ More
Machine Learning models are being deployed as parts of real-world systems with the upsurge of interest in artificial intelligence. The design, implementation, and maintenance of such systems are challenged by real-world environments that produce larger amounts of heterogeneous data and users requiring increasingly faster responses with efficient resource consumption. These requirements push prevalent software architectures to the limit when deploying ML-based systems. Data-oriented Architecture (DOA) is an emerging concept that equips systems better for integrating ML models. DOA extends current architectures to create data-driven, loosely coupled, decentralised, open systems. Even though papers on deployed ML-based systems do not mention DOA, their authors made design decisions that implicitly follow DOA. The reasons why, how, and the extent to which DOA is adopted in these systems are unclear. Implicit design decisions limit the practitioners' knowledge of DOA to design ML-based systems in the real world. This paper answers these questions by surveying real-world deployments of ML-based systems. The survey shows the design decisions of the systems and the requirements these satisfy. Based on the survey findings, we also formulate practical advice to facilitate the deployment of ML-based systems. Finally, we outline open challenges to deploying DOA-based systems that integrate ML models.
△ Less
Submitted 9 October, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
An Empirical Evaluation of Flow Based Programming in the Machine Learning Deployment Context
Authors:
Andrei Paleyes,
Christian Cabrera,
Neil D. Lawrence
Abstract:
As use of data driven technologies spreads, software engineers are more often faced with the task of solving a business problem using data-driven methods such as machine learning (ML) algorithms. Deployment of ML within large software systems brings new challenges that are not addressed by standard engineering practices and as a result businesses observe high rate of ML deployment project failures…
▽ More
As use of data driven technologies spreads, software engineers are more often faced with the task of solving a business problem using data-driven methods such as machine learning (ML) algorithms. Deployment of ML within large software systems brings new challenges that are not addressed by standard engineering practices and as a result businesses observe high rate of ML deployment project failures. Data Oriented Architecture (DOA) is an emerging approach that can support data scientists and software developers when addressing such challenges. However, there is a lack of clarity about how DOA systems should be implemented in practice. This paper proposes to consider Flow-Based Programming (FBP) as a paradigm for creating DOA applications. We empirically evaluate FBP in the context of ML deployment on four applications that represent typical data science projects. We use Service Oriented Architecture (SOA) as a baseline for comparison. Evaluation is done with respect to different application domains, ML deployment stages, and code quality metrics. Results reveal that FBP is a suitable paradigm for data collection and data science tasks, and is able to simplify data collection and discovery when compared with SOA. We discuss the advantages of FBP as well as the gaps that need to be addressed to increase FBP adoption as a standard design paradigm for DOA.
△ Less
Submitted 27 April, 2022;
originally announced April 2022.
-
Towards better data discovery and collection with flow-based programming
Authors:
Andrei Paleyes,
Christian Cabrera,
Neil D. Lawrence
Abstract:
Despite huge successes reported by the field of machine learning, such as voice assistants or self-driving cars, businesses still observe very high failure rate when it comes to deployment of ML in production. We argue that part of the reason is infrastructure that was not designed for data-oriented activities. This paper explores the potential of flow-based programming (FBP) for simplifying data…
▽ More
Despite huge successes reported by the field of machine learning, such as voice assistants or self-driving cars, businesses still observe very high failure rate when it comes to deployment of ML in production. We argue that part of the reason is infrastructure that was not designed for data-oriented activities. This paper explores the potential of flow-based programming (FBP) for simplifying data discovery and collection in software systems. We compare FBP with the currently prevalent service-oriented paradigm to assess characteristics of each paradigm in the context of ML deployment. We develop a data processing application, formulate a subsequent ML deployment task, and measure the impact of the task implementation within both programming paradigms. Our main conclusion is that FBP shows great potential for providing data-centric infrastructural benefits for deployment of ML. Additionally, we provide an insight into the current trend that prioritizes model development over data quality management.
△ Less
Submitted 25 October, 2021; v1 submitted 9 August, 2021;
originally announced August 2021.
-
WasteNet: Waste Classification at the Edge for Smart Bins
Authors:
Gary White,
Christian Cabrera,
Andrei Palade,
Fan Li,
Siobhan Clarke
Abstract:
Smart Bins have become popular in smart cities and campuses around the world. These bins have a compaction mechanism that increases the bins' capacity as well as automated real-time collection notifications. In this paper, we propose WasteNet, a waste classification model based on convolutional neural networks that can be deployed on a low power device at the edge of the network, such as a Jetson…
▽ More
Smart Bins have become popular in smart cities and campuses around the world. These bins have a compaction mechanism that increases the bins' capacity as well as automated real-time collection notifications. In this paper, we propose WasteNet, a waste classification model based on convolutional neural networks that can be deployed on a low power device at the edge of the network, such as a Jetson Nano. The problem of segregating waste is a big challenge for many countries around the world. Automated waste classification at the edge allows for fast intelligent decisions in smart bins without needing access to the cloud. Waste is classified into six categories: paper, cardboard, glass, metal, plastic and other. Our model achieves a 97\% prediction accuracy on the test dataset. This level of classification accuracy will help to alleviate some common smart bin problems, such as recycling contamination, where different types of waste become mixed with recycling waste causing the bin to be contaminated. It also makes the bins more user friendly as citizens do not have to worry about disposing their rubbish in the correct bin as the smart bin will be able to make the decision for them.
△ Less
Submitted 10 June, 2020;
originally announced June 2020.
-
Estimating the concentration of gold nanoparticles incorporated on Natural Rubber membranes using Multi-Level Starlet Optimal Segmentation
Authors:
Alexandre Fioravante de Siqueira,
Flávio Camargo Cabrera,
Aylton Pagamisse,
Aldo Eloizo Job
Abstract:
This study consolidates Multi-Level Starlet Segmentation (MLSS) and Multi-Level Starlet Optimal Segmentation (MLSOS), techniques for photomicrograph segmentation that use starlet wavelet detail levels to separate areas of interest in an input image. Several segmentation levels can be obtained using Multi-Level Starlet Segmentation; after that, Matthews correlation coefficient (MCC) is used to choo…
▽ More
This study consolidates Multi-Level Starlet Segmentation (MLSS) and Multi-Level Starlet Optimal Segmentation (MLSOS), techniques for photomicrograph segmentation that use starlet wavelet detail levels to separate areas of interest in an input image. Several segmentation levels can be obtained using Multi-Level Starlet Segmentation; after that, Matthews correlation coefficient (MCC) is used to choose an optimal segmentation level, giving rise to Multi-Level Starlet Optimal Segmentation. In this paper, MLSOS is employed to estimate the concentration of gold nanoparticles with diameter around 47 nm, reducted on natural rubber membranes. These samples were used on the construction of SERS/SERRS substrates and in the study of natural rubber membranes with incorporated gold nanoparticles influence on Leishmania braziliensis physiology. Precision, recall and accuracy are used to evaluate the segmentation performance, and MLSOS presents accuracy greater than 88% for this application.
△ Less
Submitted 26 October, 2016;
originally announced October 2016.
-
Segmentation of scanning electron microscopy images from natural rubber samples with gold nanoparticles using starlet wavelets
Authors:
Alexandre Fioravante de Siqueira,
Flávio Camargo Cabrera,
Aylton Pagamisse,
Aldo Eloizo Job
Abstract:
Electronic microscopy has been used for morphology evaluation of different materials structures. However, microscopy results may be affected by several factors. Image processing methods can be used to correct and improve the quality of these results. In this paper we propose an algorithm based on starlets to perform the segmentation of scanning electron microscopy images. An application is present…
▽ More
Electronic microscopy has been used for morphology evaluation of different materials structures. However, microscopy results may be affected by several factors. Image processing methods can be used to correct and improve the quality of these results. In this paper we propose an algorithm based on starlets to perform the segmentation of scanning electron microscopy images. An application is presented in order to locate gold nanoparticles in natural rubber membranes. In this application, our method showed accuracy greater than 85% for all test images. Results given by this method will be used in future studies, to computationally estimate the density distribution of gold nanoparticles in natural rubber samples and to predict reduction kinetics of gold nanoparticles at different time periods.
△ Less
Submitted 12 June, 2016;
originally announced June 2016.
-
Jansen-MIDAS: a multi-level photomicrograph segmentation software based on isotropic undecimated wavelets
Authors:
Alexandre Fioravante de Siqueira,
Flávio Camargo Cabrera,
Wagner Massayuki Nakasuga,
Aylton Pagamisse,
Aldo Eloizo Job
Abstract:
Image segmentation, the process of separating the elements within an image, is frequently used for obtaining information from photomicrographs. However, segmentation methods should be used with reservations: incorrect segmentation can mislead when interpreting regions of interest (ROI), thus decreasing the success rate of additional procedures. Multi-Level Starlet Segmentation (MLSS) and Multi-Lev…
▽ More
Image segmentation, the process of separating the elements within an image, is frequently used for obtaining information from photomicrographs. However, segmentation methods should be used with reservations: incorrect segmentation can mislead when interpreting regions of interest (ROI), thus decreasing the success rate of additional procedures. Multi-Level Starlet Segmentation (MLSS) and Multi-Level Starlet Optimal Segmentation (MLSOS) were developed to address the photomicrograph segmentation deficiency on general tools. These methods gave rise to Jansen-MIDAS, an open-source software which a scientist can use to obtain a multi-level threshold segmentation of his/hers photomicrographs. This software is presented in two versions: a text-based version, for GNU Octave, and a graphical user interface (GUI) version, for MathWorks MATLAB. It can be used to process several types of images, becoming a reliable alternative to the scientist.
△ Less
Submitted 25 May, 2016; v1 submitted 20 April, 2016;
originally announced April 2016.