-
Reliable edge machine learning hardware for scientific applications
Authors:
Tommaso Baldi,
Javier Campos,
Ben Hawks,
Jennifer Ngadiuba,
Nhan Tran,
Daniel Diaz,
Javier Duarte,
Ryan Kastner,
Andres Meza,
Melissa Quinnan,
Olivia Weng,
Caleb Geniesse,
Amir Gholami,
Michael W. Mahoney,
Vladimir Loncar,
Philip Harris,
Joshua Agar,
Shuyu Qin
Abstract:
Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling…
▽ More
Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling ultra-fine-grained model inspection for efficient fault tolerance. We discuss approaches to develo** and validating reliable algorithms at the scientific edge under such strict latency, resource, power, and area requirements in extreme experimental environments. We study metrics for develo** robust algorithms, present preliminary results and mitigation strategies, and conclude with an outlook of these and future directions of research towards the longer-term goal of develo** autonomous scientific experimentation methods for accelerated scientific discovery.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
Improving Reward Models with Synthetic Critiques
Authors:
Zihuiwen Ye,
Fraser Greenlee-Scott,
Max Bartolo,
Phil Blunsom,
Jon Ander Campos,
Matthias Gallé
Abstract:
Reward models (RM) play a critical role in aligning language models through the process of reinforcement learning from human feedback. RMs are trained to predict a score reflecting human preference, which requires significant time and cost for human annotation. Additionally, RMs tend to quickly overfit on superficial features in the training set, hindering their generalization performance on unsee…
▽ More
Reward models (RM) play a critical role in aligning language models through the process of reinforcement learning from human feedback. RMs are trained to predict a score reflecting human preference, which requires significant time and cost for human annotation. Additionally, RMs tend to quickly overfit on superficial features in the training set, hindering their generalization performance on unseen distributions. We propose a novel approach using synthetic natural language critiques generated by large language models to provide additional feedback, evaluating aspects such as instruction following, correctness, and style. This offers richer signals and more robust features for RMs to assess and score on. We demonstrate that high-quality critiques improve the performance and data efficiency of RMs initialized from different pretrained models. Conversely, we also show that low-quality critiques negatively impact performance. Furthermore, incorporating critiques enhances the interpretability and robustness of RM training.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Aya 23: Open Weight Releases to Further Multilingual Progress
Authors:
Viraat Aryabumi,
John Dang,
Dwarak Talupuru,
Saurabh Dash,
David Cairuz,
Hangyu Lin,
Bharat Venkitesh,
Madeline Smith,
Jon Ander Campos,
Yi Chern Tan,
Kelly Marchisio,
Max Bartolo,
Sebastian Ruder,
Acyr Locatelli,
Julia Kreutzer,
Nick Frosst,
Aidan Gomez,
Phil Blunsom,
Marzieh Fadaee,
Ahmet Üstün,
Sara Hooker
Abstract:
This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modelin…
▽ More
This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modeling capabilities to approximately half of the world's population. The Aya model covered 101 languages whereas Aya 23 is an experiment in depth vs breadth, exploring the impact of allocating more capacity to fewer languages that are included during pre-training. Aya 23 outperforms both previous massively multilingual models like Aya 101 for the languages it covers, as well as widely used models like Gemma, Mistral and Mixtral on an extensive range of discriminative and generative tasks. We release the open weights for both the 8B and 35B models as part of our continued commitment for expanding access to multilingual progress.
△ Less
Submitted 31 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
Authors:
Tiziano Labruna,
Jon Ander Campos,
Gorka Azkune
Abstract:
In this paper, we demonstrate how Large Language Models (LLMs) can effectively learn to use an off-the-shelf information retrieval (IR) system specifically when additional context is required to answer a given question. Given the performance of IR systems, the optimal strategy for question answering does not always entail external information retrieval; rather, it often involves leveraging the par…
▽ More
In this paper, we demonstrate how Large Language Models (LLMs) can effectively learn to use an off-the-shelf information retrieval (IR) system specifically when additional context is required to answer a given question. Given the performance of IR systems, the optimal strategy for question answering does not always entail external information retrieval; rather, it often involves leveraging the parametric memory of the LLM itself. Prior research has identified this phenomenon in the PopQA dataset, wherein the most popular questions are effectively addressed using the LLM's parametric memory, while less popular ones require IR system usage. Following this, we propose a tailored training approach for LLMs, leveraging existing open-domain question answering datasets. Here, LLMs are trained to generate a special token, <RET>, when they do not know the answer to a question. Our evaluation of the Adaptive Retrieval LLM (Adapt-LLM) on the PopQA dataset showcases improvements over the same LLM under three configurations: (i) retrieving information for all the questions, (ii) using always the parametric memory of the LLM, and (iii) using a popularity threshold to decide when to use a retriever. Through our analysis, we demonstrate that Adapt-LLM is able to generate the <RET> token when it determines that it does not know how to answer a question, indicating the need for IR, while it achieves notably high accuracy levels when it chooses to rely only on its parametric memory.
△ Less
Submitted 6 May, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
Verifying message-passing neural networks via topology-based bounds tightening
Authors:
Christopher Hojny,
Shiqiang Zhang,
Juan S. Campos,
Ruth Misener
Abstract:
Since graph neural networks (GNNs) are often vulnerable to attack, we need to know when we can trust them. We develop a computationally effective approach towards providing robust certificates for message-passing neural networks (MPNNs) using a Rectified Linear Unit (ReLU) activation function. Because our work builds on mixed-integer optimization, it encodes a wide variety of subproblems, for exam…
▽ More
Since graph neural networks (GNNs) are often vulnerable to attack, we need to know when we can trust them. We develop a computationally effective approach towards providing robust certificates for message-passing neural networks (MPNNs) using a Rectified Linear Unit (ReLU) activation function. Because our work builds on mixed-integer optimization, it encodes a wide variety of subproblems, for example it admits (i) both adding and removing edges, (ii) both global and local budgets, and (iii) both topological perturbations and feature modifications. Our key technology, topology-based bounds tightening, uses graph structure to tighten bounds. We also experiment with aggressive bounds tightening to dynamically change the optimization constraints by tightening variable bounds. To demonstrate the effectiveness of these strategies, we implement an extension to the open-source branch-and-cut solver SCIP. We test on both node and graph classification problems and consider topological attacks that both add and remove edges.
△ Less
Submitted 21 May, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Augmenting optimization-based molecular design with graph neural networks
Authors:
Shiqiang Zhang,
Juan S. Campos,
Christian Feldmann,
Frederik Sandfort,
Miriam Mathea,
Ruth Misener
Abstract:
Computer-aided molecular design (CAMD) studies quantitative structure-property relationships and discovers desired molecules using optimization algorithms. With the emergence of machine learning models, CAMD score functions may be replaced by various surrogates to automatically learn the structure-property relationships. Due to their outstanding performance on graph domains, graph neural networks…
▽ More
Computer-aided molecular design (CAMD) studies quantitative structure-property relationships and discovers desired molecules using optimization algorithms. With the emergence of machine learning models, CAMD score functions may be replaced by various surrogates to automatically learn the structure-property relationships. Due to their outstanding performance on graph domains, graph neural networks (GNNs) have recently appeared frequently in CAMD. But using GNNs introduces new optimization challenges. This paper formulates GNNs using mixed-integer programming and then integrates this GNN formulation into the optimization and machine learning toolkit OMLT. To characterize and formulate molecules, we inherit the well-established mixed-integer optimization formulation for CAMD and propose symmetry-breaking constraints to remove symmetric solutions caused by graph isomorphism. In two case studies, we investigate fragment-based odorant molecular design with more practical requirements to test the compatibility and performance of our approaches.
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark
Authors:
Oscar Sainz,
Jon Ander Campos,
Iker García-Ferrero,
Julen Etxaniz,
Oier Lopez de Lacalle,
Eneko Agirre
Abstract:
In this position paper, we argue that the classical evaluation on Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble. The worst kind of data contamination happens when a Large Language Model (LLM) is trained on the test split of a benchmark, and then evaluated in the same benchmark. The extent of the problem is unknown, as it is not straightforward to measure. Contami…
▽ More
In this position paper, we argue that the classical evaluation on Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble. The worst kind of data contamination happens when a Large Language Model (LLM) is trained on the test split of a benchmark, and then evaluated in the same benchmark. The extent of the problem is unknown, as it is not straightforward to measure. Contamination causes an overestimation of the performance of a contaminated model in a target benchmark and associated task with respect to their non-contaminated counterparts. The consequences can be very harmful, with wrong scientific conclusions being published while other correct ones are discarded. This position paper defines different levels of data contamination and argues for a community effort, including the development of automatic and semi-automatic measures to detect when data from a benchmark was exposed to a model, and suggestions for flagging papers with conclusions that are compromised by data contamination.
△ Less
Submitted 27 October, 2023;
originally announced October 2023.
-
Unsupervised Domain Adaption for Neural Information Retrieval
Authors:
Carlos Dominguez,
Jon Ander Campos,
Eneko Agirre,
Gorka Azkune
Abstract:
Neural information retrieval requires costly annotated data for each target domain to be competitive. Synthetic annotation by query generation using Large Language Models or rule-based string manipulation has been proposed as an alternative, but their relative merits have not been analysed. In this paper, we compare both methods head-to-head using the same neural IR architecture. We focus on the B…
▽ More
Neural information retrieval requires costly annotated data for each target domain to be competitive. Synthetic annotation by query generation using Large Language Models or rule-based string manipulation has been proposed as an alternative, but their relative merits have not been analysed. In this paper, we compare both methods head-to-head using the same neural IR architecture. We focus on the BEIR benchmark, which includes test datasets from several domains with no training data, and explore two scenarios: zero-shot, where the supervised system is trained in a large out-of-domain dataset (MS-MARCO); and unsupervised domain adaptation, where, in addition to MS-MARCO, the system is fine-tuned in synthetic data from the target domain. Our results indicate that Large Language Models outperform rule-based methods in all scenarios by a large margin, and, more importantly, that unsupervised domain adaptation is effective compared to applying a supervised IR system in a zero-shot fashion. In addition we explore several sizes of open Large Language Models to generate synthetic data and find that a medium-sized model suffices. Code and models are publicly available for reproducibility.
△ Less
Submitted 13 October, 2023;
originally announced October 2023.
-
IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases
Authors:
Iker García-Ferrero,
Jon Ander Campos,
Oscar Sainz,
Ander Salaberria,
Dan Roth
Abstract:
Named Entity Recognition (NER) is a core natural language processing task in which pre-trained language models have shown remarkable performance. However, standard benchmarks like CoNLL 2003 do not address many of the challenges that deployed NER systems face, such as having to classify emerging or complex entities in a fine-grained way. In this paper we present a novel NER cascade approach compri…
▽ More
Named Entity Recognition (NER) is a core natural language processing task in which pre-trained language models have shown remarkable performance. However, standard benchmarks like CoNLL 2003 do not address many of the challenges that deployed NER systems face, such as having to classify emerging or complex entities in a fine-grained way. In this paper we present a novel NER cascade approach comprising three steps: first, identifying candidate entities in the input sentence; second, linking the each candidate to an existing knowledge base; third, predicting the fine-grained category for each entity candidate. We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities. Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting where we leverage knowledge bases of high-resource languages.
△ Less
Submitted 27 April, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs
Authors:
Javier Campos,
Zhen Dong,
Javier Duarte,
Amir Gholami,
Michael W. Mahoney,
Jovan Mitrevski,
Nhan Tran
Abstract:
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs) for efficient field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) hardware. Our approach leverages Hessian-aware quantization (HAWQ) of NNs, the Quantized Open Neural Network Exchange (QONNX) intermediate representation, and the hls4ml tool flow for transpi…
▽ More
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs) for efficient field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) hardware. Our approach leverages Hessian-aware quantization (HAWQ) of NNs, the Quantized Open Neural Network Exchange (QONNX) intermediate representation, and the hls4ml tool flow for transpiling NNs into FPGA and ASIC firmware. This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow that can be deployed for real-time machine learning applications in a wide range of scientific and industrial settings. We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the CERN Large Hadron Collider (LHC). Given the high collision rate, all data processing must be implemented on custom ASIC and FPGA hardware within a strict area and latency. Based on these constraints, we implement an optimized mixed-precision NN classifier for high-momentum particle jets in simulated LHC proton-proton collisions.
△ Less
Submitted 13 April, 2023;
originally announced April 2023.
-
Training Language Models with Language Feedback at Scale
Authors:
Jérémy Scheurer,
Jon Ander Campos,
Tomasz Korbak,
Jun Shern Chan,
Angelica Chen,
Kyunghyun Cho,
Ethan Perez
Abstract:
Pretrained language models often generate outputs that are not in line with human preferences, such as harmful text or factually incorrect summaries. Recent work approaches the above issues by learning from a simple form of human feedback: comparisons between pairs of model-generated outputs. However, comparison feedback only conveys limited information about human preferences. In this paper, we i…
▽ More
Pretrained language models often generate outputs that are not in line with human preferences, such as harmful text or factually incorrect summaries. Recent work approaches the above issues by learning from a simple form of human feedback: comparisons between pairs of model-generated outputs. However, comparison feedback only conveys limited information about human preferences. In this paper, we introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback. ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements. Second, selecting the refinement incorporating the most feedback. Third, finetuning the language model to maximize the likelihood of the chosen refinement given the input. We show theoretically that ILF can be viewed as Bayesian Inference, similar to Reinforcement Learning from human feedback. We evaluate ILF's effectiveness on a carefully-controlled toy task and a realistic summarization task. Our experiments demonstrate that large language models accurately incorporate feedback and that finetuning with ILF scales well with the dataset size, even outperforming finetuning on human summaries. Learning from both language and comparison feedback outperforms learning from each alone, achieving human-level summarization performance.
△ Less
Submitted 22 February, 2024; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Improving Code Generation by Training with Natural Language Feedback
Authors:
Angelica Chen,
Jérémy Scheurer,
Tomasz Korbak,
Jon Ander Campos,
Jun Shern Chan,
Samuel R. Bowman,
Kyunghyun Cho,
Ethan Perez
Abstract:
The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedbac…
▽ More
The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the ground truth distribution and demonstrate a proof-of-concept on a neural program synthesis task. We use ILF to improve a Codegen-Mono 6.1B model's pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark, outperforming both fine-tuning on MBPP and fine-tuning on repaired programs written by humans. Overall, our results suggest that learning from human-written natural language feedback is both more effective and sample-efficient than training exclusively on demonstrations for improving an LLM's performance on code generation tasks.
△ Less
Submitted 22 February, 2024; v1 submitted 28 March, 2023;
originally announced March 2023.
-
BLAB Reporter: Automated journalism covering the Blue Amazon
Authors:
Yan V. Sym,
João Gabriel M. Campos,
Fabio G. Cozman
Abstract:
This demo paper introduces the BLAB Reporter, a robot-journalist covering the Brazilian Blue Amazon. The Reporter is based on a pipeline architecture for Natural Language Generation; it offers daily reports, news summaries and curious facts in Brazilian Portuguese. By collecting, storing and analysing structured data from publicly available sources, the robot-journalist uses domain knowledge to ge…
▽ More
This demo paper introduces the BLAB Reporter, a robot-journalist covering the Brazilian Blue Amazon. The Reporter is based on a pipeline architecture for Natural Language Generation; it offers daily reports, news summaries and curious facts in Brazilian Portuguese. By collecting, storing and analysing structured data from publicly available sources, the robot-journalist uses domain knowledge to generate and publish texts in Twitter. Code and corpus are publicly available
△ Less
Submitted 8 October, 2022;
originally announced October 2022.
-
Comparing Computational Architectures for Automated Journalism
Authors:
Yan V. Sym,
João Gabriel M. Campos,
Marcos M. José,
Fabio G. Cozman
Abstract:
The majority of NLG systems have been designed following either a template-based or a pipeline-based architecture. Recent neural models for data-to-text generation have been proposed with an end-to-end deep learning flavor, which handles non-linguistic input in natural language without explicit intermediary representations. This study compares the most often employed methods for generating Brazili…
▽ More
The majority of NLG systems have been designed following either a template-based or a pipeline-based architecture. Recent neural models for data-to-text generation have been proposed with an end-to-end deep learning flavor, which handles non-linguistic input in natural language without explicit intermediary representations. This study compares the most often employed methods for generating Brazilian Portuguese texts from structured data. Results suggest that explicit intermediate steps in the generation process produce better texts than the ones generated by neural end-to-end architectures, avoiding data hallucination while better generalizing to unseen inputs. Code and corpus are publicly available.
△ Less
Submitted 8 October, 2022;
originally announced October 2022.
-
The BLue Amazon Brain (BLAB): A Modular Architecture of Services about the Brazilian Maritime Territory
Authors:
Paulo Pirozelli,
Ais B. R. Castro,
Ana Luiza C. de Oliveira,
André S. Oliveira,
Flávio N. Cação,
Igor C. Silveira,
João G. M. Campos,
Laura C. Motheo,
Leticia F. Figueiredo,
Lucas F. A. O. Pellicer,
Marcelo A. José,
Marcos M. José,
Pedro de M. Ligabue,
Ricardo S. Grava,
Rodrigo M. Tavares,
Vinícius B. Matos,
Yan V. Sym,
Anna H. R. Costa,
Anarosa A. F. Brandão,
Denis D. Mauá,
Fabio G. Cozman,
Sarajane M. Peres
Abstract:
We describe the first steps in the development of an artificial agent focused on the Brazilian maritime territory, a large region within the South Atlantic also known as the Blue Amazon. The "BLue Amazon Brain" (BLAB) integrates a number of services aimed at disseminating information about this region and its importance, functioning as a tool for environmental awareness. The main service provided…
▽ More
We describe the first steps in the development of an artificial agent focused on the Brazilian maritime territory, a large region within the South Atlantic also known as the Blue Amazon. The "BLue Amazon Brain" (BLAB) integrates a number of services aimed at disseminating information about this region and its importance, functioning as a tool for environmental awareness. The main service provided by BLAB is a conversational facility that deals with complex questions about the Blue Amazon, called BLAB-Chat; its central component is a controller that manages several task-oriented natural language processing modules (e.g., question answering and summarizer systems). These modules have access to an internal data lake as well as to third-party databases. A news reporter (BLAB-Reporter) and a purposely-developed wiki (BLAB-Wiki) are also part of the BLAB service architecture. In this paper, we describe our current version of BLAB's architecture (interface, backend, web services, NLP modules, and resources) and comment on the challenges we have faced so far, such as the lack of training data and the scattered state of domain information. Solving these issues presents a considerable challenge in the development of artificial intelligence for technical domains.
△ Less
Submitted 6 September, 2022;
originally announced September 2022.
-
Unifying Effects of Direct and Relational Associations for Visual Communication
Authors:
Melissa A. Schoenlein,
Johnny Campos,
Kevin J. Lande,
Laurent Lessard,
Karen B. Schloss
Abstract:
People have expectations about how colors map to concepts in visualizations, and they are better at interpreting visualizations that match their expectations. Traditionally, studies on these expectations (inferred map**s) distinguished distinct factors relevant for visualizations of categorical vs. continuous information. Studies on categorical information focused on direct associations (e.g., m…
▽ More
People have expectations about how colors map to concepts in visualizations, and they are better at interpreting visualizations that match their expectations. Traditionally, studies on these expectations (inferred map**s) distinguished distinct factors relevant for visualizations of categorical vs. continuous information. Studies on categorical information focused on direct associations (e.g., mangos are associated with yellows) whereas studies on continuous information focused on relational associations (e.g., darker colors map to larger quantities; dark-is-more bias). We unite these two areas within a single framework of assignment inference. Assignment inference is the process by which people infer map**s between perceptual features and concepts represented in encoding systems. Observers infer globally optimal assignments by maximizing the "merit," or "goodness," of each possible assignment. Previous work on assignment inference focused on visualizations of categorical information. We extend this approach to visualizations of continuous data by (a) broadening the notion of merit to include relational associations and (b) develo** a method for combining multiple (sometimes conflicting) sources of merit to predict people's inferred map**s. We developed and tested our model on data from experiments in which participants interpreted colormap data visualizations, representing fictitious data about environmental concepts (sunshine, shade, wild fire, ocean water, glacial ice). We found both direct and relational associations contribute independently to inferred map**s. These results can be used to optimize visualization design to facilitate visual communication.
△ Less
Submitted 6 September, 2022;
originally announced September 2022.
-
Towards Explainable Social Agent Authoring tools: A case study on FAtiMA-Toolkit
Authors:
Manuel Guimarães,
Joana Campos,
Pedro A. Santos,
João Dias,
Rui Prada
Abstract:
The deployment of Socially Intelligent Agents (SIAs) in learning environments has proven to have several advantages in different areas of application. Social Agent Authoring Tools allow scenario designers to create tailored experiences with high control over SIAs behaviour, however, on the flip side, this comes at a cost as the complexity of the scenarios and its authoring can become overbearing.…
▽ More
The deployment of Socially Intelligent Agents (SIAs) in learning environments has proven to have several advantages in different areas of application. Social Agent Authoring Tools allow scenario designers to create tailored experiences with high control over SIAs behaviour, however, on the flip side, this comes at a cost as the complexity of the scenarios and its authoring can become overbearing. In this paper we introduce the concept of Explainable Social Agent Authoring Tools with the goal of analysing if authoring tools for social agents are understandable and interpretable. To this end we examine whether an authoring tool, FAtiMA-Toolkit, is understandable and its authoring steps interpretable, from the point-of-view of the author. We conducted two user studies to quantitatively assess the Interpretability, Comprehensibility and Transparency of FAtiMA-Toolkit from the perspective of a scenario designer. One of the key findings is the fact that FAtiMA-Toolkit's conceptual model is, in general, understandable, however the emotional-based concepts were not as easily understood and used by the authors. Although there are some positive aspects regarding the explainability of FAtiMA-Toolkit, there is still progress to be made to achieve a fully explainable social agent authoring tool. We provide a set of key concepts and possible solutions that can guide developers to build such tools.
△ Less
Submitted 7 June, 2022;
originally announced June 2022.
-
Training Language Models with Language Feedback
Authors:
Jérémy Scheurer,
Jon Ander Campos,
Jun Shern Chan,
Angelica Chen,
Kyunghyun Cho,
Ethan Perez
Abstract:
Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. Comparison feedback conveys limited information about human preferences per human e…
▽ More
Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. Comparison feedback conveys limited information about human preferences per human evaluation. Here, we propose to learn from natural language feedback, which conveys more information per human evaluation. We learn from language feedback on model outputs using a three-step learning algorithm. First, we condition the language model on the initial output and feedback to generate many refinements. Second, we choose the refinement with the highest similarity to the feedback. Third, we finetune a language model to maximize the likelihood of the chosen refinement given the input. In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements, finding that only large language models (175B parameters) do so. Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization ability.
△ Less
Submitted 17 November, 2022; v1 submitted 29 April, 2022;
originally announced April 2022.
-
Measuring Complexity of Learning Schemes Using Hessian-Schatten Total Variation
Authors:
Shayan Aziznejad,
Joaquim Campos,
Michael Unser
Abstract:
In this paper, we introduce the Hessian-Schatten total variation (HTV) -- a novel seminorm that quantifies the total "rugosity" of multivariate functions. Our motivation for defining HTV is to assess the complexity of supervised-learning schemes. We start by specifying the adequate matrix-valued Banach spaces that are equipped with suitable classes of mixed norms. We then show that the HTV is inva…
▽ More
In this paper, we introduce the Hessian-Schatten total variation (HTV) -- a novel seminorm that quantifies the total "rugosity" of multivariate functions. Our motivation for defining HTV is to assess the complexity of supervised-learning schemes. We start by specifying the adequate matrix-valued Banach spaces that are equipped with suitable classes of mixed norms. We then show that the HTV is invariant to rotations, scalings, and translations. Additionally, its minimum value is achieved for linear map**s, which supports the common intuition that linear regression is the least complex learning model. Next, we present closed-form expressions of the HTV for two general classes of functions. The first one is the class of Sobolev functions with a certain degree of regularity, for which we show that the HTV coincides with the Hessian-Schatten seminorm that is sometimes used as a regularizer for image reconstruction. The second one is the class of continuous and piecewise-linear (CPWL) functions. In this case, we show that the HTV reflects the total change in slopes between linear regions that have a common facet. Hence, it can be viewed as a convex relaxation (l1-type) of the number of linear regions (l0-type) of CPWL map**s. Finally, we illustrate the use of our proposed seminorm.
△ Less
Submitted 31 January, 2022; v1 submitted 12 December, 2021;
originally announced December 2021.
-
QBugs: A Collection of Reproducible Bugs in Quantum Algorithms and a Supporting Infrastructure to Enable Controlled Quantum Software Testing and Debugging Experiments
Authors:
José Campos,
André Souto
Abstract:
Reproducibility and comparability of empirical results are at the core tenet of the scientific method in any scientific field. To ease reproducibility of empirical studies, several benchmarks in software engineering research, such as Defects4J, have been developed and widely used. For quantum software engineering research, however, no benchmark has been established yet. In this position paper, we…
▽ More
Reproducibility and comparability of empirical results are at the core tenet of the scientific method in any scientific field. To ease reproducibility of empirical studies, several benchmarks in software engineering research, such as Defects4J, have been developed and widely used. For quantum software engineering research, however, no benchmark has been established yet. In this position paper, we propose a new benchmark -- named QBugs -- which will provide experimental subjects and an experimental infrastructure to ease the evaluation of new research and the reproducibility of previously published results on quantum software engineering.
△ Less
Submitted 31 March, 2021;
originally announced March 2021.
-
CHARET: Character-centered Approach to Emotion Tracking in Stories
Authors:
Diogo S. Carvalho,
Joana Campos,
Manuel Guimarães,
Ana Antunes,
João Dias,
Pedro A. Santos
Abstract:
Autonomous agents that can engage in social interactions witha human is the ultimate goal of a myriad of applications. A keychallenge in the design of these applications is to define the socialbehavior of the agent, which requires extensive content creation.In this research, we explore how we can leverage current state-of-the-art tools to make inferences about the emotional state ofa character in…
▽ More
Autonomous agents that can engage in social interactions witha human is the ultimate goal of a myriad of applications. A keychallenge in the design of these applications is to define the socialbehavior of the agent, which requires extensive content creation.In this research, we explore how we can leverage current state-of-the-art tools to make inferences about the emotional state ofa character in a story as events unfold, in a coherent way. Wepropose a character role-labelling approach to emotion tracking thataccounts for the semantics of emotions. We show that by identifyingactors and objects of events and considering the emotional stateof the characters, we can achieve better performance in this task,when compared to end-to-end approaches.
△ Less
Submitted 19 July, 2021; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning
Authors:
Jon Ander Campos,
Kyunghyun Cho,
Arantxa Otegi,
Aitor Soroa,
Gorka Azkune,
Eneko Agirre
Abstract:
The interaction of conversational systems with users poses an exciting opportunity for improving them after deployment, but little evidence has been provided of its feasibility. In most applications, users are not able to provide the correct answer to the system, but they are able to provide binary (correct, incorrect) feedback. In this paper we propose feedback-weighted learning based on importan…
▽ More
The interaction of conversational systems with users poses an exciting opportunity for improving them after deployment, but little evidence has been provided of its feasibility. In most applications, users are not able to provide the correct answer to the system, but they are able to provide binary (correct, incorrect) feedback. In this paper we propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback. We perform simulated experiments on document classification (for development) and Conversational Question Answering datasets like QuAC and DoQA, where binary user feedback is derived from gold annotations. The results show that our method is able to improve over the initial supervised system, getting close to a fully-supervised system that has access to the same labeled examples in in-domain experiments (QuAC), and even matching in out-of-domain experiments (DoQA). Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
△ Less
Submitted 1 November, 2020;
originally announced November 2020.
-
Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems
Authors:
Jan Deriu,
Don Tuggener,
Pius von Däniken,
Jon Ander Campos,
Alvaro Rodrigo,
Thiziri Belkacem,
Aitor Soroa,
Eneko Agirre,
Mark Cieliebak
Abstract:
The lack of time-efficient and reliable evaluation methods hamper the development of conversational dialogue systems (chatbots). Evaluations requiring humans to converse with chatbots are time and cost-intensive, put high cognitive demands on the human judges, and yield low-quality results. In this work, we introduce \emph{Spot The Bot}, a cost-efficient and robust evaluation framework that replac…
▽ More
The lack of time-efficient and reliable evaluation methods hamper the development of conversational dialogue systems (chatbots). Evaluations requiring humans to converse with chatbots are time and cost-intensive, put high cognitive demands on the human judges, and yield low-quality results. In this work, we introduce \emph{Spot The Bot}, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots. Human judges then only annotate for each entity in a conversation whether they think it is human or not (assuming there are humans participants in these conversations). These annotations then allow us to rank chatbots regarding their ability to mimic the conversational behavior of humans. Since we expect that all bots are eventually recognized as such, we incorporate a metric that measures which chatbot can uphold human-like behavior the longest, i.e., \emph{Survival Analysis}. This metric has the ability to correlate a bot's performance to certain of its characteristics (e.g., \ fluency or sensibleness), yielding interpretable results. The comparably low cost of our framework allows for frequent evaluations of chatbots during their evaluation cycle. We empirically validate our claims by applying \emph{Spot The Bot} to three domains, evaluating several state-of-the-art chatbots, and drawing comparisons to related work. The framework is released as a ready-to-use tool.
△ Less
Submitted 5 October, 2020;
originally announced October 2020.
-
DoQA -- Accessing Domain-Specific FAQs via Conversational QA
Authors:
Jon Ander Campos,
Arantxa Otegi,
Aitor Soroa,
Jan Deriu,
Mark Cieliebak,
Eneko Agirre
Abstract:
The goal of this work is to build conversational Question Answering (QA) interfaces for the large body of domain-specific information available in FAQ sites. We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs. The dialogues are collected from three Stack Exchange sites using the Wizard of Oz method with crowdsourcing. Compared to previous work, DoQA comprises well-defined informat…
▽ More
The goal of this work is to build conversational Question Answering (QA) interfaces for the large body of domain-specific information available in FAQ sites. We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs. The dialogues are collected from three Stack Exchange sites using the Wizard of Oz method with crowdsourcing. Compared to previous work, DoQA comprises well-defined information needs, leading to more coherent and natural conversations with less factoid questions and is multi-domain. In addition, we introduce a more realistic information retrieval(IR) scenario where the system needs to find the answer in any of the FAQ documents. The results of an existing, strong, system show that, thanks to transfer learning from a Wikipedia QA dataset and fine tuning on a single FAQ domain, it is possible to build high quality conversational QA systems for FAQs without in-domain training data. The good results carry over into the more challenging IR scenario. In both cases, there is still ample room for improvement, as indicated by the higher human upperbound.
△ Less
Submitted 18 May, 2020; v1 submitted 4 May, 2020;
originally announced May 2020.
-
Give your Text Representation Models some Love: the Case for Basque
Authors:
Rodrigo Agerri,
Iñaki San Vicente,
Jon Ander Campos,
Ander Barrena,
Xabier Saralegi,
Aitor Soroa,
Eneko Agirre
Abstract:
Word embeddings and pre-trained language models allow to build rich representations of text and have enabled improvements across most NLP tasks. Unfortunately they are very expensive to train, and many small companies and research groups tend to use models that have been pre-trained and made available by third parties, rather than building their own. This is suboptimal as, for many languages, the…
▽ More
Word embeddings and pre-trained language models allow to build rich representations of text and have enabled improvements across most NLP tasks. Unfortunately they are very expensive to train, and many small companies and research groups tend to use models that have been pre-trained and made available by third parties, rather than building their own. This is suboptimal as, for many languages, the models have been trained on smaller (or lower quality) corpora. In addition, monolingual pre-trained models for non-English languages are not always available. At best, models for those languages are included in multilingual versions, where each language shares the quota of substrings and parameters with the rest of the languages. This is particularly true for smaller languages such as Basque. In this paper we show that a number of monolingual models (FastText word embeddings, FLAIR and BERT language models) trained with larger Basque corpora produce much better results than publicly available versions in downstream NLP tasks, including topic classification, sentiment classification, PoS tagging and NER. This work sets a new state-of-the-art in those tasks for Basque. All benchmarks and models used in this work are publicly available.
△ Less
Submitted 2 April, 2020; v1 submitted 31 March, 2020;
originally announced April 2020.
-
Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant
Authors:
Shayan Aziznejad,
Harshit Gupta,
Joaquim Campos,
Michael Unser
Abstract:
We introduce a variational framework to learn the activation functions of deep neural networks. Our aim is to increase the capacity of the network while controlling an upper-bound of the actual Lipschitz constant of the input-output relation. To that end, we first establish a global bound for the Lipschitz constant of neural networks. Based on the obtained bound, we then formulate a variational pr…
▽ More
We introduce a variational framework to learn the activation functions of deep neural networks. Our aim is to increase the capacity of the network while controlling an upper-bound of the actual Lipschitz constant of the input-output relation. To that end, we first establish a global bound for the Lipschitz constant of neural networks. Based on the obtained bound, we then formulate a variational problem for learning activation functions. Our variational problem is infinite-dimensional and is not computationally tractable. However, we prove that there always exists a solution that has continuous and piecewise-linear (linear-spline) activations. This reduces the original problem to a finite-dimensional minimization where an l1 penalty on the parameters of the activations favors the learning of sparse nonlinearities. We numerically compare our scheme with standard ReLU network and its variations, PReLU and LeakyReLU and we empirically demonstrate the practical aspects of our framework.
△ Less
Submitted 7 August, 2020; v1 submitted 17 January, 2020;
originally announced January 2020.
-
Content Adaptive Optimization for Neural Image Compression
Authors:
Joaquim Campos,
Simon Meierhans,
Abdelaziz Djelouah,
Christopher Schroers
Abstract:
The field of neural image compression has witnessed exciting progress as recently proposed architectures already surpass the established transform coding based approaches. While, so far, research has mainly focused on architecture and model improvements, in this work we explore content adaptive optimization. To this end, we introduce an iterative procedure which adapts the latent representation to…
▽ More
The field of neural image compression has witnessed exciting progress as recently proposed architectures already surpass the established transform coding based approaches. While, so far, research has mainly focused on architecture and model improvements, in this work we explore content adaptive optimization. To this end, we introduce an iterative procedure which adapts the latent representation to the specific content we wish to compress while kee** the parameters of the network and the predictive model fixed. Our experiments show that this allows for an overall increase in rate-distortion performance, independently of the specific architecture used. Furthermore, we also evaluate this strategy in the context of adapting a pretrained network to other content that is different in visual appearance or resolution. Here, our experiments show that our adaptation strategy can largely close the gap as compared to models specifically trained for the given content while having the benefit that no additional data in the form of model parameter updates has to be transmitted.
△ Less
Submitted 5 June, 2019; v1 submitted 4 June, 2019;
originally announced June 2019.
-
POSEAMM: A Unified Framework for Solving Pose Problems using an Alternating Minimization Method
Authors:
Joao Campos,
Joao R. Cardoso,
Pedro Miraldo
Abstract:
Pose estimation is one of the most important problems in computer vision. It can be divided in two different categories -- absolute and relative -- and may involve two different types of camera models: central and non-central. State-of-the-art methods have been designed to solve separately these problems. This paper presents a unified framework that is able to solve any pose problem by alternating…
▽ More
Pose estimation is one of the most important problems in computer vision. It can be divided in two different categories -- absolute and relative -- and may involve two different types of camera models: central and non-central. State-of-the-art methods have been designed to solve separately these problems. This paper presents a unified framework that is able to solve any pose problem by alternating optimization techniques between two set of parameters, rotation and translation. In order to make this possible, it is necessary to define an objective function that captures the problem at hand. Since the objective function will depend on the rotation and translation it is not possible to solve it as a simple minimization problem. Hence the use of Alternating Minimization methods, in which the function will be alternatively minimized with respect to the rotation and the translation. We show how to use our framework in three distinct pose problems. Our methods are then benchmarked with both synthetic and real data, showing their better balance between computational time and accuracy.
△ Less
Submitted 9 April, 2019;
originally announced April 2019.
-
Improving the Visualization of Alloy Instances
Authors:
Rui Couto,
José C. Campos,
Nuno Macedo,
Alcino Cunha
Abstract:
Alloy is a lightweight formal specification language, supported by an IDE, which has proven well-suited for reasoning about software design in early development stages. The IDE provides a visualizer that produces graphical representations of analysis results, which is essential for the proper validation of the model. Alloy is a rich language but inherently static, so behavior needs to be explicitl…
▽ More
Alloy is a lightweight formal specification language, supported by an IDE, which has proven well-suited for reasoning about software design in early development stages. The IDE provides a visualizer that produces graphical representations of analysis results, which is essential for the proper validation of the model. Alloy is a rich language but inherently static, so behavior needs to be explicitly encoded and reasoned about. Even though this is a common scenario, the visualizer presents limitations when dealing with such models. The main contribution of this paper is a principled approach to generate instance visualizations, which improves the current Alloy Visualizer, focusing on the representation of behavior.
△ Less
Submitted 27 November, 2018;
originally announced November 2018.
-
Propheticus: Generalizable Machine Learning Framework
Authors:
João R. Campos,
Marco Vieira,
Ernesto Costa
Abstract:
Due to recent technological developments, Machine Learning (ML), a subfield of Artificial Intelligence (AI), has been successfully used to process and extract knowledge from a variety of complex problems. However, a thorough ML approach is complex and highly dependent on the problem at hand. Additionally, implementing the logic required to execute the experiments is no small nor trivial deed, cons…
▽ More
Due to recent technological developments, Machine Learning (ML), a subfield of Artificial Intelligence (AI), has been successfully used to process and extract knowledge from a variety of complex problems. However, a thorough ML approach is complex and highly dependent on the problem at hand. Additionally, implementing the logic required to execute the experiments is no small nor trivial deed, consequentially increasing the probability of faulty code which can compromise the results. Propheticus is a data-driven framework which results of the need for a tool that abstracts some of the inherent complexity of ML, whilst being easy to understand and use, as well as to adapt and expand to assist the user's specific needs. Propheticus systematizes and enforces various complex concepts of an ML experiment workflow, taking into account the nature of both the problem and the data. It contains functionalities to execute all the different tasks, from data preprocessing, to results analysis and comparison. Notwithstanding, it can be fairly easily adapted to different problems due to its flexible architecture, and customized as needed to address the user's needs.
△ Less
Submitted 6 September, 2018;
originally announced September 2018.
-
Evaluation of Formal IDEs for Human-Machine Interface Design and Analysis: The Case of CIRCUS and PVSio-web
Authors:
Camille Fayollas,
Célia Martinie,
Philippe Palanque,
Paolo Masci,
Michael D. Harrison,
José C. Campos,
Saulo Rodrigues e Silva
Abstract:
Critical human-machine interfaces are present in many systems including avionics systems and medical devices. Use error is a concern in these systems both in terms of hardware panels and input devices, and the software that drives the interfaces. Guaranteeing safe usability, in terms of buttons, knobs and displays is now a key element in the overall safety of the system. New integrated developmen…
▽ More
Critical human-machine interfaces are present in many systems including avionics systems and medical devices. Use error is a concern in these systems both in terms of hardware panels and input devices, and the software that drives the interfaces. Guaranteeing safe usability, in terms of buttons, knobs and displays is now a key element in the overall safety of the system. New integrated development environments (IDEs) based on formal methods technologies have been developed by the research community to support the design and analysis of high-confidence human-machine interfaces. To date, little work has focused on the comparison of these particular types of formal IDEs. This paper compares and evaluates two state-of-the-art toolkits: CIRCUS, a model-based development and analysis tool based on Petri net extensions, and PVSio-web, a prototy** toolkit based on the PVS theorem proving system.
△ Less
Submitted 29 January, 2017;
originally announced January 2017.
-
Validating an Approach to Formalize Use Cases with Ontologies
Authors:
Rui Couto,
António Nestor Ribeiro,
José Creissac Campos
Abstract:
Use case driven development methodologies put use cases at the center of the software development process. However, in order to support automated development and analysis, use cases need to be appropriately formalized. This will also help guarantee consistency between requirements specifications and the developed solutions. Formal methods tend to suffer from take up issues, as they are usually har…
▽ More
Use case driven development methodologies put use cases at the center of the software development process. However, in order to support automated development and analysis, use cases need to be appropriately formalized. This will also help guarantee consistency between requirements specifications and the developed solutions. Formal methods tend to suffer from take up issues, as they are usually hard to accept by industry. In this context, it is relevant not only to produce languages and approaches to support formalization, but also to perform their validation. In previous works we have developed an approach to formalize use cases resorting to ontologies. In this paper we present the validation of one such approach. Through a three stage study, we evaluate the acceptance of the language and supporting tool. The first stage focusses on the acceptance of the process and language, the second on the support the tool provides to the process, and finally the third one on the tool's usability aspects. Results show test subjects found the approach feasible and useful and the tool easy to use.
△ Less
Submitted 29 March, 2016;
originally announced March 2016.
-
Application of Ontologies in Identifying Requirements Patterns in Use Cases
Authors:
Rui Couto,
António Nestor Ribeiro,
José Creissac Campos
Abstract:
Use case specifications have successfully been used for requirements description. They allow joining, in the same modeling space, the expectations of the stakeholders as well as the needs of the software engineer and analyst involved in the process. While use cases are not meant to describe a system's implementation, by formalizing their description we are able to extract implementation relevant…
▽ More
Use case specifications have successfully been used for requirements description. They allow joining, in the same modeling space, the expectations of the stakeholders as well as the needs of the software engineer and analyst involved in the process. While use cases are not meant to describe a system's implementation, by formalizing their description we are able to extract implementation relevant information from them. More specifically, we are interested in identifying requirements patterns (common requirements with typical implementation solutions) in support for a requirements based software development approach. In the paper we propose the transformation of Use Case descriptions expressed in a Controlled Natural Language into an ontology expressed in the Web Ontology Language (OWL). OWL's query engines can then be used to identify requirements patterns expressed as queries over the ontology. We describe a tool that we have developed to support the approach and provide an example of usage.
△ Less
Submitted 3 April, 2014;
originally announced April 2014.
-
Channels as Objects in Concurrent Object-Oriented Programming
Authors:
Joana Campos,
Vasco T. Vasconcelos
Abstract:
There is often a sort of a protocol associated to each class, stating when and how certain methods should be called. Given that this protocol is, if at all, described in the documentation accompanying the class, current mainstream object-oriented languages cannot provide for the verification of client code adherence against the sought class behaviour. We have defined a class-based concurrent objec…
▽ More
There is often a sort of a protocol associated to each class, stating when and how certain methods should be called. Given that this protocol is, if at all, described in the documentation accompanying the class, current mainstream object-oriented languages cannot provide for the verification of client code adherence against the sought class behaviour. We have defined a class-based concurrent object-oriented language that formalises such protocols in the form of usage types. Usage types are attached to class definitions, allowing for the specification of (1) the available methods, (2) the tests clients must perform on the result of methods, and (3) the object status - linear or shared - all of which depend on the object's state. Our work extends the recent approach on modular session types by eliminating channel operations, and defining the method call as the single communication primitive in both sequential and concurrent settings. In contrast to previous works, we define a single category for objects, instead of distinct categories for linear and for shared objects, and let linear objects evolve into shared ones. We introduce a standard sync qualifier to prevent thread interference in certain operations on shared objects. We formalise the language syntax, the operational semantics, and a type system that enforces by static ty** that methods are called only when available, and by a single client if so specified in the usage type. We illustrate the language via a complete example.
△ Less
Submitted 18 October, 2011;
originally announced October 2011.