Teaching and Learning Ethnography for Software Engineering Contexts

Authors: Yvonne Dittrich, Helen Sharp, Cleidson de Souza

Abstract: Ethnography has become one of the established methods for empirical research on software engineering. Although there is a wide variety of introductory books available, there has been no material targeting software engineering students particularly, until now. In this chapter we provide an introduction to teaching and learning ethnography for faculty teaching ethnography to software engineering graduate students and for the students themselves of such courses. The contents of the chapter focuses on what we think is the core basic knowledge for newbies to ethnography as a research method. We complement the text with proposals for exercises, tips for teaching, and pitfalls that we and our students have experienced. The chapter is designed to support part of a course on empirical software engineering and provides pointers and literature for further reading.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 38 pages, to be published in: Daniel Mendez, Paris Avgeriou, Marcos Kalinowski, and Nauman bin Ali (eds.) Teaching Empirical Research Methods in Software Engineering, Springer

arXiv:2406.00049 [pdf, other]

QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation

Authors: Gonçalo R. A. Faria, Sweta Agrawal, António Farinhas, Ricardo Rei, José G. C. de Souza, André F. T. Martins

Abstract: An important challenge in machine translation (MT) is to generate high-quality and diverse translations. Prior work has shown that the estimated likelihood from the MT model correlates poorly with translation quality. In contrast, quality evaluation metrics (such as COMET or BLEURT) exhibit high correlations with human judgments, which has motivated their use as rerankers (such as quality-aware and minimum Bayes risk decoding). However, relying on a single translation with high estimated quality increases the chances of "gaming the metric". In this paper, we address the problem of sampling a set of high-quality and diverse translations. We provide a simple and effective way to avoid over-reliance on noisy quality estimates by using them as the energy function of a Gibbs distribution. Instead of looking for a mode in the distribution, we generate multiple samples from high-density areas through the Metropolis-Hastings algorithm, a simple Markov chain Monte Carlo approach. The results show that our proposed method leads to high-quality and diverse outputs across multiple language pairs (English$\leftrightarrow${German, Russian}) with two strong decoder-only LLMs (Alma-7b, Tower-7b).

Submitted 28 May, 2024; originally announced June 2024.
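
The sampling idea in the abstract above can be illustrated with a toy example. This is a minimal sketch, not the authors' method: it assumes a small fixed list of candidate translations, a uniform symmetric proposal, and hypothetical quality scores (`toy_scores`) standing in for a learned metric. The target is the Gibbs distribution with energy equal to the negative quality score, so the probability of a candidate y is proportional to exp(quality(y) / temperature).

```python
import math
import random
from collections import Counter

def metropolis_hastings(candidates, quality, temperature=0.1, steps=5000, seed=0):
    """Sample from pi(y) proportional to exp(quality(y) / temperature).

    Assumes a uniform (symmetric) proposal over a fixed candidate list, so the
    acceptance probability reduces to min(1, pi(proposal) / pi(current)).
    """
    rng = random.Random(seed)
    current = rng.choice(candidates)
    samples = []
    for _ in range(steps):
        proposal = rng.choice(candidates)
        # Log of the acceptance ratio for a symmetric proposal.
        log_ratio = (quality(proposal) - quality(current)) / temperature
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        samples.append(current)
    return samples

# Hypothetical quality estimates; a real system would call a quality metric.
toy_scores = {"translation A": 0.9, "translation B": 0.5, "translation C": 0.1}
samples = metropolis_hastings(list(toy_scores), toy_scores.get)
counts = Counter(samples)
```

The chain spends most of its time on high-quality candidates while still visiting lower-scoring ones occasionally, which is the property that gives a diverse sample rather than a single mode. Raising `temperature` flattens the distribution.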

arXiv:2402.17733 [pdf, other]

Tower: An Open Multilingual Large Language Model for Translation-Related Tasks

Authors: Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G. C. de Souza, André F. T. Martins

Abstract: While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task. In this paper, we propose a recipe for tailoring LLMs to multiple tasks present in translation workflows. We perform continued pretraining on a multilingual mixture of monolingual and parallel data, creating TowerBase, followed by finetuning on instructions relevant for translation processes, creating TowerInstruct. Our final model surpasses open alternatives on several tasks relevant to translation workflows and is competitive with general-purpose closed LLMs. To facilitate future research, we release the Tower models, our specialization dataset, an evaluation framework for LLMs focusing on the translation ecosystem, and a collection of model generations, including ours, on our benchmark.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.17420 [pdf, other]

PANDAS: Prototype-based Novel Class Discovery and Detection

Authors: Tyler L. Hayes, César R. de Souza, Namil Kim, Jiwon Kim, Riccardo Volpi, Diane Larlus

Abstract: Object detectors are typically trained once and for all on a fixed set of classes. However, this closed-world assumption is unrealistic in practice, as new classes will inevitably emerge after the detector is deployed in the wild. In this work, we look at ways to extend a detector trained for a set of base classes so it can i) spot the presence of novel classes, and ii) automatically enrich its repertoire to be able to detect those newly discovered classes together with the base ones. We propose PANDAS, a method for novel class discovery and detection. It discovers clusters representing novel classes from unlabeled data, and represents old and new classes with prototypes. During inference, a distance-based classifier uses these prototypes to assign a label to each detected object instance. The simplicity of our method makes it widely applicable. We experimentally demonstrate the effectiveness of PANDAS on the VOC 2012 and COCO-to-LVIS benchmarks. It performs favorably against the state of the art for this task while being computationally more affordable.

Submitted 30 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted to the Conference on Lifelong Learning Agents (CoLLAs 2024)
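
The distance-based classification step mentioned in the abstract above can be sketched as a nearest-prototype rule. This is an illustration under assumptions, not the PANDAS implementation: the two-dimensional features, the Euclidean distance, and the labels in `prototypes` (including the hypothetical discovered class `novel_0`) are all made up for the example.

```python
import math

def nearest_prototype(feature, prototypes):
    """Return the label whose prototype is closest (Euclidean) to `feature`."""
    return min(prototypes, key=lambda label: math.dist(feature, prototypes[label]))

# Hypothetical prototypes: two base classes and one discovered cluster.
prototypes = {"cat": (0.0, 0.0), "dog": (4.0, 0.0), "novel_0": (0.0, 5.0)}
label = nearest_prototype((0.5, 4.0), prototypes)
```

Because base and novel classes are both represented as prototypes, adding a newly discovered class only requires adding one more entry to the dictionary.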

arXiv:2312.08472 [pdf, other]

AutoNumerics-Zero: Automated Discovery of State-of-the-Art Mathematical Functions

Authors: Esteban Real, Yao Chen, Mirko Rossini, Connal de Souza, Manav Garg, Akhil Verghese, Moritz Firsching, Quoc V. Le, Ekin Dogus Cubuk, David H. Park

Abstract: Computers calculate transcendental functions by approximating them through the composition of a few limited-precision instructions. For example, an exponential can be calculated with a Taylor series. These approximation methods were developed over the centuries by mathematicians, who emphasized the attainability of arbitrary precision. Computers, however, operate on few limited precision types, such as the popular float32. In this study, we show that when aiming for limited precision, existing approximation methods can be outperformed by programs automatically discovered from scratch by a simple evolutionary algorithm. In particular, over real numbers, our method can approximate the exponential function reaching orders of magnitude more precision for a given number of operations when compared to previous approaches. More practically, over float32 numbers and constrained to less than 1 ULP of error, the same method attains a speedup over baselines by generating code that triggers better XLA/LLVM compilation paths. In other words, in both cases, evolution searched a vast space of possible programs, without knowledge of mathematics, to discover previously unknown optimized approximations to high precision, for the first time. We also give evidence that these results extend beyond the exponential. The ubiquity of transcendental functions suggests that our method has the potential to reduce the cost of scientific computing applications.

Submitted 13 December, 2023; originally announced December 2023.

ACM Class: I.2.2; I.2.6; G.1.2
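
The abstract above cites the Taylor series as a classical way to compute an exponential. A minimal sketch of that baseline follows; it is the textbook series, not one of the evolved programs the paper describes, and the function name `exp_taylor` and the default of 20 terms are choices made for this example.

```python
import math

def exp_taylor(x, n_terms=20):
    """Approximate e**x with the first n_terms terms of its Taylor series."""
    total = 0.0
    term = 1.0  # x**0 / 0!
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)  # next term: x**(k+1) / (k+1)!
    return total

approx = exp_taylor(1.0)
error = abs(approx - math.e)
```

Each extra term costs one multiplication, one division, and one addition, which is the kind of operation budget the abstract refers to when it compares precision for a given number of operations.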

arXiv:2311.18452 [pdf, other]

Developer Experiences with a Contextualized AI Coding Assistant: Usability, Expectations, and Outcomes

Authors: Gustavo Pinto, Cleidson de Souza, Thayssa Rocha, Igor Steinmacher, Alberto de Souza, Edward Monteiro

Abstract: In the rapidly advancing field of artificial intelligence, software development has emerged as a key area of innovation. Despite the plethora of general-purpose AI assistants available, their effectiveness diminishes in complex, domain-specific scenarios. Noting this limitation, both the academic community and industry players are relying on contextualized coding AI assistants. These assistants surpass general-purpose AI tools by integrating proprietary, domain-specific knowledge, offering precise and relevant solutions. Our study focuses on the initial experiences of 62 participants who used a contextualized coding AI assistant -- named StackSpot AI -- in a controlled setting. According to the participants, the assistants' use resulted in significant time savings, easier access to documentation, and the generation of accurate codes for internal APIs. However, challenges associated with the knowledge sources necessary to make the coding assistant access more contextual information as well as variable responses and limitations in handling complex codes were observed. The study's findings, detailing both the benefits and challenges of contextualized AI assistants, underscore their potential to revolutionize software development practices, while also highlighting areas for further refinement.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.18450 [pdf, other]

Lessons from Building StackSpot AI: A Contextualized AI Coding Assistant

Authors: Gustavo Pinto, Cleidson de Souza, João Batista Neto, Alberto de Souza, Tarcísio Gotto, Edward Monteiro

Abstract: With their exceptional natural language processing capabilities, tools based on Large Language Models (LLMs) like ChatGPT and Co-Pilot have swiftly become indispensable resources in the software developer's toolkit. While recent studies suggest the potential productivity gains these tools can unlock, users still encounter drawbacks, such as generic or incorrect answers. Additionally, the pursuit of improved responses often leads to extensive prompt engineering efforts, diverting valuable time from writing code that delivers actual value. To address these challenges, a new breed of tools, built atop LLMs, is emerging. These tools aim to mitigate drawbacks by employing techniques like fine-tuning or enriching user prompts with contextualized information. In this paper, we delve into the lessons learned by a software development team venturing into the creation of such a contextualized LLM-based application, using retrieval-based techniques, called CodeBuddy. Over a four-month period, the team, despite lacking prior professional experience in LLM-based applications, built the product from scratch. Following the initial product release, we engaged with the development team responsible for the code generative components. Through interviews and analysis of the application's issue tracker, we uncover various intriguing challenges that teams working on LLM-based applications might encounter. For instance, we found three main group of lessons: LLM-based lessons, User-based lessons, and Technical lessons. By understanding these lessons, software development teams could become better prepared to build LLM-based applications.

Submitted 4 January, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

arXiv:2310.13448 [pdf, other]

Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning

Authors: Duarte M. Alves, Nuno M. Guerreiro, João Alves, José Pombal, Ricardo Rei, José G. C. de Souza, Pierre Colombo, André F. T. Martins

Abstract: Large language models (LLMs) are a promising avenue for machine translation (MT). However, current LLM-based MT systems are brittle: their effectiveness highly depends on the choice of few-shot examples and they often require extra post-processing due to overgeneration. Alternatives such as finetuning on translation instructions are computationally expensive and may weaken in-context learning capabilities, due to overspecialization. In this paper, we provide a closer look at this problem. We start by showing that adapter-based finetuning with LoRA matches the performance of traditional finetuning while reducing the number of training parameters by a factor of 50. This method also outperforms few-shot prompting and eliminates the need for post-processing or in-context examples. However, we show that finetuning generally degrades few-shot performance, hindering adaptation capabilities. Finally, to obtain the best of both worlds, we propose a simple approach that incorporates few-shot examples during finetuning. Experiments on 10 language pairs show that our proposed approach recovers the original few-shot capabilities while keeping the added benefits of finetuning.

Submitted 20 October, 2023; originally announced October 2023.

Comments: Accepted at EMNLP 2023 - Findings

arXiv:2310.11430 [pdf, other]

An Empirical Study of Translation Hypothesis Ensembling with Large Language Models

Authors: António Farinhas, José G. C. de Souza, André F. T. Martins

Abstract: Large language models (LLMs) are becoming a one-fits-many solution, but they sometimes hallucinate or produce unreliable output. In this paper, we investigate how hypothesis ensembling can improve the quality of the generated text for the specific problem of LLM-based machine translation. We experiment with several techniques for ensembling hypotheses produced by LLMs such as ChatGPT, LLaMA, and Alpaca. We provide a comprehensive study along multiple dimensions, including the method to generate hypotheses (multiple prompts, temperature-based sampling, and beam search) and the strategy to produce the final translation (instruction-based, quality-based reranking, and minimum Bayes risk (MBR) decoding). Our results show that MBR decoding is a very effective method, that translation quality can be improved using a small number of samples, and that instruction tuning has a strong impact on the relation between the diversity of the hypotheses and the sampling temperature.

Submitted 17 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 (main conference)
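
Minimum Bayes risk decoding, named in the abstract above, selects the hypothesis that agrees most with the other sampled hypotheses under some utility. The sketch below is a generic illustration, not the paper's setup: `overlap_utility` is a hypothetical word-overlap stand-in for a learned metric, and the three hypotheses are invented.

```python
def overlap_utility(hyp, ref):
    """Toy utility: Jaccard overlap of word sets (stand-in for a real metric)."""
    a, b = set(hyp.split()), set(ref.split())
    return len(a & b) / len(a | b) if a | b else 1.0

def mbr_decode(hypotheses, utility=overlap_utility):
    """Return the hypothesis with the highest mean utility against the others."""
    if len(hypotheses) < 2:
        raise ValueError("MBR decoding needs at least two hypotheses")

    def expected_utility(i):
        others = [h for j, h in enumerate(hypotheses) if j != i]
        return sum(utility(hypotheses[i], h) for h in others) / len(others)

    return hypotheses[max(range(len(hypotheses)), key=expected_utility)]

# Invented hypotheses, as if sampled from a translation model.
hypotheses = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog stood near the door",
]
best = mbr_decode(hypotheses)
```

The cost is quadratic in the number of hypotheses, since every pair is scored, which is why the number of samples matters in practice.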

arXiv:2309.11925 [pdf, other]

Scaling up COMETKIWI: Unbabel-IST 2023 Submission for the Quality Estimation Shared Task

Authors: Ricardo Rei, Nuno M. Guerreiro, José Pombal, Daan van Stigt, Marcos Treviso, Luisa Coheur, José G. C. de Souza, André F. T. Martins

Abstract: We present the joint contribution of Unbabel and Instituto Superior Técnico to the WMT 2023 Shared Task on Quality Estimation (QE). Our team participated on all tasks: sentence- and word-level quality prediction (task 1) and fine-grained error span detection (task 2). For all tasks, we build on the COMETKIWI-22 model (Rei et al., 2022b). Our multilingual approaches are ranked first for all tasks, reaching state-of-the-art performance for quality estimation at word-, span- and sentence-level granularity. Compared to the previous state-of-the-art COMETKIWI-22, we show large improvements in correlation with human judgements (up to 10 Spearman points). Moreover, we surpass the second-best multilingual submission to the shared-task with up to 3.8 absolute points.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2308.05019 [pdf, other]

doi 10.1109/TVCG.2023.3326514

ProWis: A Visual Approach for Building, Managing, and Analyzing Weather Simulation Ensembles at Runtime

Authors: Carolina Veiga Ferreira de Souza, Suzanna Maria Bonnet, Daniel de Oliveira, Marcio Cataldi, Fabio Miranda, Marcos Lage

Abstract: Weather forecasting is essential for decision-making and is usually performed using numerical modeling. Numerical weather models, in turn, are complex tools that require specialized training and laborious setup and are challenging even for weather experts. Moreover, weather simulations are data-intensive computations and may take hours to days to complete. When the simulation is finished, the experts face challenges analyzing its outputs, a large mass of spatiotemporal and multivariate data. From the simulation setup to the analysis of results, working with weather simulations involves several manual and error-prone steps. The complexity of the problem increases exponentially when the experts must deal with ensembles of simulations, a frequent task in their daily duties. To tackle these challenges, we propose ProWis: an interactive and provenance-oriented system to help weather experts build, manage, and analyze simulation ensembles at runtime. Our system follows a human-in-the-loop approach to enable the exploration of multiple atmospheric variables and weather scenarios. ProWis was built in close collaboration with weather experts, and we demonstrate its effectiveness by presenting two case studies of rainfall events in Brazil.

Submitted 9 August, 2023; originally announced August 2023.

Comments: Accepted at IEEE VIS 2023

Journal ref: Published in: IEEE Transactions on Visualization and Computer Graphics ( Volume: 30, Issue: 1, January 2024)

arXiv:2305.00955 [pdf, other]

Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation

Authors: Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, Amanda Bertsch, José G. C. de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André F. T. Martins

Abstract: Many recent advances in natural language generation have been fueled by training large language models on internet-scale data. However, this paradigm can lead to models that generate toxic, inaccurate, and unhelpful content, and automatic evaluation metrics often fail to identify these behaviors. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation. First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization. Next, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models. We also discuss existing datasets for human-feedback data collection, and concerns surrounding feedback collection. Finally, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention.

Submitted 31 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: Work in Progress

arXiv:2302.05488 [pdf]

Element-Wise Attention Layers: an option for optimization

Authors: Giovanni Araujo Bacochina, Rodrigo Clemente Thom de Souza

Abstract: The use of Attention Layers has become a trend since the popularization of the Transformer-based models, being the key element for many state-of-the-art models that have been developed through recent years. However, one of the biggest obstacles in implementing these architectures - as well as many others in Deep Learning Field - is the enormous amount of optimizing parameters they possess, which make its use conditioned on the availability of robust hardware. In this paper, it's proposed a new method of attention mechanism that adapts the Dot-Product Attention, which uses matrices multiplications, to become element-wise through the use of arrays multiplications. To test the effectiveness of such approach, two models (one with a VGG-like architecture and one with the proposed method) have been trained in a classification task using Fashion MNIST and CIFAR10 datasets. Each model has been trained for 10 epochs in a single Tesla T4 GPU from Google Colaboratory. The results show that this mechanism allows for an accuracy of 92% of the VGG-like counterpart in Fashion MNIST dataset, while reducing the number of parameters in 97%. For CIFAR10, the accuracy is still equivalent to 60% of the VGG-like counterpart while using 50% less parameters.

Submitted 10 February, 2023; originally announced February 2023.

arXiv:2302.05433 [pdf, other]

Unified Functional Hashing in Automatic Machine Learning

Authors: Ryan Gillard, Stephen Jonany, Yingjie Miao, Michael Munn, Connal de Souza, Jonathan Dungay, Chen Liang, David R. So, Quoc V. Le, Esteban Real

Abstract: The field of Automatic Machine Learning (AutoML) has recently attained impressive results, including the discovery of state-of-the-art machine learning solutions, such as neural image classifiers. This is often done by applying an evolutionary search method, which samples multiple candidate solutions from a large space and evaluates the quality of each candidate through a long training process. As a result, the search tends to be slow. In this paper, we show that large efficiency gains can be obtained by employing a fast unified functional hash, especially through the functional equivalence caching technique, which we also present. The central idea is to detect by hashing when the search method produces equivalent candidates, which occurs very frequently, and this way avoid their costly re-evaluation. Our hash is "functional" in that it identifies equivalent candidates even if they were represented or coded differently, and it is "unified" in that the same algorithm can hash arbitrary representations; e.g. compute graphs, imperative code, or lambda functions. As evidence, we show dramatic improvements on multiple AutoML domains, including neural architecture search and algorithm discovery. Finally, we consider the effect of hash collisions, evaluation noise, and search distribution through empirical analysis. Altogether, we hope this paper may serve as a guide to hashing techniques in AutoML.

Submitted 10 February, 2023; originally announced February 2023.

ACM Class: I.2.2; I.2.6
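
The caching idea in the abstract above, detecting equivalent candidates by hashing and skipping their re-evaluation, can be sketched as follows. The abstract does not say how the hash is computed, so this example makes a loud assumption: candidates are hashed by their outputs on a fixed set of probe inputs. `PROBES`, `functional_hash`, and `CachedEvaluator` are hypothetical names, and the "costly" evaluation is a trivial stand-in.

```python
import hashlib

PROBES = (-2.0, -0.5, 0.0, 1.0, 3.0)  # fixed probe inputs (an assumption)

def functional_hash(fn, probes=PROBES):
    """Hash a candidate by its outputs on fixed probes, not by its source."""
    outputs = ",".join(f"{fn(x):.9g}" for x in probes)
    return hashlib.sha256(outputs.encode()).hexdigest()

class CachedEvaluator:
    """Skips the costly evaluation when an equivalent candidate was seen."""

    def __init__(self, evaluate):
        self.evaluate = evaluate
        self.cache = {}
        self.calls = 0  # number of real (uncached) evaluations

    def __call__(self, fn):
        key = functional_hash(fn)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.evaluate(fn)
        return self.cache[key]

# Stand-in for a long training run: distance of fn(2.0) from 4.0.
evaluator = CachedEvaluator(lambda fn: abs(fn(2.0) - 4.0))
score_a = evaluator(lambda x: x * x)
score_b = evaluator(lambda x: x ** 2)  # coded differently, same behaviour
```

Two candidates that differ only in how they are written share one cache entry, so the second call returns without re-evaluating. Candidates that merely agree on the probes would collide, which is the kind of hash-collision effect the abstract says the paper analyzes.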

arXiv:2209.12985 [pdf, other]

A Bibliometrics Analysis on 28 years of Authentication and Threat Model Area

Authors: Wesley dos Reis Bezerra, Cristiano Antônio de Souza, Carla Merkle Westphall, Carlos Becker Westphall

Abstract: The large volume of publications in any research area can make it difficult for researchers to track their research areas' trends, challenges, and characteristics. Bibliometrics solves this problem by bringing statistical tools to help the analysis of selected publications from an online database. Although there are different works in security, our study aims to fill the bibliometric gap in the authentication and threat model area. As a result, a description of the dataset obtained, an overview of some selected variables, and an analysis of the ten most cited articles in this selected dataset is presented, which brings together publications from the last 28 years in these areas combined.

Submitted 26 September, 2022; originally announced September 2022.

arXiv:2209.12984 [pdf, other]

Characteristics and Main Threats about Multi-Factor Authentication: A Survey

Authors: Wesley dos Reis Bezerra, Cristiano Antônio de Souza, Carla Merkle Westphall, Carlos Becker Westphall

Abstract: This work reports that the Systematic Literature Review process is responsible for providing theoretical support to research in the Threat Model and Multi-Factor Authentication. However, different from the related works, this study aims to evaluate the main characteristics of authentication solutions and their threat model. Also, it intends to list characteristics, threats, and related content to a state-of-art. As a result, we brought a portfolio analysis through charts, figures, and tables presented in the discussion section.

Submitted 26 September, 2022; originally announced September 2022.

arXiv:2209.06243 [pdf, other]

CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task

Authors: Ricardo Rei, Marcos Treviso, Nuno M. Guerreiro, Chrysoula Zerva, Ana C. Farinha, Christine Maroti, José G. C. de Souza, Taisiya Glushkova, Duarte M. Alves, Alon Lavie, Luisa Coheur, André F. T. Martins

Abstract: We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE). Our team participated on all three subtasks: (i) Sentence and Word-level Quality Prediction; (ii) Explainable QE; and (iii) Critical Error Detection. For all tasks we build on top of the COMET framework, connecting it with the predictor-estimator architecture of OpenKiwi, and equipping it with a word-level sequence tagger and an explanation extractor. Our results suggest that incorporating references during pretraining improves performance across several language pairs on downstream tasks, and that jointly training with sentence and word-level objectives yields a further boost. Furthermore, combining attention and gradient information proved to be the top strategy for extracting good explanations of sentence-level QE models. Overall, our submissions achieved the best results for all three tasks for almost all language pairs by a considerable margin.

Submitted 13 September, 2022; originally announced September 2022.

Comments: WMT 2022 Quality Estimation shared task

arXiv:2205.00978 [pdf, other]

Quality-Aware Decoding for Neural Machine Translation

Authors: Patrick Fernandes, António Farinhas, Ricardo Rei, José G. C. de Souza, Perez Ogayo, Graham Neubig, André F. T. Martins

Abstract: Despite the progress in machine translation quality estimation and evaluation in the last years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers around finding the most probable translation according to the model (MAP decoding), approximated with beam search. In this paper, we bring together these two lines of research and propose quality-aware decoding for NMT, by leveraging recent breakthroughs in reference-free and reference-based MT evaluation through various inference methods like $N$-best reranking and minimum Bayes risk decoding. We perform an extensive comparison of various possible candidate generation and ranking methods across four datasets and two model classes and find that quality-aware decoding consistently outperforms MAP-based decoding according both to state-of-the-art automatic metrics (COMET and BLEURT) and to human assessments. Our code is available at https://github.com/deep-spin/qaware-decode.

Submitted 2 May, 2022; originally announced May 2022.

Comments: NAACL2022

arXiv:2203.05446 [pdf, other]

Algorithms for the Maximum Eulerian Cycle Decomposition Problem

Authors: Pedro O. Pinheiro, Alexsandro Oliveira Alexandrino, Andre R. Oliveira, Cid C. de Souza, Zanoni Dias

Abstract: Given an Eulerian graph G, in the Maximum Eulerian Cycle Decomposition problem, we are interested in finding a collection of edge-disjoint cycles {E_1, E_2, ..., E_k} in G such that all edges of G are in exactly one cycle and k is maximum. We present an algorithm to solve the pricing problem of a column generation Integer Linear Programming (ILP) model introduced by Lancia and Serafini (2016). Furthermore, we propose a greedy heuristic, which searches for minimum size cycles starting from a random vertex, and a heuristic based on partially solving the ILP model. We performed tests comparing the three approaches in relation to the quality of solutions and execution time, using distinct sets of Eulerian graphs, each set grouping graphs with different numbers of vertices and edges. Our experimental results show that the ILP based heuristic outperforms the other methods.

Submitted 10 March, 2022; originally announced March 2022.

Journal ref: LIII S. Brasileiro de Pesquisa Operacional (SBPO 2021), Galoa, 2021. v. 53. p. 139228

arXiv:2201.05658 [pdf, other]

Sequence-to-Sequence Models for Extracting Information from Registration and Legal Documents

Authors: Ramon Pires, Fábio C. de Souza, Guilherme Rosa, Roberto A. Lotufo, Rodrigo Nogueira

Abstract: A typical information extraction pipeline consists of token- or span-level classification models coupled with a series of pre- and post-processing scripts. In a production pipeline, requirements often change, with classes being added and removed, which leads to nontrivial modifications to the source code and the possible introduction of bugs. In this work, we evaluate sequence-to-sequence models as an alternative to token-level classification methods for information extraction of legal and registration documents. We finetune models that jointly extract the information and generate the output already in a structured format. Post-processing steps are learned during training, thus eliminating the need for rule-based methods and simplifying the pipeline. Furthermore, we propose a novel method to align the output with the input text, thus facilitating system inspection and auditing. Our experiments on four real-world datasets show that the proposed method is an alternative to classical pipelines.

Submitted 14 January, 2022; originally announced January 2022.

arXiv:2012.03381 [pdf, other]

Solving the Minimum Convex Partition of Point Sets with Integer Programming

Authors: Allan Sapucaia, Pedro J. de Rezende, Cid C. de Souza

Abstract: The partition of a problem into smaller sub-problems satisfying certain properties is often a key ingredient in the design of divide-and-conquer algorithms. For questions related to location, the partition problem can be modeled, in geometric terms, as finding a subdivision of a planar map -- which represents, say, a geographical area -- into regions subject to certain conditions while optimizing some objective function. In this paper, we investigate one of these geometric problems known as the Minimum Convex Partition Problem (MCPP). A convex partition of a point set $P$ in the plane is a subdivision of the convex hull of $P$ whose edges are segments with both endpoints in $P$ and such that all internal faces are empty convex polygons. The MCPP is an NP-hard problem where one seeks to find a convex partition with the least number of faces. We present a novel polygon-based integer programming formulation for the MCPP, which leads to better dual bounds than the previously known edge-based model. Moreover, we introduce a primal heuristic, a branching rule and a pricing algorithm. The combination of these techniques leads to the ability to solve instances with twice as many points as previously possible while constrained to identical computational resources. A comprehensive experimental study is presented to show the impact of our design choices.

Submitted 6 December, 2020; originally announced December 2020.

Comments: 28 pages, 14 figures, submitted for publication

arXiv:2010.11677 [pdf, other]

Second layer data governance for permissioned blockchains: the privacy management challenge

Authors: Paulo Henrique Alves, Isabella Z. Frajhof, Fernando A. Correia, Clarisse de Souza, Helio Lopes

Abstract: Data privacy is a trending topic in the internet era. Given such importance, many challenges emerged in order to collect, manage, process, and publish data. In this sense, personal data have got attention, and many regulations emerged, such as GDPR in the European Union and LGPD in Brazil. This regulation model aims to protect users' data from misusage and leakage and allow users to request an explanation from companies when needed. In pandemic situations, such as the COVID-19 and Ebola outbreak, the action related to sharing health data between different organizations is/was crucial to develop a significant movement to avoid the massive infection and decrease the number of deaths. However, the data subject, i.e., the users, should have the right to request the purpose of data use, anonymization, and data deletion. In this sense, permissioned blockchain technology emerges to empower users to get their rights providing data ownership, transparency, and security through an immutable, unified, and distributed database ruled by smart contracts. The governance model discussed in blockchain applications is usually regarding the first layer governance, i.e., public and permissioned models. However, this discussion is too superficial, and they do not cover compliance with the data regulations.
Therefore, in order to organize the relationship between data owners and the stakeholders, i.e., companies and governmental entities, we developed a second layer data governance model for permissioned blockchains based on the Governance Analytical Framework principles applied in pandemic situations preserving the users' privacy and their duties. From the law perspective, we based our model on the EU GDPR in regard to data privacy concerns.

Submitted 22 October, 2020; originally announced October 2020.

arXiv:2007.13867 [pdf, other]

Robust Image Retrieval-based Visual Localization using Kapture

Authors: Martin Humenberger, Yohann Cabon, Nicolas Guerin, Julien Morat, Vincent Leroy, Jérôme Revaud, Philippe Rerole, Noé Pion, Cesar de Souza, Gabriela Csurka

Abstract: Visual localization tackles the challenge of estimating the camera pose from images by using correspondence analysis between query images and a map. This task is computation and data intensive which poses challenges on thorough evaluation of methods on various datasets. However, in order to further advance in the field, we claim that robust visual localization algorithms should be evaluated on multiple datasets covering a broad domain variety. To facilitate this, we introduce kapture, a new, flexible, unified data format and toolbox for visual localization and structure-from-motion (SFM). It enables easy usage of different datasets as well as efficient and reusable data processing. To demonstrate this, we present a versatile pipeline for visual localization that facilitates the use of different local and global features, 3D data (e.g. depth maps), non-vision sensor data (e.g. IMU, GPS, WiFi), and various processing algorithms. Using multiple configurations of the pipeline, we show the great versatility of kapture in our experiments. Furthermore, we evaluate our methods on eight public datasets where they rank top on all and first on many of them. To foster future research, we release code, models, and all datasets used in this paper in the kapture format open source under a permissive BSD license. github.com/naver/kapture, github.com/naver/kapture-localization

Submitted 7 January, 2022; v1 submitted 27 July, 2020; originally announced July 2020.

arXiv:2007.10816 [pdf, ps, other]

Infinite Sequences, Series Convergence and the Discrete Time Fourier Transform over Finite Fields

Authors: R. M. Campello de Souza, M. M. Campello de Souza, H. M. de Oliveira, M. M. Vasconcelos

Abstract: Digital Transforms have important applications on subjects such as channel coding, cryptography and digital signal processing. In this paper, two Fourier Transforms are considered, the discrete time Fourier transform (DTFT) and the finite field Fourier transform (FFFT). A finite field version of the DTFT is introduced and the FFFT is redefined with a complex kernel, which makes it a more appropriate finite field version of the Discrete Fourier Transform. These transforms can handle FIR and IIR filters defined over finite algebraic structures.

Submitted 17 July, 2020; originally announced July 2020.

Comments: 8 pages. arXiv admin note: text overlap with arXiv:1502.03371

MSC Class: 12E20; 11F80; 11F80; 40G05; 40A05 ACM Class: G.2

arXiv:2002.01537 [pdf]

doi 10.1145/3358961.3358971

Academic viewpoints and concerns on CSCW education and training in Latin America

Authors: Francisco J. Gutierrez, Yazmin Magallanes, Laura S. Gaytán-Lugo, Claudia López, Cleidson R. B. de Souza

Abstract: Computer-Supported Cooperative Work, or simply CSCW, is the research area that studies the design and use of socio-technical technology for supporting group work. CSCW has a long tradition in interdisciplinary work exploring technical, social, and theoretical challenges for the design of technologies to support cooperative and collaborative work and life activities. However, most of the research tradition, methods, and theories in the field follow a strong trend grounded in social and cultural aspects from North America and Western Europe. Therefore, it is inevitable that some of the underlying, and established, knowledge in the field will not be directly transferrable or applicable to other populations. This paper presents the results of an interview study conducted with Latin American faculty on the feasibility, viability, and prospect of a curriculum proposal for CSCW Education in Latin America. To this end, we conducted nine interviews with faculty currently based in six countries of the region, aiming to understand how a CSCW course targeted to undergraduate and/or graduate students in Latin America might be deployed. Our findings suggest that there are specific traits that need to be addressed in such a course, such as: tailoring foundational CSCW concepts to the diversity of local cultures, motivating the involvement of students by tackling relevant problems to their local communities, and revitalizing CSCW research and practice in the continent.

Submitted 4 February, 2020; originally announced February 2020.

Comments: https://dl.acm.org/doi/abs/10.1145/3358961.3358971

arXiv:1910.06699 [pdf, other]

Generating Human Action Videos by Coupling 3D Game Engines and Probabilistic Graphical Models

Authors: César Roberto de Souza, Adrien Gaidon, Yohann Cabon, Naila Murray, Antonio Manuel López

Abstract: Deep video action recognition models have been highly successful in recent years but require large quantities of manually annotated data, which are expensive and laborious to obtain. In this work, we investigate the generation of synthetic training data for video action recognition, as synthetic data have been successfully used to supervise models for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation, physics models and other components of modern game engines. With this model we generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for "Procedural Human Action Videos". PHAV contains a total of 39,982 videos, with more than 1,000 examples for each of 35 action categories. Our video generation approach is not limited to existing motion capture sequences: 14 of these 35 categories are procedurally defined synthetic actions. In addition, each video is represented with 6 different data modalities, including RGB, optical flow and pixel-level semantic labels. These modalities are generated almost simultaneously using the Multiple Render Targets feature of modern GPUs. In order to leverage PHAV, we introduce a deep multi-task (i.e. that considers action classes from multiple datasets) representation learning architecture that is able to simultaneously learn from synthetic and real video datasets, even when their action categories differ.
Our experiments on the UCF-101 and HMDB-51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance. Our approach also significantly outperforms video representations produced by fine-tuning state-of-the-art unsupervised generative models of videos.

Submitted 12 October, 2019; originally announced October 2019.

Comments: Pre-print of the article accepted for publication in the Special Issue on Generating Realistic Visual Data of Human Behavior of the International Journal of Computer Vision (IJCV). arXiv admin note: substantial text overlap with arXiv:1612.00881

arXiv:1907.07178 [pdf]

Mediation Challenges and Socio-Technical Gaps for Explainable Deep Learning Applications

Authors: Rafael Brandão, Joel Carbonera, Clarisse de Souza, Juliana Ferreira, Bernardo Gonçalves, Carla Leitão

Abstract: The presumed data owners' right to explanations brought about by the General Data Protection Regulation in Europe has shed light on the social challenges of explainable artificial intelligence (XAI). In this paper, we present a case study with Deep Learning (DL) experts from a research and development laboratory focused on the delivery of industrial-strength AI technologies. Our aim was to investigate the social meaning (i.e. meaning to others) that DL experts assign to what they do, given a richly contextualized and familiar domain of application. Using qualitative research techniques to collect and analyze empirical data, our study has shown that participating DL experts did not spontaneously engage into considerations about the social meaning of machine learning models that they build. Moreover, when explicitly stimulated to do so, these experts expressed expectations that, with real-world DL application, there will be available mediators to bridge the gap between technical meanings that drive DL work, and social meanings that AI technology users assign to it. We concluded that current research incentives and values guiding the participants' scientific interests and conduct are at odds with those required to face some of the scientific challenges involved in advancing XAI, and thus responding to the alleged data owners' right to explanations or similar societal demands emerging from current debates.
As a concrete contribution to mitigate what seems to be a more general problem, we propose three preliminary XAI Mediation Challenges with the potential to bring together technical and social meanings of DL applications, as well as to foster much needed interdisciplinary collaboration among AI and the Social Sciences researchers.

Submitted 16 July, 2019; originally announced July 2019.

Comments: 39 pages

arXiv:1906.07589 [pdf, other]

Learning with Average Precision: Training Image Retrieval with a Listwise Loss

Authors: Jerome Revaud, Jon Almazan, Rafael Sampaio de Rezende, Cesar Roberto de Souza

Abstract: Image retrieval can be formulated as a ranking problem where the goal is to order database images by decreasing similarity to the query. Recent deep models for image retrieval have outperformed traditional methods by leveraging ranking-tailored loss functions, but important theoretical and practical problems remain. First, rather than directly optimizing the global ranking, they minimize an upper-bound on the essential loss, which does not necessarily result in an optimal mean average precision (mAP). Second, these methods require significant engineering efforts to work well, e.g. special pre-training and hard-negative mining. In this paper we propose instead to directly optimize the global mAP by leveraging recent advances in listwise loss formulations. Using a histogram binning approximation, the AP can be differentiated and thus employed to end-to-end learning. Compared to existing losses, the proposed method considers thousands of images simultaneously at each iteration and eliminates the need for ad hoc tricks. It also establishes a new state of the art on many standard retrieval benchmarks. Models and evaluation scripts have been made available at https://europe.naverlabs.com/Deep-Image-Retrieval/

Submitted 18 June, 2019; originally announced June 2019.

arXiv:1906.06195 [pdf, other]

R2D2: Repeatable and Reliable Detector and Descriptor

Authors: Jerome Revaud, Philippe Weinzaepfel, César De Souza, Noe Pion, Gabriela Csurka, Yohann Cabon, Martin Humenberger

Abstract: Interest point detection and local feature description are fundamental steps in many computer vision applications. Classical methods for these tasks are based on a detect-then-describe paradigm where separate handcrafted methods are used to first identify repeatable keypoints and then represent them with a local descriptor. Neural networks trained with metric learning losses have recently caught up with these techniques, focusing on learning repeatable saliency maps for keypoint detection and learning descriptors at the detected keypoint locations. In this work, we argue that salient regions are not necessarily discriminative, and therefore can harm the performance of the description. Furthermore, we claim that descriptors should be learned only in regions for which matching can be performed with high confidence. We thus propose to jointly learn keypoint detection and description together with a predictor of the local descriptor discriminativeness. This allows us to avoid ambiguous areas and leads to reliable keypoint detections and descriptions. Our detection-and-description approach, trained with self-supervision, can simultaneously output sparse, repeatable and reliable keypoints that outperform state-of-the-art detectors and descriptors on the HPatches dataset. It also establishes a record on the recently released Aachen Day-Night localization dataset.

Submitted 17 June, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

arXiv:1808.08138 [pdf]

SigniFYI-CDN: merged communicability and usability methods to evaluate notation-intensive interaction

Authors: Juliana Soares Jansen Ferreira, Clarisse Sieckenius de Souza, Rafael Rossi de Mello Brandão, Carla Faria Leitão

Abstract: We present SigniFYI-CDN, an inspection method built from previously proposed methods combining Semiotic Engineering and the Cognitive Dimensions of Notations. Compared to its predecessors, SigniFYI-CDN simplifies procedural steps and supports them with more analytic scaffolds. It is especially fit for the study of interaction with technologies where notations are created and used by various people, or by a single person in various, and potentially distant, occasions. In such cases, notations may serve several purposes, like (mutual) comprehension, recall, coordination, negotiation, and documentation. We illustrate SigniFYI-CDN with highlights from the evaluation of a computer tool that supports qualitative data analysis. Our contribution is a simpler tool for researchers and practitioners to probe the power of combined communicability and usability analysis of interaction with increasingly complex data-intensive applications.

Submitted 25 August, 2022; v1 submitted 24 August, 2018; originally announced August 2018.

arXiv:1808.05891 [pdf]

The Case for API Communicability Evaluation: Introducing API-SI with Examples from Keras

Authors: Luiz Marques Afonso, João Antonio Marcondes Dutra Bastos, Clarisse Sieckenius de Souza, Renato Fontoura de Gusmão Cerqueira

Abstract: In addition to their vital role in professional software development, Application Programming Interfaces (APIs) are now increasingly used by non-professional programmers, including end users, scientists and experts from other domains. Therefore, good APIs must meet old and new user requirements. Most of the research on API evaluation and design derives from user-centered, cognitive perspectives on human-computer interaction. As an alternative, we present a lower-threshold variant of a previously proposed semiotic API evaluation tool. We illustrate the procedures and power of this variant, called API Signification Inspection (API-SI), with Keras, a Deep Learning API. The illustration also shows how the method can complement and fertilize API usability studies. Additionally, API-SI is packaged as an introductory semiotic tool that API designers and researchers can use to evaluate the communication of design intent and product rationale to other programmers through implicit and explicit signs thereof, encountered in the API structure, behavior and documentation.

Submitted 17 August, 2018; originally announced August 2018.

arXiv:1806.09727 [pdf, other]

doi 10.14209/SBRT.2018.179

The Hamming and Golay Number-Theoretic Transforms

Authors: A. J. A. Paschoal, R. M. Campello de Souza, H. M. de Oliveira

Abstract: New number-theoretic transforms are derived from known linear block codes over finite fields. In particular, two new such transforms are built from perfect codes, namely the Hamming number-theoretic transform and the Golay number-theoretic transform. A few properties of these new transforms are presented.

Submitted 25 September, 2018; v1 submitted 25 June, 2018; originally announced June 2018.

Comments: 5 pages, 2 figures

Report number: XXXVI Simpósio Brasileiro de Telecomunicações SBrT 2018 MSC Class: 11Txx; 11Yxx; 11H71; 11D04; 15Bxx

arXiv:1702.01793 [pdf, ps, other]

Multiuser Communication Based on the DFT Eigenstructure

Authors: R. M. Campello de Souza, H. M. de Oliveira, R. J. Cintra

Abstract: The eigenstructure of the discrete Fourier transform (DFT) is examined and new systematic procedures to generate eigenvectors of the unitary DFT are proposed. DFT eigenvectors are suggested as user signatures for data communication over the real adder channel (RAC). The proposed multiuser communication system over the 2-user RAC is detailed.

Submitted 6 February, 2017; originally announced February 2017.

Comments: 5 pages, 2 figures, 3 tables

MSC Class: 94A05; 94A11; 94A40

arXiv:1612.00881 [pdf, other]

Procedural Generation of Videos to Train Deep Action Recognition Networks

Authors: César Roberto de Souza, Adrien Gaidon, Yohann Cabon, Antonio Manuel López Peña

Abstract: Deep learning for human action recognition in videos is making significant progress, but is slowed down by its dependency on expensive manual labeling of large video collections. In this work, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation and other computer graphics techniques of modern game engines. We generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for "Procedural Human Action Videos". It contains a total of 39,982 videos, with more than 1,000 examples for each action of 35 categories. Our approach is not limited to existing motion capture sequences, and we procedurally define 14 synthetic actions. We introduce a deep multi-task representation learning architecture to mix synthetic and real videos, even if the action categories differ. Our experiments on the UCF101 and HMDB51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance, significantly outperforming fine-tuning state-of-the-art unsupervised generative models of videos.

Submitted 19 July, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

Comments: Accepted for publication at CVPR 2017. http://adas.cvc.uab.es/phav/

arXiv:1608.07138 [pdf, other]

Sympathy for the Details: Dense Trajectories and Hybrid Classification Architectures for Action Recognition

Authors: César Roberto de Souza, Adrien Gaidon, Eleonora Vig, Antonio Manuel López

Abstract: Action recognition in videos is a challenging task due to the complexity of the spatio-temporal patterns to model and the difficulty to acquire and learn on large quantities of video data. Deep learning, although a breakthrough for image classification and showing promise for videos, has still not clearly superseded action recognition methods using hand-crafted features, even when training on massive datasets. In this paper, we introduce hybrid video classification architectures based on carefully designed unsupervised representations of hand-crafted spatio-temporal features classified by supervised deep networks. As we show in our experiments on five popular benchmarks for action recognition, our hybrid model combines the best of both worlds: it is data efficient (trained on 150 to 10000 short clips) and yet improves significantly on the state of the art, including recent deep models trained on millions of manually labelled images and videos.

Submitted 25 August, 2016; originally announced August 2016.

Comments: Accepted for publication in the 14th European Conference on Computer Vision (ECCV), Amsterdam, 2016, plus supplementary material

arXiv:1506.03865 [pdf, ps, other]

Counterexample for the 2-approximation of finding partitions of rectilinear polygons with minimum stabbing number

Authors: Breno Piva, Cid C. de Souza

Abstract: This paper presents a counterexample for the approximation algorithm proposed by Durocher and Mehrabi [1] for the general problem of finding a rectangular partition of a rectilinear polygon with minimum stabbing number.

Submitted 11 June, 2015; originally announced June 2015.

arXiv:1505.04140 [pdf, other]

Efficient Multiplex for Band-Limited Channels: Galois-Field Division Multiple Access

Authors: H. M. de Oliveira, R. M. Campello de Souza, A. N. Kauffman

Abstract: A new Efficient-bandwidth code-division-multiple-access (CDMA) for band-limited channels is introduced which is based on finite field transforms. A multilevel code division multiplex exploits orthogonality properties of nonbinary sequences defined over a complex finite field. Galois-Fourier transforms contain some redundancy and only cyclotomic coefficients need to be transmitted, yielding compact spectrum requirements. The primary advantage of such schemes regarding classical multiplex is their better spectral efficiency. This paper estimates the \textit{bandwidth compactness factor} relative to Time Division Multiple Access (TDMA), showing that it strongly depends on the alphabet extension. These multiplex schemes, termed Galois Division Multiplex (GDM), are based on transforms for which there exist fast algorithms. They are also convenient from the implementation viewpoint since they can be implemented by a Digital Signal Processor.

Submitted 15 May, 2015; originally announced May 2015.

Comments: 6 pages, 5 figures, in: Workshop on Coding and Cryptography, INRIA, 1999, Paris. pp.235-241. arXiv admin note: text overlap with arXiv:1502.05881

arXiv:1503.08109 [pdf]

Spread-Spectrum Based on Finite Field Fourier Transforms

Authors: H. M. de Oliveira, J. P. C. L. Miranda, R. M. Campello de Souza

Abstract: Spread-spectrum systems are presented, which are based on Finite Field Fourier Transforms. Orthogonal spreading sequences defined over a finite field are derived. New digital multiplex schemes based on such spread-spectrum systems are also introduced, which are multilevel Coding Division Multiplex. These schemes, termed Galois-field Division Multiplex (GDM), offer compact bandwidth requirements because only leaders of cyclotomic cosets need to be transmitted.

Submitted 12 February, 2015; originally announced March 2015.

Comments: 6 pages, 7 figures. Int. Conf. on System Engineering, Comm. and Info. Technol., Punta Arenas, Chile, 2001

arXiv:1503.07551 [pdf]

A Low-throughput Wavelet-based Steganography Audio Scheme

Authors: P. Carrion, H. M. de Oliveira, R. M. Campello de Souza

Abstract: This paper presents the preliminary of a novel scheme of steganography, and introduces the idea of combining two secret keys in the operation. The first secret key encrypts the text using a standard cryptographic scheme (e.g. IDEA, SAFER+, etc.) prior to the wavelet audio decomposition. The way in which the cipher text is embedded in the file requires another key, namely a stego-key, which is associated with features of the audio wavelet analysis.

Submitted 4 February, 2015; originally announced March 2015.

Comments: 2 pages, 1 figure, conference: 8th Brazilian Symposium on Information and Computer System Security, 2008, Gramado, RS, Brazil

arXiv:1503.03794 [pdf]

Radix-2 Fast Hartley Transform Revisited

Authors: H. M. de Oliveira, V. L. Sousa, H. A. N., R. M. Campello de Souza

Abstract: A fast algorithm for the Discrete Hartley Transform (DHT) is presented, which resembles the radix-2 fast Fourier Transform (FFT). Although fast DHTs are already known, this new approach sheds some light on the deep relationship between fast DHT algorithms and a multiplication-free fast algorithm for the Hadamard Transform.

Submitted 12 March, 2015; originally announced March 2015.

Comments: 5 pages, 4 figures: Anais do I Congresso de Informática da Amazônia, 2001. v.1.pp.285-292

arXiv:1503.03763 [pdf]

doi 10.1007/978-3-540-27824-5_65

The Discrete Cosine Transform over Prime Finite Fields

Authors: M. M. Campello de Souza, H. M. de Oliveira, R. M. Campello de Souza, M. M. Vasconcelos

Abstract: This paper examines finite field trigonometry as a tool to construct trigonometric digital transforms. In particular, by using properties of the k-cosine function over GF(p), the Finite Field Discrete Cosine Transform (FFDCT) is introduced. The FFDCT pair in GF(p) is defined, having blocklengths that are divisors of (p+1)/2. A special case is the Mersenne FFDCT, defined when p is a Mersenne prime. In this instance blocklengths that are powers of two are possible and radix-2 fast algorithms can be used to compute the transform.

Submitted 12 March, 2015; originally announced March 2015.

Comments: 5 pages, 1 table, Lecture Notes in Computer Science, LNCS 3124, Heidelberg: Springer Verlag, 2004, vol.1, pp.482-487, 2004

arXiv:1503.03293 [pdf]

Fourier Codes

Authors: R. M. Campello de Souza, E. S. V. Freire, H. M. de Oliveira

Abstract: A new family of error-correcting codes, called Fourier codes, is introduced. The code parity-check matrix, dimension and an upper bound on its minimum distance are obtained from the eigenstructure of the Fourier number theoretic transform. A decoding technique for such codes is proposed.

Submitted 11 March, 2015; originally announced March 2015.

Comments: 6 pages, 2 tables. In: 10th International Symposium on Communication Theory and Applications 2009, Ambleside, Lake District, UK

arXiv:1503.02577 [pdf, ps, other]

New Algorithms for Computing a Single Component of the Discrete Fourier Transform

Authors: G. Jerônimo da Silva Jr., R. M. Campello de Souza, H. M. de Oliveira

Abstract: This paper introduces the theory and hardware implementation of two new algorithms for computing a single component of the discrete Fourier transform. In terms of multiplicative complexity, both algorithms are more efficient, in general, than the well known Goertzel Algorithm.

Submitted 9 March, 2015; originally announced March 2015.

Comments: 4 pages, 3 figures, 1 table. In: 10th International Symposium on Communication Theory and Applications, Ambleside, UK

arXiv:1503.02536 [pdf]

Genomic Imaging Based on Codongrams and a^2grams

Authors: E. A. Bouton, H. M. de Oliveira, R. M. Campello de Souza, N. S. Santos-Magalhaes

Abstract: This paper introduces new tools for genomic signal processing, which can assist in extracting genomic attributes or describing biologically meaningful features embedded in DNA. The codongrams and a^2grams are offered as an alternative to spectrograms and scalograms. Twenty different a^2grams are defined for a genome, one for each amino acid (valgram is an a^2gram for valine; alagram is an a^2gram for alanine and so on). They provide information about the distribution and occurrence of the investigated amino acid. In particular, the metgram can be used to find potential start positions of genes within a genome. This approach can help implement a new diagnosis test for genetic diseases by providing a type of DNA-medical imaging.

Submitted 5 March, 2015; originally announced March 2015.

Comments: 7 pages, 3 figures

Journal ref: WSEAS Trans. on Biology and Biomedicine, vol.1, n.2, pp.255-260, April 2004

arXiv:1502.05881 [pdf]

Orthogonal Multilevel Spreading Sequence Design

Authors: H. M. de Oliveira, R. M. Campello de Souza

Abstract: Finite field transforms are offered as a new tool of spreading sequence design. This approach exploits orthogonality properties of synchronous non-binary sequences defined over a complex finite field. It is promising for channels supporting a high signal-to-noise ratio. New digital multiplex schemes based on such sequences have also been introduced, which are multilevel Code Division Multiplex. These schemes, termed Galois-field Division Multiplex (GDM), are based on transforms for which there exist fast algorithms. They are also convenient from the hardware viewpoint since they can be implemented by a Digital Signal Processor. A new Efficient-bandwidth code-division-multiple-access (CDMA) is introduced, which is based on multilevel spread spectrum sequences over a Galois field. The primary advantage of such schemes regarding classical multiple access digital schemes is their better spectral efficiency. Galois-Fourier transforms contain some redundancy and only cyclotomic coefficients need to be transmitted, yielding compact spectrum requirements.

Submitted 20 February, 2015; originally announced February 2015.

Comments: 9 pages, 5 figures. In: Coding, Communication and Broadcasting. 1 ed. Hertfordshire: Research Studies Press (RSP), 2000. ISBN 0-86380-259-1

arXiv:1502.05880 [pdf]

doi 10.1109/SPL.2010.5483017

A Flexible Implementation of a Matrix Laurent Series-Based 16-Point Fast Fourier and Hartley Transforms

Authors: R. C. de Oliveira, H. M. de Oliveira, R. M. Campello de Souza, E. J. P. Santos

Abstract: This paper describes a flexible architecture for implementing a new fast computation of the discrete Fourier and Hartley transforms, which is based on a matrix Laurent series. The device calculates the transforms based on a single bit selection operator. The hardware structure and synthesis are presented, which handled a 16-point fast transform in 65 nsec, with a Xilinx SPARTAN 3E device.

Submitted 20 February, 2015; originally announced February 2015.

Comments: 4 pages, 4 figures. IEEE VI Southern Programmable Logic Conference 2010

arXiv:1502.04670 [pdf]

doi 10.14209/jcis.1999.7

The Hartley Transform in a Finite Field

Authors: R. M. Campello de Souza, H. M. de Oliveira, A. N. Kauffman

Abstract: The k-trigonometric functions over the Galois Field GF(q) are introduced and their main properties derived. This leads to the definition of the cask(.) function over GF(q), which in turn leads to a finite field Hartley Transform. The main properties of this new discrete transform are presented and areas for possible applications are mentioned.

Submitted 16 February, 2015; originally announced February 2015.

Comments: 7 pages, IEEE/SBT International Telecommunication Symposium, ITS, 1998, Sao Paulo, Brazil

Journal ref: Journal of Communication and Information Systems, vol.14, N.1, 1999

arXiv:1502.03387 [pdf]

doi 10.5540/03.2015.003.01.0468

A Full Frequency Masking Vocoder for Legal Eavesdropping Conversation Recording

Authors: R. F. B. Sotero Filho, H. M. de Oliveira, R. M. Campello de Souza

Abstract: This paper presents a new approach for a vocoder design based on full frequency masking by octaves in addition to a technique for spectral filling via beta probability distribution. Some psycho-acoustic characteristics of human hearing - inaudibility masking in frequency and phase - are used as a basis for the proposed algorithm. The results confirm that this technique may be useful to save bandwidth in applications requiring intelligibility. It is recommended for the legal eavesdropping of long voice conversations.

Submitted 11 February, 2015; originally announced February 2015.

Comments: 7 pages, 3 figures, 3 tables, XXXV Cong. Nac. de Matematica Aplicada e Computacional, Natal, RN, Brazil 2014

arXiv:1502.02489 [pdf]

Fourier Codes and Hartley Codes

Authors: H. M. de Oliveira, C. M. F. Barros, R. M. Campello de Souza

Abstract: Real-valued block codes are introduced, which are derived from Discrete Fourier Transforms (DFT) and Discrete Hartley Transforms (DHT). These algebraic structures are built from the eigensequences of the transforms. Generator and parity check matrices were computed for codes up to block length N=24. They can be viewed as lattice codes, so the main parameters (dimension, minimal norm, area of the Voronoi region, density, and centre density) are computed. Particularly, Hamming-Hartley and Golay-Hartley block codes are presented. These codes may possibly help an efficient computation of a DHT/DFT.

Submitted 9 February, 2015; originally announced February 2015.

Comments: 5 pages, 4 tables, 1 appendix. conference: XXV Simposio Brasileiro de Telecomunicacoes, SBrT'07, Recife, PE, Brazil, 2007

arXiv:1502.02168 [pdf, other]

Multilayer Hadamard Decomposition of Discrete Hartley Transforms

Authors: H. M. de Oliveira, R. J. Cintra, R. M. Campello de Souza

Abstract: Discrete transforms such as the discrete Fourier transform (DFT) or the discrete Hartley transform (DHT) furnish an indispensable tool in signal processing. The successful application of transform techniques relies on the existence of the so-called fast transforms. In this paper some fast algorithms are derived which meet the lower bound on the multiplicative complexity of the DFT/DHT. The approach is based on a decomposition of the DHT into layers of Walsh-Hadamard transforms. In particular, fast algorithms for short block lengths such as $N \in \{4, 8, 12, 24\}$ are presented.

Submitted 26 August, 2015; v1 submitted 7 February, 2015; originally announced February 2015.

Comments: Fixed several typos. 7 pages, 5 figures, XVIII Simpósio Brasileiro de Telecomunicações, 2000, Gramado, RS, Brazil

Showing 1–50 of 56 results for author: De Souza, C