
Showing 1–23 of 23 results for author: Alves, D

Searching in archive cs.
  1. arXiv:2406.01485  [pdf, other]

    cs.NI

    Experimental comparison of 5G SDR platforms: srsRAN x OpenAirInterface

    Authors: Ruan P. Alves, Joao Guilherme A. da S. Alves, Mikael R. Camelo, Wilker O. de Feitosa, Victor F. Monteiro, Fco. Rodrigo P. Cavalcanti

    Abstract: A Software-Defined Radio (SDR) platform is a communication system that implements as software functions that are typically implemented in dedicated hardware. One of its main advantages is the flexibility to test and deploy radio communication networks in a fast and cheap way. In the context of the Fifth Generation (5G) of wireless cellular networks, there are open source SDR platforms available on…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Published at XLI BRAZILIAN SYMPOSIUM ON TELECOMMUNICATIONS AND SIGNAL PROCESSING - SBrT 2023

  2. arXiv:2404.01630  [pdf, other]

    cs.NI

    SMaRTT-REPS: Sender-based Marked Rapidly-adapting Trimmed & Timed Transport with Recycled Entropies

    Authors: Tommaso Bonato, Abdul Kabbani, Daniele De Sensi, Rong Pan, Yanfang Le, Costin Raiciu, Mark Handley, Timo Schneider, Nils Blach, Ahmad Ghalayini, Daniel Alves, Michael Papamichael, Adrian Caulfield, Torsten Hoefler

    Abstract: With the rapid growth of machine learning (ML) workloads in datacenters, existing congestion control (CC) algorithms fail to deliver the required performance at scale. ML traffic is bursty and bulk-synchronous and thus requires quick reaction and strong fairness. We show that existing CC algorithms that use delay as a main signal react too slowly and are not always fair. We design SMaRTT, a simple…

    Submitted 27 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Fixed typo and wrong y axis of one plot

  3. arXiv:2402.17733  [pdf, other]

    cs.CL

    Tower: An Open Multilingual Large Language Model for Translation-Related Tasks

    Authors: Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G. C. de Souza, André F. T. Martins

    Abstract: While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task. In this paper, we propose a recipe for tailoring LLMs to multiple tasks present in translation workflows. We perform continued pretraining on a multilingual mixture of monolingual and pa…

    Submitted 27 February, 2024; originally announced February 2024.

  4. arXiv:2402.00786  [pdf, other]

    cs.CL cs.LG

    CroissantLLM: A Truly Bilingual French-English Language Model

    Authors: Manuel Faysse, Patrick Fernandes, Nuno M. Guerreiro, António Loison, Duarte M. Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro H. Martins, Antoni Bigata Casademunt, François Yvon, André F. T. Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo

    Abstract: We introduce CroissantLLM, a 1.3B language model pretrained on a set of 3T English and French tokens, to bring to the research and industrial community a high-performance, fully open-sourced bilingual model that runs swiftly on consumer-grade local hardware. To that end, we pioneer the approach of training an intrinsically bilingual model with a 1:1 English-to-French pretraining data ratio, a cust…

    Submitted 29 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  5. arXiv:2312.14211  [pdf, ps, other]

    cs.CL astro-ph.IM cs.AI

    Experimenting with Large Language Models and vector embeddings in NASA SciX

    Authors: Sergi Blanco-Cuaresma, Ioana Ciucă, Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Kelly E. Lockhart, Felix Grezes, Thomas Allen, Golnaz Shapurian, Carolyn S. Grant, Donna M. Thompson, Timothy W. Hostetler, Matthew R. Templeton, Shinyi Chen, Jennifer Koch, Taylor Jacovich, Daniel Chivvis, Fernanda de Macedo Alves, Jean-Claude Paquin, Jennifer Bartlett, Mugdha Polimera, Stephanie Jarmak

    Abstract: Open-source Large Language Models enable projects such as NASA SciX (i.e., NASA ADS) to think out of the box and try alternative approaches for information retrieval and data augmentation, while respecting data copyright and users' privacy. However, when large language models are directly prompted with questions without any context, they are prone to hallucination. At NASA SciX we have developed a…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: To appear in the proceedings of the 33rd annual international Astronomical Data Analysis Software & Systems (ADASS XXXIII)

  6. arXiv:2310.13448  [pdf, other]

    cs.CL

    Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning

    Authors: Duarte M. Alves, Nuno M. Guerreiro, João Alves, José Pombal, Ricardo Rei, José G. C. de Souza, Pierre Colombo, André F. T. Martins

    Abstract: Large language models (LLMs) are a promising avenue for machine translation (MT). However, current LLM-based MT systems are brittle: their effectiveness highly depends on the choice of few-shot examples and they often require extra post-processing due to overgeneration. Alternatives such as finetuning on translation instructions are computationally expensive and may weaken in-context learning capa…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 - Findings

  7. arXiv:2307.10018  [pdf, other]

    cs.RO cs.AI

    RobôCIn Small Size League Extended Team Description Paper for RoboCup 2023

    Authors: Aline Lima de Oliveira, Cauê Addae da Silva Gomes, Cecília Virginia Santos da Silva, Charles Matheus de Sousa Alves, Danilo Andrade Martins de Souza, Driele Pires Ferreira Araújo Xavier, Edgleyson Pereira da Silva, Felipe Bezerra Martins, Lucas Henrique Cavalcanti Santos, Lucas Dias Maciel, Matheus Paixão Gumercindo dos Santos, Matheus Lafayette Vasconcelos, Matheus Vinícius Teotonio do Nascimento Andrade, João Guilherme Oliveira Carvalho de Melo, João Pedro Souza Pereira de Moura, José Ronald da Silva, José Victor Silva Cruz, Pedro Henrique Santana de Morais, Pedro Paulo Salman de Oliveira, Riei Joaquim Matos Rodrigues, Roberto Costa Fernandes, Ryan Vinicius Santos Morais, Tamara Mayara Ramos Teobaldo, Washington Igor dos Santos Silva, Edna Natividade Silva Barros

    Abstract: RobôCIn has participated in RoboCup Small Size League since 2019, won its first world title in 2022 (Division B), and is currently a three-times Latin-American champion. This paper presents our improvements to defend the Small Size League (SSL) division B title in RoboCup 2023 in Bordeaux, France. This paper aims to share some of the academic research that our team developed over the past year. Ou…

    Submitted 19 July, 2023; originally announced July 2023.

  8. arXiv:2303.16104  [pdf, other]

    cs.CL

    Hallucinations in Large Multilingual Translation Models

    Authors: Nuno M. Guerreiro, Duarte Alves, Jonas Waldendorf, Barry Haddow, Alexandra Birch, Pierre Colombo, André F. T. Martins

    Abstract: Large-scale multilingual machine translation systems have demonstrated remarkable ability to translate directly between numerous languages, making them increasingly appealing for real-world applications. However, when deployed in the wild, these models may generate hallucinated translations which have the potential to severely undermine user trust and raise safety concerns. Existing research on ha…

    Submitted 28 March, 2023; originally announced March 2023.

  9. arXiv:2302.14688  [pdf, other]

    cs.AI

    OEKG: The Open Event Knowledge Graph

    Authors: Simon Gottschalk, Endri Kacupaj, Sara Abdollahi, Diego Alves, Gabriel Amaral, Elisavet Koutsiana, Tin Kuculo, Daniela Major, Caio Mello, Gullal S. Cheema, Abdul Sittar, Swati, Golsa Tahmasebzadeh, Gaurish Thakkar

    Abstract: Accessing and understanding contemporary and historical events of global impact such as the US elections and the Olympic Games is a major prerequisite for cross-lingual event analytics that investigate event causes, perception and consequences across country borders. In this paper, we present the Open Event Knowledge Graph (OEKG), a multilingual, event-centric, temporal knowledge graph composed of…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: The definitive version of this work was published in the Proceedings of the 2nd International Workshop on Cross-lingual Event-centric Open Analytics co-located with the 30th The Web Conference (WWW 2021)

  10. arXiv:2212.07429  [pdf, other]

    cs.CL

    Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia

    Authors: Diego Alves, Gaurish Thakkar, Gabriel Amaral, Tin Kuculo, Marko Tadić

    Abstract: With the ever-growing popularity of the field of NLP, the demand for datasets in low-resourced languages follows suit. Following a previously established framework, in this paper, we present the UNER dataset, a multilingual and hierarchical parallel corpus annotated for named-entities. We describe in detail the developed procedure necessary to create this type of dataset in any language available…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2212.07162

  11. arXiv:2212.07172  [pdf, ps, other]

    cs.CL

    Quotations, Coreference Resolution, and Sentiment Annotations in Croatian News Articles: An Exploratory Study

    Authors: Jelena Sarajlić, Gaurish Thakkar, Diego Alves, Nives Mikelic Preradović

    Abstract: This paper presents a corpus annotated for the task of direct-speech extraction in Croatian. The paper focuses on the annotation of the quotation, co-reference resolution, and sentiment annotation in SETimes news corpus in Croatian and on the analysis of its language-specific differences compared to English. From this, a list of the phenomena that require special attention when performing these an…

    Submitted 14 December, 2022; originally announced December 2022.

  12. arXiv:2212.07162  [pdf, other]

    cs.CL

    Building and Evaluating Universal Named-Entity Recognition English corpus

    Authors: Diego Alves, Gaurish Thakkar, Marko Tadić

    Abstract: This article presents the application of the Universal Named Entity framework to generate automatically annotated corpora. By using a workflow that extracts Wikipedia data and meta-data and DBpedia information, we generated an English dataset which is described and evaluated. Furthermore, we conducted a set of experiments to improve the annotations in terms of precision, recall, and F1-measure. Th…

    Submitted 14 December, 2022; originally announced December 2022.

  13. arXiv:2211.13358  [pdf, other]

    cs.LG

    Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation

    Authors: Sérgio Jesus, José Pombal, Duarte Alves, André Cruz, Pedro Saleiro, Rita P. Ribeiro, João Gama, Pedro Bizarro

    Abstract: Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase of publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data -- which is prevalent in many high-stakes domains -- has been lagging behind. To bridge this…

    Submitted 28 November, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted at NeurIPS 2022. https://openreview.net/forum?id=UrAYT2QwOX8

  14. arXiv:2209.06243  [pdf, other]

    cs.CL cs.LG

    CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task

    Authors: Ricardo Rei, Marcos Treviso, Nuno M. Guerreiro, Chrysoula Zerva, Ana C. Farinha, Christine Maroti, José G. C. de Souza, Taisiya Glushkova, Duarte M. Alves, Alon Lavie, Luisa Coheur, André F. T. Martins

    Abstract: We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE). Our team participated on all three subtasks: (i) Sentence and Word-level Quality Prediction; (ii) Explainable QE; and (iii) Critical Error Detection. For all tasks we build on top of the COMET framework, connecting it with the predictor-estimator architecture of OpenKiwi, and equipping it w…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: WMT 2022 Quality Estimation shared task

  15. arXiv:2206.14571  [pdf]

    cs.RO cs.AI cs.CY

    Should Social Robots in Retail Manipulate Customers?

    Authors: Oliver Bendel, Liliana Margarida Dos Santos Alves

    Abstract: Against the backdrop of structural changes in the retail trade, social robots have found their way into retail stores and shopping malls in order to attract, welcome, and greet customers; to inform them, advise them, and persuade them to make a purchase. Salespeople often have a broad knowledge of their product and rely on offering competent and honest advice, whether it be on shoes, clothing, or…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted paper of the AAAI 2022 Spring Symposium "How Fair is Fair? Achieving Wellbeing AI" (Stanford University)

    MSC Class: K.4

  16. arXiv:2206.02278  [pdf, other]

    eess.IV cs.CV stat.AP stat.ME

    Autoregressive Model for Multi-Pass SAR Change Detection Based on Image Stacks

    Authors: B. G. Palm, D. I. Alves, V. T. Vu, M. I. Pettersson, F. M. Bayer, R. J. Cintra, R. Machado, P. Dammert, H. Hellsten

    Abstract: Change detection is an important synthetic aperture radar (SAR) application, usually used to detect changes on the ground scene measurements in different moments in time. Traditionally, change detection algorithm (CDA) is mainly designed for two synthetic aperture radar (SAR) images retrieved at different instants. However, more images can be used to improve the algorithm's performance, which emerg…

    Submitted 5 June, 2022; originally announced June 2022.

    Comments: 9 pages, 10 figures

    Journal ref: Proceedings Volume 10789, Image and Signal Processing for Remote Sensing XXIV; 1078916 (2018)

  17. arXiv:2010.12433  [pdf]

    cs.CL

    Natural Language Processing Chains Inside a Cross-lingual Event-Centric Knowledge Pipeline for European Union Under-resourced Languages

    Authors: Diego Alves, Gaurish Thakkar, Marko Tadić

    Abstract: This article presents the strategy for developing a platform containing Language Processing Chains for European Union languages, consisting of Tokenization to Parsing, also including Named Entity recognition and with addition of Sentiment Analysis. These chains are part of the first step of an event-centric knowledge processing pipeline whose aim is to process multilingual media information about ma…

    Submitted 23 October, 2020; originally announced October 2020.

  18. arXiv:2010.12428  [pdf]

    cs.CL

    Evaluating Language Tools for Fifteen EU-official Under-resourced Languages

    Authors: Diego Alves, Gaurish Thakkar, Marko Tadić

    Abstract: This article presents the results of the evaluation campaign of language tools available for fifteen EU-official under-resourced languages. The evaluation was conducted within the MSC ITN CLEOPATRA action that aims at building the cross-lingual event-centric knowledge processing on top of the application of linguistic processing chains (LPCs) for at least 24 EU-official languages. In this campaign…

    Submitted 23 October, 2020; originally announced October 2020.

  19. arXiv:2010.12406  [pdf, other]

    cs.CL

    UNER: Universal Named-Entity Recognition Framework

    Authors: Diego Alves, Tin Kuculo, Gabriel Amaral, Gaurish Thakkar, Marko Tadic

    Abstract: We introduce the Universal Named-Entity Recognition (UNER) framework, a 4-level classification hierarchy, and the methodology that is being adopted to create the first multilingual UNER corpus: the SETimes parallel corpus annotated for named-entities. First, the English SETimes corpus will be annotated using existing tools and knowledge bases. After evaluating the resulting annotations through crowdsou…

    Submitted 23 October, 2020; originally announced October 2020.

  20. Teaching Programming to Novices: A Large-scale Analysis of App Inventor Projects

    Authors: Nathalia da Cruz Alves, Christiane Gresse von Wangenheim, Jean Carlo Rossa Hauck

    Abstract: Teaching programming to K-12 students has become essential. In this context, App Inventor is a popular block-based programming environment used by a wide audience, from K-12 to higher education, including end-users to create mobile applications to support their primary job or hobbies. Although learning programming with App Inventor has been investigated, a question that remains is which programmin…

    Submitted 24 April, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: 10 pages, 11 figures

    ACM Class: K.3.1

    Journal ref: 2020 XV Conferencia Latinoamericana de Tecnologias de Aprendizaje (LACLO), 2020, pp. 1-10

  21. A continuous integration and web framework in support of the ATLAS Publication Process

    Authors: Juan Pedro Araque Espinosa, Gabriel Baldi Levcovitz, Riccardo-Maria Bianchi, Ian Brock, Tancredi Carli, Nuno Filipe Castro, Alessandra Ciocio, Maurizio Colautti, Ana Carolina Da Silva Menezes, Gabriel De Oliveira da Fonseca, Leandro Domingues Macedo Alves, Andreas Hoecker, Bruno Lange Ramos, Gabriela Lemos Lúcidi Pinhão, Carmen Maidantchik, Fairouz Malek, Robert McPherson, Gianluca Picco, Marcelo Teixeira Dos Santos

    Abstract: The ATLAS collaboration defines methods, establishes procedures, and organises advisory groups to manage the publication processes of scientific papers, conference papers, and public notes. All stages are managed through web systems, computing programs, and tools that are designed and developed by the collaboration. A framework called FENCE is integrated into the CERN GitLab software repository, t…

    Submitted 28 January, 2021; v1 submitted 14 May, 2020; originally announced May 2020.

    Comments: 22 pages in total, 11 figures, submitted to JINST. All figures including auxiliary figures are available at https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/GENR-2018-01/

    Report number: CERN-OPEN-2020-007

  22. arXiv:1712.01694  [pdf, ps, other]

    cs.CV cs.AI cs.GR cs.NE eess.IV

    Fuzzy-Based Dialectical Non-Supervised Image Classification and Clustering

    Authors: Wellington Pinheiro dos Santos, Francisco Marcos de Assis, Ricardo Emmanuel de Souza, Priscilla B. Mendes, Henrique S. S. Monteiro, Havana Diogo Alves

    Abstract: The materialist dialectical method is a philosophical investigative method to analyze aspects of reality. These aspects are viewed as complex processes composed by basic units named poles, which interact with each other. Dialectics has experienced considerable progress in the 19th century, with Hegel's dialectics and, in the 20th century, with the works of Marx, Engels, and Gramsci, in Philosophy…

    Submitted 3 December, 2017; originally announced December 2017.

    Journal ref: International Journal of Hybrid Intelligent Systems, v. 7, p. 115-124, 2010

  23. arXiv:1509.02490  [pdf, ps, other]

    cs.LO cs.SE

    Fault Localization in Multi-Threaded C Programs using Bounded Model Checking (extended version)

    Authors: Erickson H. da S. Alves, Lucas C. Cordeiro, Eddie B. de Lima Filho

    Abstract: Software debugging is a very time-consuming process, which is even worse for multi-threaded programs, due to the non-deterministic behavior of thread-scheduling algorithms. However, the debugging time may be greatly reduced, if automatic methods are used for localizing faults. In this study, a new method for fault localization, in multi-threaded C programs, is proposed. It transforms a multi-threa…

    Submitted 8 September, 2015; originally announced September 2015.

    Comments: extended version of paper published at SBESC'15